-
Leveraging PAC-Bayes Theory and Gibbs Distributions for Generalization Bounds with Complexity Measures
Authors:
Paul Viallard,
Rémi Emonet,
Amaury Habrard,
Emilie Morvant,
Valentina Zantedeschi
Abstract:
In statistical learning theory, a generalization bound usually involves a complexity measure imposed by the considered theoretical framework. This limits the scope of such bounds, as other forms of capacity measures or regularizations are used in algorithms. In this paper, we leverage the framework of disintegrated PAC-Bayes bounds to derive a general generalization bound instantiable with arbitra…
▽ More
In statistical learning theory, a generalization bound usually involves a complexity measure imposed by the considered theoretical framework. This limits the scope of such bounds, as other forms of capacity measures or regularizations are used in algorithms. In this paper, we leverage the framework of disintegrated PAC-Bayes bounds to derive a general generalization bound instantiable with arbitrary complexity measures. One trick to prove such a result involves considering a commonly used family of distributions: the Gibbs distributions. Our bound stands in probability jointly over the hypothesis and the learning sample, which allows the complexity to be adapted to the generalization gap as it can be customized to fit both the hypothesis class and the task.
△ Less
Submitted 19 February, 2024;
originally announced February 2024.
-
Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound
Authors:
Valentina Zantedeschi,
Paul Viallard,
Emilie Morvant,
Rémi Emonet,
Amaury Habrard,
Pascal Germain,
Benjamin Guedj
Abstract:
We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties. While our approach holds for arbitrary distributions, we instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk, which then turns the generalization bound into a tractable training objective.…
▽ More
We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties. While our approach holds for arbitrary distributions, we instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk, which then turns the generalization bound into a tractable training objective. The resulting stochastic majority vote learning algorithm achieves state-of-the-art accuracy and benefits from (non-vacuous) tight generalization bounds, in a series of numerical experiments when compared to competing algorithms which also minimize PAC-Bayes objectives -- both with uninformed (data-independent) and informed (data-dependent) priors.
△ Less
Submitted 19 October, 2021; v1 submitted 23 June, 2021;
originally announced June 2021.
-
Self-Bounding Majority Vote Learning Algorithms by the Direct Minimization of a Tight PAC-Bayesian C-Bound
Authors:
Paul Viallard,
Pascal Germain,
Amaury Habrard,
Emilie Morvant
Abstract:
In the PAC-Bayesian literature, the C-Bound refers to an insightful relation between the risk of a majority vote classifier (under the zero-one loss) and the first two moments of its margin (i.e., the expected margin and the voters' diversity). Until now, learning algorithms developed in this framework minimize the empirical version of the C-Bound, instead of explicit PAC-Bayesian generalization b…
▽ More
In the PAC-Bayesian literature, the C-Bound refers to an insightful relation between the risk of a majority vote classifier (under the zero-one loss) and the first two moments of its margin (i.e., the expected margin and the voters' diversity). Until now, learning algorithms developed in this framework minimize the empirical version of the C-Bound, instead of explicit PAC-Bayesian generalization bounds. In this paper, by directly optimizing PAC-Bayesian guarantees on the C-Bound, we derive self-bounding majority vote learning algorithms. Moreover, our algorithms based on gradient descent are scalable and lead to accurate predictors paired with non-vacuous guarantees.
△ Less
Submitted 31 August, 2021; v1 submitted 28 April, 2021;
originally announced April 2021.
-
A PAC-Bayes Analysis of Adversarial Robustness
Authors:
Paul Viallard,
Guillaume Vidot,
Amaury Habrard,
Emilie Morvant
Abstract:
We propose the first general PAC-Bayesian generalization bounds for adversarial robustness, that estimate, at test time, how much a model will be invariant to imperceptible perturbations in the input. Instead of deriving a worst-case analysis of the risk of a hypothesis over all the possible perturbations, we leverage the PAC-Bayesian framework to bound the averaged risk on the perturbations for m…
▽ More
We propose the first general PAC-Bayesian generalization bounds for adversarial robustness, that estimate, at test time, how much a model will be invariant to imperceptible perturbations in the input. Instead of deriving a worst-case analysis of the risk of a hypothesis over all the possible perturbations, we leverage the PAC-Bayesian framework to bound the averaged risk on the perturbations for majority votes (over the whole class of hypotheses). Our theoretically founded analysis has the advantage to provide general bounds (i) that are valid for any kind of attacks (i.e., the adversarial attacks), (ii) that are tight thanks to the PAC-Bayesian framework, (iii) that can be directly minimized during the learning phase to obtain a robust model on different attacks at test time.
△ Less
Submitted 27 October, 2021; v1 submitted 19 February, 2021;
originally announced February 2021.
-
A General Framework for the Practical Disintegration of PAC-Bayesian Bounds
Authors:
Paul Viallard,
Pascal Germain,
Amaury Habrard,
Emilie Morvant
Abstract:
PAC-Bayesian bounds are known to be tight and informative when studying the generalization ability of randomized classifiers. However, they require a loose and costly derandomization step when applied to some families of deterministic models such as neural networks. As an alternative to this step, we introduce new PAC-Bayesian generalization bounds that have the originality to provide disintegrate…
▽ More
PAC-Bayesian bounds are known to be tight and informative when studying the generalization ability of randomized classifiers. However, they require a loose and costly derandomization step when applied to some families of deterministic models such as neural networks. As an alternative to this step, we introduce new PAC-Bayesian generalization bounds that have the originality to provide disintegrated bounds, i.e., they give guarantees over one single hypothesis instead of the usual averaged analysis. Our bounds are easily optimizable and can be used to design learning algorithms. We illustrate this behavior on neural networks, and we show a significant practical improvement over the state-of-the-art framework.
△ Less
Submitted 18 September, 2023; v1 submitted 17 February, 2021;
originally announced February 2021.
-
A survey on domain adaptation theory: learning bounds and theoretical guarantees
Authors:
Ievgen Redko,
Emilie Morvant,
Amaury Habrard,
Marc Sebban,
Younès Bennani
Abstract:
All famous machine learning algorithms that comprise both supervised and semi-supervised learning work well only under a common assumption: the training and test data follow the same distribution. When the distribution changes, most statistical models must be reconstructed from newly collected data, which for some applications can be costly or impossible to obtain. Therefore, it has become necessa…
▽ More
All famous machine learning algorithms that comprise both supervised and semi-supervised learning work well only under a common assumption: the training and test data follow the same distribution. When the distribution changes, most statistical models must be reconstructed from newly collected data, which for some applications can be costly or impossible to obtain. Therefore, it has become necessary to develop approaches that reduce the need and the effort to obtain new labeled samples by exploiting data that are available in related areas, and using these further across similar fields. This has given rise to a new machine learning framework known as transfer learning: a learning setting inspired by the capability of a human being to extrapolate knowledge across tasks to learn more efficiently. Despite a large amount of different transfer learning scenarios, the main objective of this survey is to provide an overview of the state-of-the-art theoretical results in a specific, and arguably the most popular, sub-field of transfer learning, called domain adaptation. In this sub-field, the data distribution is assumed to change across the training and the test data, while the learning task remains the same. We provide a first up-to-date description of existing results related to domain adaptation problem that cover learning bounds based on different statistical learning frameworks.
△ Less
Submitted 13 July, 2022; v1 submitted 24 April, 2020;
originally announced April 2020.
-
Metric Learning from Imbalanced Data
Authors:
Léo Gautheron,
Emilie Morvant,
Amaury Habrard,
Marc Sebban
Abstract:
A key element of any machine learning algorithm is the use of a function that measures the dis/similarity between data points. Given a task, such a function can be optimized with a metric learning algorithm. Although this research field has received a lot of attention during the past decade, very few approaches have focused on learning a metric in an imbalanced scenario where the number of positiv…
▽ More
A key element of any machine learning algorithm is the use of a function that measures the dis/similarity between data points. Given a task, such a function can be optimized with a metric learning algorithm. Although this research field has received a lot of attention during the past decade, very few approaches have focused on learning a metric in an imbalanced scenario where the number of positive examples is much smaller than the negatives. Here, we address this challenging task by designing a new Mahalanobis metric learning algorithm (IML) which deals with class imbalance. The empirical study performed shows the efficiency of IML.
△ Less
Submitted 4 September, 2019;
originally announced September 2019.
-
Learning Landmark-Based Ensembles with Random Fourier Features and Gradient Boosting
Authors:
Léo Gautheron,
Pascal Germain,
Amaury Habrard,
Emilie Morvant,
Marc Sebban,
Valentina Zantedeschi
Abstract:
We propose a Gradient Boosting algorithm for learning an ensemble of kernel functions adapted to the task at hand. Unlike state-of-the-art Multiple Kernel Learning techniques that make use of a pre-computed dictionary of kernel functions to select from, at each iteration we fit a kernel by approximating it as a weighted sum of Random Fourier Features (RFF) and by optimizing their barycenter. This…
▽ More
We propose a Gradient Boosting algorithm for learning an ensemble of kernel functions adapted to the task at hand. Unlike state-of-the-art Multiple Kernel Learning techniques that make use of a pre-computed dictionary of kernel functions to select from, at each iteration we fit a kernel by approximating it as a weighted sum of Random Fourier Features (RFF) and by optimizing their barycenter. This allows us to obtain a more versatile method, easier to setup and likely to have better performance. Our study builds on a recent result showing one can learn a kernel from RFF by computing the minimum of a PAC-Bayesian bound on the kernel alignment generalization loss, which is obtained efficiently from a closed-form solution. We conduct an experimental analysis to highlight the advantages of our method w.r.t. both Boosting-based and kernel-learning state-of-the-art methods.
△ Less
Submitted 14 June, 2019;
originally announced June 2019.
-
Pseudo-Bayesian Learning with Kernel Fourier Transform as Prior
Authors:
Gaël Letarte,
Emilie Morvant,
Pascal Germain
Abstract:
We revisit Rahimi and Recht (2007)'s kernel random Fourier features (RFF) method through the lens of the PAC-Bayesian theory. While the primary goal of RFF is to approximate a kernel, we look at the Fourier transform as a prior distribution over trigonometric hypotheses. It naturally suggests learning a posterior on these hypotheses. We derive generalization bounds that are optimized by learning a…
▽ More
We revisit Rahimi and Recht (2007)'s kernel random Fourier features (RFF) method through the lens of the PAC-Bayesian theory. While the primary goal of RFF is to approximate a kernel, we look at the Fourier transform as a prior distribution over trigonometric hypotheses. It naturally suggests learning a posterior on these hypotheses. We derive generalization bounds that are optimized by learning a pseudo-posterior obtained from a closed-form expression. Based on this study, we consider two learning strategies: The first one finds a compact landmarks-based representation of the data where each landmark is given by a distribution-tailored similarity measure, while the second one provides a PAC-Bayesian justification to the kernel alignment method of Sinha and Duchi (2016).
△ Less
Submitted 27 March, 2019; v1 submitted 30 October, 2018;
originally announced October 2018.
-
Multiview Boosting by Controlling the Diversity and the Accuracy of View-specific Voters
Authors:
Anil Goyal,
Emilie Morvant,
Pascal Germain,
Massih-Reza Amini
Abstract:
In this paper we propose a boosting based multiview learning algorithm, referred to as PB-MVBoost, which iteratively learns i) weights over view-specific voters capturing view-specific information; and ii) weights over views by optimizing a PAC-Bayes multiview C-Bound that takes into account the accuracy of view-specific classifiers and the diversity between the views. We derive a generalization b…
▽ More
In this paper we propose a boosting based multiview learning algorithm, referred to as PB-MVBoost, which iteratively learns i) weights over view-specific voters capturing view-specific information; and ii) weights over views by optimizing a PAC-Bayes multiview C-Bound that takes into account the accuracy of view-specific classifiers and the diversity between the views. We derive a generalization bound for this strategy following the PAC-Bayes theory which is a suitable tool to deal with models expressed as weighted combination over a set of voters. Different experiments on three publicly available datasets show the efficiency of the proposed approach with respect to state-of-art models.
△ Less
Submitted 27 August, 2018; v1 submitted 17 August, 2018;
originally announced August 2018.
-
Multiview Learning of Weighted Majority Vote by Bregman Divergence Minimization
Authors:
Anil Goyal,
Emilie Morvant,
Massih-Reza Amini
Abstract:
We tackle the issue of classifier combinations when observations have multiple views. Our method jointly learns view-specific weighted majority vote classifiers (i.e. for each view) over a set of base voters, and a second weighted majority vote classifier over the set of these view-specific weighted majority vote classifiers. We show that the empirical risk minimization of the final majority vote…
▽ More
We tackle the issue of classifier combinations when observations have multiple views. Our method jointly learns view-specific weighted majority vote classifiers (i.e. for each view) over a set of base voters, and a second weighted majority vote classifier over the set of these view-specific weighted majority vote classifiers. We show that the empirical risk minimization of the final majority vote given a multiview training set can be cast as the minimization of Bregman divergences. This allows us to derive a parallel-update optimization algorithm for learning our multiview model. We empirically study our algorithm with a particular focus on the impact of the training set size on the multiview learning results. The experiments show that our approach is able to overcome the lack of labeled information.
△ Less
Submitted 25 May, 2018;
originally announced May 2018.
-
PAC-Bayes and Domain Adaptation
Authors:
Pascal Germain,
Amaury Habrard,
François Laviolette,
Emilie Morvant
Abstract:
We provide two main contributions in PAC-Bayesian theory for domain adaptation where the objective is to learn, from a source distribution, a well-performing majority vote on a different, but related, target distribution. Firstly, we propose an improvement of the previous approach we proposed in Germain et al. (2013), which relies on a novel distribution pseudodistance based on a disagreement aver…
▽ More
We provide two main contributions in PAC-Bayesian theory for domain adaptation where the objective is to learn, from a source distribution, a well-performing majority vote on a different, but related, target distribution. Firstly, we propose an improvement of the previous approach we proposed in Germain et al. (2013), which relies on a novel distribution pseudodistance based on a disagreement averaging, allowing us to derive a new tighter domain adaptation bound for the target risk. While this bound stands in the spirit of common domain adaptation works, we derive a second bound (introduced in Germain et al., 2016) that brings a new perspective on domain adaptation by deriving an upper bound on the target risk where the distributions' divergence-expressed as a ratio-controls the trade-off between a source error measure and the target voters' disagreement. We discuss and compare both results, from which we obtain PAC-Bayesian generalization bounds. Furthermore, from the PAC-Bayesian specialization to linear classifiers, we infer two learning algorithms, and we evaluate them on real data.
△ Less
Submitted 15 November, 2019; v1 submitted 17 July, 2017;
originally announced July 2017.
-
PAC-Bayesian Analysis for a two-step Hierarchical Multiview Learning Approach
Authors:
Anil Goyal,
Emilie Morvant,
Pascal Germain,
Massih-Reza Amini
Abstract:
We study a two-level multiview learning with more than two views under the PAC-Bayesian framework. This approach, sometimes referred as late fusion, consists in learning sequentially multiple view-specific classifiers at the first level, and then combining these view-specific classifiers at the second level. Our main theoretical result is a generalization bound on the risk of the majority vote whi…
▽ More
We study a two-level multiview learning with more than two views under the PAC-Bayesian framework. This approach, sometimes referred as late fusion, consists in learning sequentially multiple view-specific classifiers at the first level, and then combining these view-specific classifiers at the second level. Our main theoretical result is a generalization bound on the risk of the majority vote which exhibits a term of diversity in the predictions of the view-specific classifiers. From this result it comes out that controlling the trade-off between diversity and accuracy is a key element for multiview learning, which complements other results in multiview learning. Finally, we experiment our principle on multiview datasets extracted from the Reuters RCV1/RCV2 collection.
△ Less
Submitted 13 July, 2017; v1 submitted 23 June, 2016;
originally announced June 2016.
-
A New PAC-Bayesian Perspective on Domain Adaptation
Authors:
Pascal Germain,
Amaury Habrard,
François Laviolette,
Emilie Morvant
Abstract:
We study the issue of PAC-Bayesian domain adaptation: We want to learn, from a source domain, a majority vote model dedicated to a target one. Our theoretical contribution brings a new perspective by deriving an upper-bound on the target risk where the distributions' divergence---expressed as a ratio---controls the trade-off between a source error measure and the target voters' disagreement. Our b…
▽ More
We study the issue of PAC-Bayesian domain adaptation: We want to learn, from a source domain, a majority vote model dedicated to a target one. Our theoretical contribution brings a new perspective by deriving an upper-bound on the target risk where the distributions' divergence---expressed as a ratio---controls the trade-off between a source error measure and the target voters' disagreement. Our bound suggests that one has to focus on regions where the source data is informative.From this result, we derive a PAC-Bayesian generalization bound, and specialize it to linear classifiers. Then, we infer a learning algorithmand perform experiments on real data.
△ Less
Submitted 26 July, 2016; v1 submitted 15 June, 2015;
originally announced June 2015.
-
PAC-Bayesian Theorems for Domain Adaptation with Specialization to Linear Classifiers
Authors:
Pascal Germain,
Amaury Habrard,
François Laviolette,
Emilie Morvant
Abstract:
In this paper, we provide two main contributions in PAC-Bayesian theory for domain adaptation where the objective is to learn, from a source distribution, a well-performing majority vote on a different target distribution. On the one hand, we propose an improvement of the previous approach proposed by Germain et al. (2013), that relies on a novel distribution pseudodistance based on a disagreement…
▽ More
In this paper, we provide two main contributions in PAC-Bayesian theory for domain adaptation where the objective is to learn, from a source distribution, a well-performing majority vote on a different target distribution. On the one hand, we propose an improvement of the previous approach proposed by Germain et al. (2013), that relies on a novel distribution pseudodistance based on a disagreement averaging, allowing us to derive a new tighter PAC-Bayesian domain adaptation bound for the stochastic Gibbs classifier. We specialize it to linear classifiers, and design a learning algorithm which shows interesting results on a synthetic problem and on a popular sentiment annotation task. On the other hand, we generalize these results to multisource domain adaptation allowing us to take into account different source domains. This study opens the door to tackle domain adaptation tasks by making use of all the PAC-Bayesian tools.
△ Less
Submitted 9 August, 2016; v1 submitted 24 March, 2015;
originally announced March 2015.
-
An Improvement to the Domain Adaptation Bound in a PAC-Bayesian context
Authors:
Pascal Germain,
Amaury Habrard,
Francois Laviolette,
Emilie Morvant
Abstract:
This paper provides a theoretical analysis of domain adaptation based on the PAC-Bayesian theory. We propose an improvement of the previous domain adaptation bound obtained by Germain et al. in two ways. We first give another generalization bound tighter and easier to interpret. Moreover, we provide a new analysis of the constant term appearing in the bound that can be of high interest for develop…
▽ More
This paper provides a theoretical analysis of domain adaptation based on the PAC-Bayesian theory. We propose an improvement of the previous domain adaptation bound obtained by Germain et al. in two ways. We first give another generalization bound tighter and easier to interpret. Moreover, we provide a new analysis of the constant term appearing in the bound that can be of high interest for develo** new algorithmic solutions.
△ Less
Submitted 13 January, 2015;
originally announced January 2015.
-
On Generalizing the C-Bound to the Multiclass and Multi-label Settings
Authors:
Francois Laviolette,
Emilie Morvant,
Liva Ralaivola,
Jean-Francis Roy
Abstract:
The C-bound, introduced in Lacasse et al., gives a tight upper bound on the risk of a binary majority vote classifier. In this work, we present a first step towards extending this work to more complex outputs, by providing generalizations of the C-bound to the multiclass and multi-label settings.
The C-bound, introduced in Lacasse et al., gives a tight upper bound on the risk of a binary majority vote classifier. In this work, we present a first step towards extending this work to more complex outputs, by providing generalizations of the C-bound to the multiclass and multi-label settings.
△ Less
Submitted 13 January, 2015;
originally announced January 2015.
-
Domain adaptation of weighted majority votes via perturbed variation-based self-labeling
Authors:
Emilie Morvant
Abstract:
In machine learning, the domain adaptation problem arrives when the test (target) and the train (source) data are generated from different distributions. A key applied issue is thus the design of algorithms able to generalize on a new distribution, for which we have no label information. We focus on learning classification models defined as a weighted majority vote over a set of real-val ued funct…
▽ More
In machine learning, the domain adaptation problem arrives when the test (target) and the train (source) data are generated from different distributions. A key applied issue is thus the design of algorithms able to generalize on a new distribution, for which we have no label information. We focus on learning classification models defined as a weighted majority vote over a set of real-val ued functions. In this context, Germain et al. (2013) have shown that a measure of disagreement between these functions is crucial to control. The core of this measure is a theoretical bound--the C-bound (Lacasse et al., 2007)--which involves the disagreement and leads to a well performing majority vote learning algorithm in usual non-adaptative supervised setting: MinCq. In this work, we propose a framework to extend MinCq to a domain adaptation scenario. This procedure takes advantage of the recent perturbed variation divergence between distributions proposed by Harel and Mannor (2012). Justified by a theoretical bound on the target risk of the vote, we provide to MinCq a target sample labeled thanks to a perturbed variation-based self-labeling focused on the regions where the source and target marginals appear similar. We also study the influence of our self-labeling, from which we deduce an original process for tuning the hyperparameters. Finally, our framework called PV-MinCq shows very promising results on a rotation and translation synthetic problem.
△ Less
Submitted 1 October, 2014;
originally announced October 2014.
-
On the Generalization of the C-Bound to Structured Output Ensemble Methods
Authors:
François Laviolette,
Emilie Morvant,
Liva Ralaivola,
Jean-Francis Roy
Abstract:
This paper generalizes an important result from the PAC-Bayesian literature for binary classification to the case of ensemble methods for structured outputs. We prove a generic version of the \Cbound, an upper bound over the risk of models expressed as a weighted majority vote that is based on the first and second statistical moments of the vote's margin. This bound may advantageously $(i)$ be app…
▽ More
This paper generalizes an important result from the PAC-Bayesian literature for binary classification to the case of ensemble methods for structured outputs. We prove a generic version of the \Cbound, an upper bound over the risk of models expressed as a weighted majority vote that is based on the first and second statistical moments of the vote's margin. This bound may advantageously $(i)$ be applied on more complex outputs such as multiclass labels and multilabel, and $(ii)$ allow to consider margin relaxations. These results open the way to develop new ensemble methods for structured output prediction with PAC-Bayesian guarantees.
△ Less
Submitted 15 June, 2015; v1 submitted 6 August, 2014;
originally announced August 2014.
-
Majority Vote of Diverse Classifiers for Late Fusion
Authors:
Emilie Morvant,
Amaury Habrard,
Stéphane Ayache
Abstract:
In the past few years, a lot of attention has been devoted to multimedia indexing by fusing multimodal informations. Two kinds of fusion schemes are generally considered: The early fusion and the late fusion. We focus on late classifier fusion, where one combines the scores of each modality at the decision level. To tackle this problem, we investigate a recent and elegant well-founded quadratic pr…
▽ More
In the past few years, a lot of attention has been devoted to multimedia indexing by fusing multimodal informations. Two kinds of fusion schemes are generally considered: The early fusion and the late fusion. We focus on late classifier fusion, where one combines the scores of each modality at the decision level. To tackle this problem, we investigate a recent and elegant well-founded quadratic program named MinCq coming from the machine learning PAC-Bayesian theory. MinCq looks for the weighted combination, over a set of real-valued functions seen as voters, leading to the lowest misclassification rate, while maximizing the voters' diversity. We propose an extension of MinCq tailored to multimedia indexing. Our method is based on an order-preserving pairwise loss adapted to ranking that allows us to improve Mean Averaged Precision measure while taking into account the diversity of the voters that we want to fuse. We provide evidence that this method is naturally adapted to late fusion procedures and confirm the good behavior of our approach on the challenging PASCAL VOC'07 benchmark.
△ Less
Submitted 19 June, 2014; v1 submitted 30 April, 2014;
originally announced April 2014.
-
Domain Adaptation of Majority Votes via Perturbed Variation-based Label Transfer
Authors:
Emilie Morvant
Abstract:
We tackle the PAC-Bayesian Domain Adaptation (DA) problem. This arrives when one desires to learn, from a source distribution, a good weighted majority vote (over a set of classifiers) on a different target distribution. In this context, the disagreement between classifiers is known crucial to control. In non-DA supervised setting, a theoretical bound - the C-bound - involves this disagreement and…
▽ More
We tackle the PAC-Bayesian Domain Adaptation (DA) problem. This arrives when one desires to learn, from a source distribution, a good weighted majority vote (over a set of classifiers) on a different target distribution. In this context, the disagreement between classifiers is known crucial to control. In non-DA supervised setting, a theoretical bound - the C-bound - involves this disagreement and leads to a majority vote learning algorithm: MinCq. In this work, we extend MinCq to DA by taking advantage of an elegant divergence between distribution called the Perturbed Varation (PV). Firstly, justified by a new formulation of the C-bound, we provide to MinCq a target sample labeled thanks to a PV-based self-labeling focused on regions where the source and target marginal distributions are closer. Secondly, we propose an original process for tuning the hyperparameters. Our framework shows very promising results on a toy problem.
△ Less
Submitted 19 November, 2013;
originally announced November 2013.
-
PAC-Bayesian Learning and Domain Adaptation
Authors:
Pascal Germain,
Amaury Habrard,
François Laviolette,
Emilie Morvant
Abstract:
In machine learning, Domain Adaptation (DA) arises when the distribution gen- erating the test (target) data differs from the one generating the learning (source) data. It is well known that DA is an hard task even under strong assumptions, among which the covariate-shift where the source and target distributions diverge only in their marginals, i.e. they have the same labeling function. Another p…
▽ More
In machine learning, Domain Adaptation (DA) arises when the distribution gen- erating the test (target) data differs from the one generating the learning (source) data. It is well known that DA is an hard task even under strong assumptions, among which the covariate-shift where the source and target distributions diverge only in their marginals, i.e. they have the same labeling function. Another popular approach is to consider an hypothesis class that moves closer the two distributions while implying a low-error for both tasks. This is a VC-dim approach that restricts the complexity of an hypothesis class in order to get good generalization. Instead, we propose a PAC-Bayesian approach that seeks for suitable weights to be given to each hypothesis in order to build a majority vote. We prove a new DA bound in the PAC-Bayesian context. This leads us to design the first DA-PAC-Bayesian algorithm based on the minimization of the proposed bound. Doing so, we seek for a ρ-weighted majority vote that takes into account a trade-off between three quantities. The first two quantities being, as usual in the PAC-Bayesian approach, (a) the complexity of the majority vote (measured by a Kullback-Leibler divergence) and (b) its empirical risk (measured by the ρ-average errors on the source sample). The third quantity is (c) the capacity of the majority vote to distinguish some structural difference between the source and target samples.
△ Less
Submitted 11 December, 2012;
originally announced December 2012.
-
PAC-Bayesian Majority Vote for Late Classifier Fusion
Authors:
Emilie Morvant,
Amaury Habrard,
Stéphane Ayache
Abstract:
A lot of attention has been devoted to multimedia indexing over the past few years. In the literature, we often consider two kinds of fusion schemes: The early fusion and the late fusion. In this paper we focus on late classifier fusion, where one combines the scores of each modality at the decision level. To tackle this problem, we investigate a recent and elegant well-founded quadratic program n…
▽ More
A lot of attention has been devoted to multimedia indexing over the past few years. In the literature, we often consider two kinds of fusion schemes: The early fusion and the late fusion. In this paper we focus on late classifier fusion, where one combines the scores of each modality at the decision level. To tackle this problem, we investigate a recent and elegant well-founded quadratic program named MinCq coming from the Machine Learning PAC-Bayes theory. MinCq looks for the weighted combination, over a set of real-valued functions seen as voters, leading to the lowest misclassification rate, while making use of the voters' diversity. We provide evidence that this method is naturally adapted to late fusion procedure. We propose an extension of MinCq by adding an order- preserving pairwise loss for ranking, hel** to improve Mean Averaged Precision measure. We confirm the good behavior of the MinCq-based fusion approaches with experiments on a real image benchmark.
△ Less
Submitted 4 July, 2012;
originally announced July 2012.
-
PAC-Bayesian Generalization Bound on Confusion Matrix for Multi-Class Classification
Authors:
Emilie Morvant,
Sokol Koço,
Liva Ralaivola
Abstract:
In this work, we propose a PAC-Bayes bound for the generalization risk of the Gibbs classifier in the multi-class classification framework. The novelty of our work is the critical use of the confusion matrix of a classifier as an error measure; this puts our contribution in the line of work aiming at dealing with performance measure that are richer than mere scalar criterion such as the misclassif…
▽ More
In this work, we propose a PAC-Bayes bound for the generalization risk of the Gibbs classifier in the multi-class classification framework. The novelty of our work is the critical use of the confusion matrix of a classifier as an error measure; this puts our contribution in the line of work aiming at dealing with performance measure that are richer than mere scalar criterion such as the misclassification rate. Thanks to very recent and beautiful results on matrix concentration inequalities, we derive two bounds showing that the true confusion risk of the Gibbs classifier is upper-bounded by its empirical risk plus a term depending on the number of training examples in each class. To the best of our knowledge, this is the first PAC-Bayes bounds based on confusion matrices.
△ Less
Submitted 22 October, 2013; v1 submitted 28 February, 2012;
originally announced February 2012.