Search | arXiv e-print repository

Statistical Guarantees for Variational Autoencoders using PAC-Bayesian Theory

Authors: Sokhna Diarra Mbacke, Florence Clerc, Pascal Germain

Abstract: Since their inception, Variational Autoencoders (VAEs) have become central in machine learning. Despite their widespread use, numerous questions regarding their theoretical properties remain open. Using PAC-Bayesian theory, this work develops statistical guarantees for VAEs. First, we derive the first PAC-Bayesian bound for posterior distributions conditioned on individual samples from the data-ge… ▽ More Since their inception, Variational Autoencoders (VAEs) have become central in machine learning. Despite their widespread use, numerous questions regarding their theoretical properties remain open. Using PAC-Bayesian theory, this work develops statistical guarantees for VAEs. First, we derive the first PAC-Bayesian bound for posterior distributions conditioned on individual samples from the data-generating distribution. Then, we utilize this result to develop generalization guarantees for the VAE's reconstruction loss, as well as upper bounds on the distance between the input and the regenerated distributions. More importantly, we provide upper bounds on the Wasserstein distance between the input distribution and the distribution defined by the VAE's generative model. △ Less

Submitted 7 December, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

Comments: Spotlight Paper at NeurIPS 2023

arXiv:2306.04777 [pdf, other]

Invariant Causal Set Covering Machines

Authors: Thibaud Godon, Baptiste Bauvin, Pascal Germain, Jacques Corbeil, Alexandre Drouin

Abstract: Rule-based models, such as decision trees, appeal to practitioners due to their interpretable nature. However, the learning algorithms that produce such models are often vulnerable to spurious associations and thus, they are not guaranteed to extract causally-relevant insights. In this work, we build on ideas from the invariant causal prediction literature to propose Invariant Causal Set Covering… ▽ More Rule-based models, such as decision trees, appeal to practitioners due to their interpretable nature. However, the learning algorithms that produce such models are often vulnerable to spurious associations and thus, they are not guaranteed to extract causally-relevant insights. In this work, we build on ideas from the invariant causal prediction literature to propose Invariant Causal Set Covering Machines, an extension of the classical Set Covering Machine algorithm for conjunctions/disjunctions of binary-valued rules that provably avoids spurious associations. We demonstrate both theoretically and empirically that our method can identify the causal parents of a variable of interest in polynomial time. △ Less

Submitted 19 July, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

arXiv:2302.08942 [pdf, other]

PAC-Bayesian Generalization Bounds for Adversarial Generative Models

Authors: Sokhna Diarra Mbacke, Florence Clerc, Pascal Germain

Abstract: We extend PAC-Bayesian theory to generative models and develop generalization bounds for models based on the Wasserstein distance and the total variation distance. Our first result on the Wasserstein distance assumes the instance space is bounded, while our second result takes advantage of dimensionality reduction. Our results naturally apply to Wasserstein GANs and Energy-Based GANs, and our boun… ▽ More We extend PAC-Bayesian theory to generative models and develop generalization bounds for models based on the Wasserstein distance and the total variation distance. Our first result on the Wasserstein distance assumes the instance space is bounded, while our second result takes advantage of dimensionality reduction. Our results naturally apply to Wasserstein GANs and Energy-Based GANs, and our bounds provide new training objectives for these two. Although our work is mainly theoretical, we perform numerical experiments showing non-vacuous generalization bounds for Wasserstein GANs on synthetic datasets. △ Less

Submitted 13 November, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

Comments: Published at ICML 2023

arXiv:2106.12535 [pdf, other]

Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound

Authors: Valentina Zantedeschi, Paul Viallard, Emilie Morvant, Rémi Emonet, Amaury Habrard, Pascal Germain, Benjamin Guedj

Abstract: We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties. While our approach holds for arbitrary distributions, we instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk, which then turns the generalization bound into a tractable training objective.… ▽ More We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties. While our approach holds for arbitrary distributions, we instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk, which then turns the generalization bound into a tractable training objective. The resulting stochastic majority vote learning algorithm achieves state-of-the-art accuracy and benefits from (non-vacuous) tight generalization bounds, in a series of numerical experiments when compared to competing algorithms which also minimize PAC-Bayes objectives -- both with uninformed (data-independent) and informed (data-dependent) priors. △ Less

Submitted 19 October, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

Journal ref: Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2104.13626 [pdf, other]

Self-Bounding Majority Vote Learning Algorithms by the Direct Minimization of a Tight PAC-Bayesian C-Bound

Authors: Paul Viallard, Pascal Germain, Amaury Habrard, Emilie Morvant

Abstract: In the PAC-Bayesian literature, the C-Bound refers to an insightful relation between the risk of a majority vote classifier (under the zero-one loss) and the first two moments of its margin (i.e., the expected margin and the voters' diversity). Until now, learning algorithms developed in this framework minimize the empirical version of the C-Bound, instead of explicit PAC-Bayesian generalization b… ▽ More In the PAC-Bayesian literature, the C-Bound refers to an insightful relation between the risk of a majority vote classifier (under the zero-one loss) and the first two moments of its margin (i.e., the expected margin and the voters' diversity). Until now, learning algorithms developed in this framework minimize the empirical version of the C-Bound, instead of explicit PAC-Bayesian generalization bounds. In this paper, by directly optimizing PAC-Bayesian guarantees on the C-Bound, we derive self-bounding majority vote learning algorithms. Moreover, our algorithms based on gradient descent are scalable and lead to accurate predictors paired with non-vacuous guarantees. △ Less

Submitted 31 August, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

Journal ref: ECML PKDD 2021, Sep 2021, Bilbao, Spain

arXiv:2102.08649 [pdf, other]

A General Framework for the Practical Disintegration of PAC-Bayesian Bounds

Authors: Paul Viallard, Pascal Germain, Amaury Habrard, Emilie Morvant

Abstract: PAC-Bayesian bounds are known to be tight and informative when studying the generalization ability of randomized classifiers. However, they require a loose and costly derandomization step when applied to some families of deterministic models such as neural networks. As an alternative to this step, we introduce new PAC-Bayesian generalization bounds that have the originality to provide disintegrate… ▽ More PAC-Bayesian bounds are known to be tight and informative when studying the generalization ability of randomized classifiers. However, they require a loose and costly derandomization step when applied to some families of deterministic models such as neural networks. As an alternative to this step, we introduce new PAC-Bayesian generalization bounds that have the originality to provide disintegrated bounds, i.e., they give guarantees over one single hypothesis instead of the usual averaged analysis. Our bounds are easily optimizable and can be used to design learning algorithms. We illustrate this behavior on neural networks, and we show a significant practical improvement over the state-of-the-art framework. △ Less

Submitted 18 September, 2023; v1 submitted 17 February, 2021; originally announced February 2021.

Comments: Machine Learning, In press

arXiv:2010.12995 [pdf, other]

Out-of-distribution detection for regression tasks: parameter versus predictor entropy

Authors: Yann Pequignot, Mathieu Alain, Patrick Dallaire, Alireza Yeganehparast, Pascal Germain, Josée Desharnais, François Laviolette

Abstract: It is crucial to detect when an instance lies downright too far from the training samples for the machine learning model to be trusted, a challenge known as out-of-distribution (OOD) detection. For neural networks, one approach to this task consists of learning a diversity of predictors that all can explain the training data. This information can be used to estimate the epistemic uncertainty at a… ▽ More It is crucial to detect when an instance lies downright too far from the training samples for the machine learning model to be trusted, a challenge known as out-of-distribution (OOD) detection. For neural networks, one approach to this task consists of learning a diversity of predictors that all can explain the training data. This information can be used to estimate the epistemic uncertainty at a given newly observed instance in terms of a measure of the disagreement of the predictions. Evaluation and certification of the ability of a method to detect OOD require specifying instances which are likely to occur in deployment yet on which no prediction is available. Focusing on regression tasks, we choose a simple yet insightful model for this OOD distribution and conduct an empirical evaluation of the ability of various methods to discriminate OOD samples from the data. Moreover, we exhibit evidence that a diversity of parameters may fail to translate to a diversity of predictors. Based on the choice of an OOD distribution, we propose a new way of estimating the entropy of a distribution on predictors based on nearest neighbors in function space. This leads to a variational objective which, combined with the family of distributions given by a generative neural network, systematically produces a diversity of predictors that provides a robust way to detect OOD samples. △ Less

Submitted 11 September, 2023; v1 submitted 24 October, 2020; originally announced October 2020.

arXiv:1912.03036 [pdf, ps, other]

Improved PAC-Bayesian Bounds for Linear Regression

Authors: Vera Shalaeva, Alireza Fakhrizadeh Esfahani, Pascal Germain, Mihaly Petreczky

Abstract: In this paper, we improve the PAC-Bayesian error bound for linear regression derived in Germain et al. [10]. The improvements are twofold. First, the proposed error bound is tighter, and converges to the generalization loss with a well-chosen temperature parameter. Second, the error bound also holds for training data that are not independently sampled. In particular, the error bound applies to cer… ▽ More In this paper, we improve the PAC-Bayesian error bound for linear regression derived in Germain et al. [10]. The improvements are twofold. First, the proposed error bound is tighter, and converges to the generalization loss with a well-chosen temperature parameter. Second, the error bound also holds for training data that are not independently sampled. In particular, the error bound applies to certain time series generated by well-known classes of dynamical models, such as ARX models. △ Less

Submitted 6 December, 2019; originally announced December 2019.

Journal ref: Thirty-Fourth AAAI Conference on Artificial Intelligence, Feb 2020, New York, United States

arXiv:1910.04464 [pdf, ps, other]

PAC-Bayesian Contrastive Unsupervised Representation Learning

Authors: Kento Nozawa, Pascal Germain, Benjamin Guedj

Abstract: Contrastive unsupervised representation learning (CURL) is the state-of-the-art technique to learn representations (as a set of features) from unlabelled data. While CURL has collected several empirical successes recently, theoretical understanding of its performance was still missing. In a recent work, Arora et al. (2019) provide the first generalisation bounds for CURL, relying on a Rademacher c… ▽ More Contrastive unsupervised representation learning (CURL) is the state-of-the-art technique to learn representations (as a set of features) from unlabelled data. While CURL has collected several empirical successes recently, theoretical understanding of its performance was still missing. In a recent work, Arora et al. (2019) provide the first generalisation bounds for CURL, relying on a Rademacher complexity. We extend their framework to the flexible PAC-Bayes setting, allowing us to deal with the non-iid setting. We present PAC-Bayesian generalisation bounds for CURL, which are then used to derive a new representation learning algorithm. Numerical experiments on real-life datasets illustrate that our algorithm achieves competitive accuracy, and yields non-vacuous generalisation bounds. △ Less

Submitted 17 July, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

Comments: Published in the proceedings of the Conference on Uncertainty in Artificial Intelligence 2020 (UAI)

Journal ref: PMLR, volume 124 (UAI 2020), 2020

arXiv:1906.06203 [pdf, other]

Learning Landmark-Based Ensembles with Random Fourier Features and Gradient Boosting

Authors: Léo Gautheron, Pascal Germain, Amaury Habrard, Emilie Morvant, Marc Sebban, Valentina Zantedeschi

Abstract: We propose a Gradient Boosting algorithm for learning an ensemble of kernel functions adapted to the task at hand. Unlike state-of-the-art Multiple Kernel Learning techniques that make use of a pre-computed dictionary of kernel functions to select from, at each iteration we fit a kernel by approximating it as a weighted sum of Random Fourier Features (RFF) and by optimizing their barycenter. This… ▽ More We propose a Gradient Boosting algorithm for learning an ensemble of kernel functions adapted to the task at hand. Unlike state-of-the-art Multiple Kernel Learning techniques that make use of a pre-computed dictionary of kernel functions to select from, at each iteration we fit a kernel by approximating it as a weighted sum of Random Fourier Features (RFF) and by optimizing their barycenter. This allows us to obtain a more versatile method, easier to setup and likely to have better performance. Our study builds on a recent result showing one can learn a kernel from RFF by computing the minimum of a PAC-Bayesian bound on the kernel alignment generalization loss, which is obtained efficiently from a closed-form solution. We conduct an experimental analysis to highlight the advantages of our method w.r.t. both Boosting-based and kernel-learning state-of-the-art methods. △ Less

Submitted 14 June, 2019; originally announced June 2019.

arXiv:1905.10259 [pdf, other]

Dichotomize and Generalize: PAC-Bayesian Binary Activated Deep Neural Networks

Authors: Gaël Letarte, Pascal Germain, Benjamin Guedj, François Laviolette

Abstract: We present a comprehensive study of multilayer neural networks with binary activation, relying on the PAC-Bayesian theory. Our contributions are twofold: (i) we develop an end-to-end framework to train a binary activated deep neural network, (ii) we provide nonvacuous PAC-Bayesian generalization bounds for binary activated deep neural networks. Our results are obtained by minimizing the expected l… ▽ More We present a comprehensive study of multilayer neural networks with binary activation, relying on the PAC-Bayesian theory. Our contributions are twofold: (i) we develop an end-to-end framework to train a binary activated deep neural network, (ii) we provide nonvacuous PAC-Bayesian generalization bounds for binary activated deep neural networks. Our results are obtained by minimizing the expected loss of an architecture-dependent aggregation of binary activated deep neural networks. Our analysis inherently overcomes the fact that binary activation function is non-differentiable. The performance of our approach is assessed on a thorough numerical experiment protocol on real-life datasets. △ Less

Submitted 4 February, 2020; v1 submitted 24 May, 2019; originally announced May 2019.

Journal ref: NeurIPS 2019

arXiv:1810.12683 [pdf, other]

Pseudo-Bayesian Learning with Kernel Fourier Transform as Prior

Authors: Gaël Letarte, Emilie Morvant, Pascal Germain

Abstract: We revisit Rahimi and Recht (2007)'s kernel random Fourier features (RFF) method through the lens of the PAC-Bayesian theory. While the primary goal of RFF is to approximate a kernel, we look at the Fourier transform as a prior distribution over trigonometric hypotheses. It naturally suggests learning a posterior on these hypotheses. We derive generalization bounds that are optimized by learning a… ▽ More We revisit Rahimi and Recht (2007)'s kernel random Fourier features (RFF) method through the lens of the PAC-Bayesian theory. While the primary goal of RFF is to approximate a kernel, we look at the Fourier transform as a prior distribution over trigonometric hypotheses. It naturally suggests learning a posterior on these hypotheses. We derive generalization bounds that are optimized by learning a pseudo-posterior obtained from a closed-form expression. Based on this study, we consider two learning strategies: The first one finds a compact landmarks-based representation of the data where each landmark is given by a distribution-tailored similarity measure, while the second one provides a PAC-Bayesian justification to the kernel alignment method of Sinha and Duchi (2016). △ Less

Submitted 27 March, 2019; v1 submitted 30 October, 2018; originally announced October 2018.

Comments: Published at AISTATS 2019

arXiv:1808.05784 [pdf, other]

Multiview Boosting by Controlling the Diversity and the Accuracy of View-specific Voters

Authors: Anil Goyal, Emilie Morvant, Pascal Germain, Massih-Reza Amini

Abstract: In this paper we propose a boosting based multiview learning algorithm, referred to as PB-MVBoost, which iteratively learns i) weights over view-specific voters capturing view-specific information; and ii) weights over views by optimizing a PAC-Bayes multiview C-Bound that takes into account the accuracy of view-specific classifiers and the diversity between the views. We derive a generalization b… ▽ More In this paper we propose a boosting based multiview learning algorithm, referred to as PB-MVBoost, which iteratively learns i) weights over view-specific voters capturing view-specific information; and ii) weights over views by optimizing a PAC-Bayes multiview C-Bound that takes into account the accuracy of view-specific classifiers and the diversity between the views. We derive a generalization bound for this strategy following the PAC-Bayes theory which is a suitable tool to deal with models expressed as weighted combination over a set of voters. Different experiments on three publicly available datasets show the efficiency of the proposed approach with respect to state-of-art models. △ Less

Submitted 27 August, 2018; v1 submitted 17 August, 2018; originally announced August 2018.

arXiv:1707.05712 [pdf, other]

doi 10.1016/j.neucom.2019.10.105

PAC-Bayes and Domain Adaptation

Authors: Pascal Germain, Amaury Habrard, François Laviolette, Emilie Morvant

Abstract: We provide two main contributions in PAC-Bayesian theory for domain adaptation where the objective is to learn, from a source distribution, a well-performing majority vote on a different, but related, target distribution. Firstly, we propose an improvement of the previous approach we proposed in Germain et al. (2013), which relies on a novel distribution pseudodistance based on a disagreement aver… ▽ More We provide two main contributions in PAC-Bayesian theory for domain adaptation where the objective is to learn, from a source distribution, a well-performing majority vote on a different, but related, target distribution. Firstly, we propose an improvement of the previous approach we proposed in Germain et al. (2013), which relies on a novel distribution pseudodistance based on a disagreement averaging, allowing us to derive a new tighter domain adaptation bound for the target risk. While this bound stands in the spirit of common domain adaptation works, we derive a second bound (introduced in Germain et al., 2016) that brings a new perspective on domain adaptation by deriving an upper bound on the target risk where the distributions' divergence-expressed as a ratio-controls the trade-off between a source error measure and the target voters' disagreement. We discuss and compare both results, from which we obtain PAC-Bayesian generalization bounds. Furthermore, from the PAC-Bayesian specialization to linear classifiers, we infer two learning algorithms, and we evaluate them on real data. △ Less

Submitted 15 November, 2019; v1 submitted 17 July, 2017; originally announced July 2017.

Comments: Neurocomputing, Elsevier, 2019. arXiv admin note: substantial text overlap with arXiv:1503.06944

arXiv:1606.07240 [pdf, other]

PAC-Bayesian Analysis for a two-step Hierarchical Multiview Learning Approach

Authors: Anil Goyal, Emilie Morvant, Pascal Germain, Massih-Reza Amini

Abstract: We study a two-level multiview learning with more than two views under the PAC-Bayesian framework. This approach, sometimes referred as late fusion, consists in learning sequentially multiple view-specific classifiers at the first level, and then combining these view-specific classifiers at the second level. Our main theoretical result is a generalization bound on the risk of the majority vote whi… ▽ More We study a two-level multiview learning with more than two views under the PAC-Bayesian framework. This approach, sometimes referred as late fusion, consists in learning sequentially multiple view-specific classifiers at the first level, and then combining these view-specific classifiers at the second level. Our main theoretical result is a generalization bound on the risk of the majority vote which exhibits a term of diversity in the predictions of the view-specific classifiers. From this result it comes out that controlling the trade-off between diversity and accuracy is a key element for multiview learning, which complements other results in multiview learning. Finally, we experiment our principle on multiview datasets extracted from the Reuters RCV1/RCV2 collection. △ Less

Submitted 13 July, 2017; v1 submitted 23 June, 2016; originally announced June 2016.

arXiv:1605.08636 [pdf, other]

PAC-Bayesian Theory Meets Bayesian Inference

Authors: Pascal Germain, Francis Bach, Alexandre Lacoste, Simon Lacoste-Julien

Abstract: We exhibit a strong link between frequentist PAC-Bayesian risk bounds and the Bayesian marginal likelihood. That is, for the negative log-likelihood loss function, we show that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood. This provides an alternative explanation to the Bayesian Occam's razor criteria, under the assumption that the data is… ▽ More We exhibit a strong link between frequentist PAC-Bayesian risk bounds and the Bayesian marginal likelihood. That is, for the negative log-likelihood loss function, we show that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood. This provides an alternative explanation to the Bayesian Occam's razor criteria, under the assumption that the data is generated by an i.i.d distribution. Moreover, as the negative log-likelihood is an unbounded loss function, we motivate and propose a PAC-Bayesian theorem tailored for the sub-gamma loss family, and we show that our approach is sound on classical Bayesian linear regression tasks. △ Less

Submitted 13 February, 2017; v1 submitted 27 May, 2016; originally announced May 2016.

Comments: Published at NIPS 2015 (http://papers.nips.cc/paper/6569-pac-bayesian-theory-meets-bayesian-inference)

Journal ref: Advances in Neural Information Processing Systems 29 (NIPS 2016), p. 1884-1892

arXiv:1506.04573 [pdf, other]

A New PAC-Bayesian Perspective on Domain Adaptation

Authors: Pascal Germain, Amaury Habrard, François Laviolette, Emilie Morvant

Abstract: We study the issue of PAC-Bayesian domain adaptation: We want to learn, from a source domain, a majority vote model dedicated to a target one. Our theoretical contribution brings a new perspective by deriving an upper-bound on the target risk where the distributions' divergence---expressed as a ratio---controls the trade-off between a source error measure and the target voters' disagreement. Our b… ▽ More We study the issue of PAC-Bayesian domain adaptation: We want to learn, from a source domain, a majority vote model dedicated to a target one. Our theoretical contribution brings a new perspective by deriving an upper-bound on the target risk where the distributions' divergence---expressed as a ratio---controls the trade-off between a source error measure and the target voters' disagreement. Our bound suggests that one has to focus on regions where the source data is informative.From this result, we derive a PAC-Bayesian generalization bound, and specialize it to linear classifiers. Then, we infer a learning algorithmand perform experiments on real data. △ Less

Submitted 26 July, 2016; v1 submitted 15 June, 2015; originally announced June 2015.

Comments: Published at ICML 2016

arXiv:1505.07818 [pdf, other]

Domain-Adversarial Training of Neural Networks

Authors: Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, Victor Lempitsky

Abstract: We introduce a new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions. Our approach is directly inspired by the theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test… ▽ More We introduce a new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions. Our approach is directly inspired by the theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test (target) domains. The approach implements this idea in the context of neural network architectures that are trained on labeled data from the source domain and unlabeled data from the target domain (no labeled target-domain data is necessary). As the training progresses, the approach promotes the emergence of features that are (i) discriminative for the main learning task on the source domain and (ii) indiscriminate with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with few standard layers and a new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation and stochastic gradient descent, and can thus be implemented with little effort using any of the deep learning packages. We demonstrate the success of our approach for two distinct classification problems (document sentiment analysis and image classification), where state-of-the-art domain adaptation performance on standard benchmarks is achieved. We also validate the approach for descriptor learning task in the context of person re-identification application. △ Less

Submitted 26 May, 2016; v1 submitted 28 May, 2015; originally announced May 2015.

Comments: Published in JMLR: http://jmlr.org/papers/v17/15-239.html

Journal ref: Journal of Machine Learning Research 2016, vol. 17, p. 1-35

arXiv:1503.08329 [pdf, other]

Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm

Authors: Pascal Germain, Alexandre Lacasse, François Laviolette, Mario Marchand, Jean-Francis Roy

Abstract: We propose an extensive analysis of the behavior of majority votes in binary classification. In particular, we introduce a risk bound for majority votes, called the C-bound, that takes into account the average quality of the voters and their average disagreement. We also propose an extensive PAC-Bayesian analysis that shows how the C-bound can be estimated from various observations contained in th… ▽ More We propose an extensive analysis of the behavior of majority votes in binary classification. In particular, we introduce a risk bound for majority votes, called the C-bound, that takes into account the average quality of the voters and their average disagreement. We also propose an extensive PAC-Bayesian analysis that shows how the C-bound can be estimated from various observations contained in the training data. The analysis intends to be self-contained and can be used as introductory material to PAC-Bayesian statistical learning theory. It starts from a general PAC-Bayesian perspective and ends with uncommon PAC-Bayesian bounds. Some of these bounds contain no Kullback-Leibler divergence and others allow kernel functions to be used as voters (via the sample compression setting). Finally, out of the analysis, we propose the MinCq learning algorithm that basically minimizes the C-bound. MinCq reduces to a simple quadratic program. Aside from being theoretically grounded, MinCq achieves state-of-the-art performance, as shown in our extensive empirical comparison with both AdaBoost and the Support Vector Machine. △ Less

Submitted 28 July, 2015; v1 submitted 28 March, 2015; originally announced March 2015.

Comments: Published in JMLR http://jmlr.org/papers/v16/germain15a.html

Journal ref: Journal of Machine Learning Research 2015, vol. 16, p. 787-860

arXiv:1503.06944 [pdf, other]

PAC-Bayesian Theorems for Domain Adaptation with Specialization to Linear Classifiers

Authors: Pascal Germain, Amaury Habrard, François Laviolette, Emilie Morvant

Abstract: In this paper, we provide two main contributions in PAC-Bayesian theory for domain adaptation where the objective is to learn, from a source distribution, a well-performing majority vote on a different target distribution. On the one hand, we propose an improvement of the previous approach proposed by Germain et al. (2013), that relies on a novel distribution pseudodistance based on a disagreement… ▽ More In this paper, we provide two main contributions in PAC-Bayesian theory for domain adaptation where the objective is to learn, from a source distribution, a well-performing majority vote on a different target distribution. On the one hand, we propose an improvement of the previous approach proposed by Germain et al. (2013), that relies on a novel distribution pseudodistance based on a disagreement averaging, allowing us to derive a new tighter PAC-Bayesian domain adaptation bound for the stochastic Gibbs classifier. We specialize it to linear classifiers, and design a learning algorithm which shows interesting results on a synthetic problem and on a popular sentiment annotation task. On the other hand, we generalize these results to multisource domain adaptation allowing us to take into account different source domains. This study opens the door to tackle domain adaptation tasks by making use of all the PAC-Bayesian tools. △ Less

Submitted 9 August, 2016; v1 submitted 24 March, 2015; originally announced March 2015.

Comments: This report is a long version of our paper entitled A PAC-Bayesian Approach for Domain Adaptation with Specialization to Linear Classifiers published in the proceedings of the International Conference on Machine Learning (ICML) 2013. We improved our main results, extended our experiments, and proposed an extension to multisource domain adaptation

arXiv:1501.03002 [pdf, ps, other]

An Improvement to the Domain Adaptation Bound in a PAC-Bayesian context

Authors: Pascal Germain, Amaury Habrard, Francois Laviolette, Emilie Morvant

Abstract: This paper provides a theoretical analysis of domain adaptation based on the PAC-Bayesian theory. We propose an improvement of the previous domain adaptation bound obtained by Germain et al. in two ways. We first give another generalization bound tighter and easier to interpret. Moreover, we provide a new analysis of the constant term appearing in the bound that can be of high interest for develop… ▽ More This paper provides a theoretical analysis of domain adaptation based on the PAC-Bayesian theory. We propose an improvement of the previous domain adaptation bound obtained by Germain et al. in two ways. We first give another generalization bound tighter and easier to interpret. Moreover, we provide a new analysis of the constant term appearing in the bound that can be of high interest for develo** new algorithmic solutions. △ Less

Submitted 13 January, 2015; originally announced January 2015.

Comments: NIPS 2014 Workshop on Transfer and Multi-task learning: Theory Meets Practice, Dec 2014, Montr{é}al, Canada

arXiv:1412.4446 [pdf, other]

Domain-Adversarial Neural Networks

Authors: Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand

Abstract: We introduce a new representation learning algorithm suited to the context of domain adaptation, in which data at training and test time come from similar but different distributions. Our algorithm is directly inspired by theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on a data representation that cannot discriminate betwee… ▽ More We introduce a new representation learning algorithm suited to the context of domain adaptation, in which data at training and test time come from similar but different distributions. Our algorithm is directly inspired by theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on a data representation that cannot discriminate between the training (source) and test (target) domains. We propose a training objective that implements this idea in the context of a neural network, whose hidden layer is trained to be predictive of the classification task, but uninformative as to the domain of the input. Our experiments on a sentiment analysis classification benchmark, where the target domain data available at training time is unlabeled, show that our neural network for domain adaption algorithm has better performance than either a standard neural network or an SVM, even if trained on input features extracted with the state-of-the-art marginalized stacked denoising autoencoders of Chen et al. (2012). △ Less

Submitted 9 February, 2015; v1 submitted 14 December, 2014; originally announced December 2014.

Comments: The first version of this paper was accepted at the "Second Workshop on Transfer and Multi-Task Learning: Theory meets Practice" (NIPS 2014, Montreal, Canada). See: https://sites.google.com/site/multitaskwsnips2014/

arXiv:1212.2340 [pdf, other]

PAC-Bayesian Learning and Domain Adaptation

Authors: Pascal Germain, Amaury Habrard, François Laviolette, Emilie Morvant

Abstract: In machine learning, Domain Adaptation (DA) arises when the distribution gen- erating the test (target) data differs from the one generating the learning (source) data. It is well known that DA is an hard task even under strong assumptions, among which the covariate-shift where the source and target distributions diverge only in their marginals, i.e. they have the same labeling function. Another p… ▽ More In machine learning, Domain Adaptation (DA) arises when the distribution gen- erating the test (target) data differs from the one generating the learning (source) data. It is well known that DA is an hard task even under strong assumptions, among which the covariate-shift where the source and target distributions diverge only in their marginals, i.e. they have the same labeling function. Another popular approach is to consider an hypothesis class that moves closer the two distributions while implying a low-error for both tasks. This is a VC-dim approach that restricts the complexity of an hypothesis class in order to get good generalization. Instead, we propose a PAC-Bayesian approach that seeks for suitable weights to be given to each hypothesis in order to build a majority vote. We prove a new DA bound in the PAC-Bayesian context. This leads us to design the first DA-PAC-Bayesian algorithm based on the minimization of the proposed bound. Doing so, we seek for a ρ-weighted majority vote that takes into account a trade-off between three quantities. The first two quantities being, as usual in the PAC-Bayesian approach, (a) the complexity of the majority vote (measured by a Kullback-Leibler divergence) and (b) its empirical risk (measured by the ρ-average errors on the source sample). The third quantity is (c) the capacity of the majority vote to distinguish some structural difference between the source and target samples. △ Less

Submitted 11 December, 2012; originally announced December 2012.

Comments: https://sites.google.com/site/multitradeoffs2012/

Journal ref: Multi-Trade-offs in Machine Learning, NIPS 2012 Workshop, Lake Tahoe : United States (2012)

Showing 1–23 of 23 results for author: Germain, P