Search | arXiv e-print repository

PAC-Bayes-Chernoff bounds for unbounded losses

Authors: Ioar Casado, Luis A. Ortega, Andrés R. Masegosa, Aritz Pérez

Abstract: We introduce a new PAC-Bayes oracle bound for unbounded losses. This result can be understood as a PAC-Bayesian version of the Cramér-Chernoff bound. The proof technique relies on controlling the tails of certain random variables involving the Cramér transform of the loss. We highlight several applications of the main theorem. First, we show that our result naturally allows exact optimization of t… ▽ More We introduce a new PAC-Bayes oracle bound for unbounded losses. This result can be understood as a PAC-Bayesian version of the Cramér-Chernoff bound. The proof technique relies on controlling the tails of certain random variables involving the Cramér transform of the loss. We highlight several applications of the main theorem. First, we show that our result naturally allows exact optimization of the free parameter on many PAC-Bayes bounds. Second, we recover and generalize previous results. Finally, we show that our approach allows working with richer assumptions that result in more informative and potentially tighter bounds. In this direction, we provide a general bound under a new ``model-dependent bounded CGF" assumption from which we obtain bounds based on parameter norms and log-Sobolev inequalities. All these bounds can be minimized to obtain novel posteriors. △ Less

Submitted 6 February, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

Comments: Updated Section 5

arXiv:2310.01189 [pdf, other]

If there is no underfitting, there is no Cold Posterior Effect

Authors: Yijie Zhang, Yi-Shan Wu, Luis A. Ortega, Andrés R. Masegosa

Abstract: The cold posterior effect (CPE) (Wenzel et al., 2020) in Bayesian deep learning shows that, for posteriors with a temperature $T<1$, the resulting posterior predictive could have better performances than the Bayesian posterior ($T=1$). As the Bayesian posterior is known to be optimal under perfect model specification, many recent works have studied the presence of CPE as a model misspecification p… ▽ More The cold posterior effect (CPE) (Wenzel et al., 2020) in Bayesian deep learning shows that, for posteriors with a temperature $T<1$, the resulting posterior predictive could have better performances than the Bayesian posterior ($T=1$). As the Bayesian posterior is known to be optimal under perfect model specification, many recent works have studied the presence of CPE as a model misspecification problem, arising from the prior and/or from the likelihood function. In this work, we provide a more nuanced understanding of the CPE as we show that misspecification leads to CPE only when the resulting Bayesian posterior underfits. In fact, we theoretically show that if there is no underfitting, there is no CPE. △ Less

Submitted 2 October, 2023; originally announced October 2023.

Comments: 9 pages, 3 figures, ICLR 2024

arXiv:2306.10947 [pdf, other]

PAC-Chernoff Bounds: Understanding Generalization in the Interpolation Regime

Authors: Andrés R. Masegosa, Luis A. Ortega

Abstract: This paper introduces a distribution-dependent PAC-Chernoff bound that exhibits perfect tightness for interpolators, even within over-parameterized model classes. This bound, which relies on basic principles of Large Deviation Theory, defines a natural measure of the smoothness of a model, characterized by simple real-valued functions. Building upon this bound and the new concept of smoothness, we… ▽ More This paper introduces a distribution-dependent PAC-Chernoff bound that exhibits perfect tightness for interpolators, even within over-parameterized model classes. This bound, which relies on basic principles of Large Deviation Theory, defines a natural measure of the smoothness of a model, characterized by simple real-valued functions. Building upon this bound and the new concept of smoothness, we present an unified theoretical framework revealing why certain interpolators show an exceptional generalization, while others falter. We theoretically show how a wide spectrum of modern learning methodologies, encompassing techniques such as $\ell_2$-norm, distance-from-initialization and input-gradient regularization, in combination with data augmentation, invariant architectures, and over-parameterization, collectively guide the optimizer toward smoother interpolators, which, according to our theoretical framework, are the ones exhibiting superior generalization performance. This study shows that distribution-dependent bounds serve as a powerful tool to understand the complex dynamics behind the generalization capabilities of over-parameterized interpolators. △ Less

Submitted 29 April, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: 56 pages, 11 figures, Pre-print

arXiv:2106.13624 [pdf, other]

Chebyshev-Cantelli PAC-Bayes-Bennett Inequality for the Weighted Majority Vote

Authors: Yi-Shan Wu, Andrés R. Masegosa, Stephan S. Lorenzen, Christian Igel, Yevgeny Seldin

Abstract: We present a new second-order oracle bound for the expected risk of a weighted majority vote. The bound is based on a novel parametric form of the Chebyshev- Cantelli inequality (a.k.a. one-sided Chebyshev's), which is amenable to efficient minimization. The new form resolves the optimization challenge faced by prior oracle bounds based on the Chebyshev-Cantelli inequality, the C-bounds [Germain e… ▽ More We present a new second-order oracle bound for the expected risk of a weighted majority vote. The bound is based on a novel parametric form of the Chebyshev- Cantelli inequality (a.k.a. one-sided Chebyshev's), which is amenable to efficient minimization. The new form resolves the optimization challenge faced by prior oracle bounds based on the Chebyshev-Cantelli inequality, the C-bounds [Germain et al., 2015], and, at the same time, it improves on the oracle bound based on second order Markov's inequality introduced by Masegosa et al. [2020]. We also derive a new concentration of measure inequality, which we name PAC-Bayes-Bennett, since it combines PAC-Bayesian bounding with Bennett's inequality. We use it for empirical estimation of the oracle bound. The PAC-Bayes-Bennett inequality improves on the PAC-Bayes-Bernstein inequality of Seldin et al. [2012]. We provide an empirical evaluation demonstrating that the new bounds can improve on the work of Masegosa et al. [2020]. Both the parametric form of the Chebyshev-Cantelli inequality and the PAC-Bayes-Bennett inequality may be of independent interest for the study of concentration of measure in other domains. △ Less

Submitted 17 January, 2023; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: aligned with the camera-ready version published at NeurIPS 2021

arXiv:2007.13532 [pdf, other]

Second Order PAC-Bayesian Bounds for the Weighted Majority Vote

Authors: Andrés R. Masegosa, Stephan S. Lorenzen, Christian Igel, Yevgeny Seldin

Abstract: We present a novel analysis of the expected risk of weighted majority vote in multiclass classification. The analysis takes correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient minimization, which yields improved weighting for the majority vote. We also provide a specialized version of our bound for binary classification, which allows to ex… ▽ More We present a novel analysis of the expected risk of weighted majority vote in multiclass classification. The analysis takes correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient minimization, which yields improved weighting for the majority vote. We also provide a specialized version of our bound for binary classification, which allows to exploit additional unlabeled data for tighter risk estimation. In experiments, we apply the bound to improve weighting of trees in random forests and show that, in contrast to the commonly used first order bound, minimization of the new bound typically does not lead to degradation of the test error of the ensemble. △ Less

Submitted 17 December, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

arXiv:1912.08335 [pdf, other]

Learning under Model Misspecification: Applications to Variational and Ensemble methods

Authors: Andres R. Masegosa

Abstract: Virtually any model we use in machine learning to make predictions does not perfectly represent reality. So, most of the learning happens under model misspecification. In this work, we present a novel analysis of the generalization performance of Bayesian model averaging under model misspecification and i.i.d. data using a new family of second-order PAC-Bayes bounds. This analysis shows, in simple… ▽ More Virtually any model we use in machine learning to make predictions does not perfectly represent reality. So, most of the learning happens under model misspecification. In this work, we present a novel analysis of the generalization performance of Bayesian model averaging under model misspecification and i.i.d. data using a new family of second-order PAC-Bayes bounds. This analysis shows, in simple and intuitive terms, that Bayesian model averaging provides suboptimal generalization performance when the model is misspecified. In consequence, we provide strong theoretical arguments showing that Bayesian methods are not optimal for learning predictive models, unless the model class is perfectly specified. Using novel second-order PAC-Bayes bounds, we derive a new family of Bayesian-like algorithms, which can be implemented as variational and ensemble methods. The output of these algorithms is a new posterior distribution, different from the Bayesian posterior, which induces a posterior predictive distribution with better generalization performance. Experiments with Bayesian neural networks illustrate these findings. △ Less

Submitted 22 October, 2020; v1 submitted 17 December, 2019; originally announced December 2019.

Comments: Camera-Ready Version. NeurIPS 2020. Minor changes

arXiv:1908.11161 [pdf, other]

InferPy: Probabilistic Modeling with Deep Neural Networks Made Easy

Authors: Javier Cózar, Rafael Cabañas, Antonio Salmerón, Andrés R. Masegosa

Abstract: InferPy is a Python package for probabilistic modeling with deep neural networks. It defines a user-friendly API that trades-off model complexity with ease of use, unlike other libraries whose focus is on dealing with very general probabilistic models at the cost of having a more complex API. In particular, this package allows to define, learn and evaluate general hierarchical probabilistic models… ▽ More InferPy is a Python package for probabilistic modeling with deep neural networks. It defines a user-friendly API that trades-off model complexity with ease of use, unlike other libraries whose focus is on dealing with very general probabilistic models at the cost of having a more complex API. In particular, this package allows to define, learn and evaluate general hierarchical probabilistic models containing deep neural networks in a compact and simple way. InferPy is built on top of Tensorflow Probability and Keras. △ Less

Submitted 12 February, 2020; v1 submitted 29 August, 2019; originally announced August 2019.

Comments: 5 pages limit (paper submitted to an original software publication track). This paper briefly describes a scientific software

arXiv:1908.03442 [pdf, other]

Probabilistic Models with Deep Neural Networks

Authors: Andrés R. Masegosa, Rafael Cabañas, Helge Langseth, Thomas D. Nielsen, Antonio Salmerón

Abstract: Recent advances in statistical inference have significantly expanded the toolbox of probabilistic modeling. Historically, probabilistic modeling has been constrained to (i) very restricted model classes where exact or approximate probabilistic inference were feasible, and (ii) small or medium-sized data sets which fit within the main memory of the computer. However, developments in variational inf… ▽ More Recent advances in statistical inference have significantly expanded the toolbox of probabilistic modeling. Historically, probabilistic modeling has been constrained to (i) very restricted model classes where exact or approximate probabilistic inference were feasible, and (ii) small or medium-sized data sets which fit within the main memory of the computer. However, developments in variational inference, a general form of approximate probabilistic inference originated in statistical physics, are allowing probabilistic modeling to overcome these restrictions: (i) Approximate probabilistic inference is now possible over a broad class of probabilistic models containing a large number of parameters, and (ii) scalable inference methods based on stochastic gradient descent and distributed computation engines allow to apply probabilistic modeling over massive data sets. One important practical consequence of these advances is the possibility to include deep neural networks within a probabilistic model to capture complex non-linear stochastic relationships between random variables. These advances in conjunction with the release of novel probabilistic modeling toolboxes have greatly expanded the scope of application of probabilistic models, and allow these models to take advantage of the recent strides made by the deep learning community. In this paper we review the main concepts, methods and tools needed to use deep neural networks within a probabilistic modeling framework. △ Less

Submitted 2 October, 2019; v1 submitted 9 August, 2019; originally announced August 2019.

arXiv:1704.01427 [pdf, ps, other]

doi 10.1016/j.knosys.2018.09.019

AMIDST: a Java Toolbox for Scalable Probabilistic Machine Learning

Authors: Andrés R. Masegosa, Ana M. Martínez, Darío Ramos-López, Rafael Cabañas, Antonio Salmerón, Thomas D. Nielsen, Helge Langseth, Anders L. Madsen

Abstract: The AMIDST Toolbox is a software for scalable probabilistic machine learning with a spe- cial focus on (massive) streaming data. The toolbox supports a flexible modeling language based on probabilistic graphical models with latent variables and temporal dependencies. The specified models can be learnt from large data sets using parallel or distributed implementa- tions of Bayesian learning algorit… ▽ More The AMIDST Toolbox is a software for scalable probabilistic machine learning with a spe- cial focus on (massive) streaming data. The toolbox supports a flexible modeling language based on probabilistic graphical models with latent variables and temporal dependencies. The specified models can be learnt from large data sets using parallel or distributed implementa- tions of Bayesian learning algorithms for either streaming or batch data. These algorithms are based on a flexible variational message passing scheme, which supports discrete and continu- ous variables from a wide range of probability distributions. AMIDST also leverages existing functionality and algorithms by interfacing to software tools such as Flink, Spark, MOA, Weka, R and HUGIN. AMIDST is an open source toolbox written in Java and available at http://www.amidsttoolbox.com under the Apache Software License version 2.0. △ Less

Submitted 4 April, 2017; originally announced April 2017.

ACM Class: I.2.6

arXiv:1604.07990 [pdf, other]

doi 10.1109/MCI.2016.2532267

Probabilistic Graphical Models on Multi-Core CPUs using Java 8

Authors: Andres R. Masegosa, Ana M. Martinez, Hanen Borchani

Abstract: In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic graphical models (PGMs) and present the parallelisation of a collection of algorithms that deal with inference and learning of PGMs from data. Namely, maximum… ▽ More In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic graphical models (PGMs) and present the parallelisation of a collection of algorithms that deal with inference and learning of PGMs from data. Namely, maximum likelihood estimation, importance sampling, and greedy search for solving combinatorial optimisation problems. Through these concrete examples, we tackle the problem of defining efficient data structures for PGMs and parallel processing of same-size batches of data sets using Java 8 features. We also provide straightforward techniques to code parallel algorithms that seamlessly exploit multi-core processors. The experimental analysis, carried out using our open source AMIDST (Analysis of MassIve Data STreams) Java toolbox, shows the merits of the proposed solutions. △ Less

Submitted 27 April, 2016; originally announced April 2016.

Comments: Pre-print version of the paper presented in the special issue on Computational Intelligence Software at IEEE Computational Intelligence Magazine journal

Journal ref: IEEE Computational Intelligence Magazine, 11(2), 41-54. 2016

Showing 1–10 of 10 results for author: Masegosa, A R