-
Robust Semi-supervised Learning via $f$-Divergence and $α$-Rényi Divergence
Authors:
Gholamali Aminian,
Amirhossien Bagheri,
Mahyar JafariNodeh,
Radmehr Karimian,
Mohammad-Hossein Yassaee
Abstract:
This paper investigates a range of empirical risk functions and regularization methods for self-training in semi-supervised learning. These approaches draw on divergence measures such as $f$-divergences and $α$-Rényi divergences, and the theory underlying these divergences also provides insight into our empirical risk functions and regularization techniques. In self-training methods such as pseudo-labeling and entropy minimization, there is an inherent mismatch between the true labels and the pseudo-labels (i.e., the pseudo-labels are noisy), and some of our empirical risk functions are robust to such noisy pseudo-labels. Under some conditions, our empirical risk functions outperform traditional self-training methods.
Submitted 1 May, 2024;
originally announced May 2024.
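The abstract builds on $f$-divergences and $α$-Rényi divergences without stating them. As a reminder of the standard definitions only (the paper's own empirical risk functions are not given here), a minimal Python sketch for discrete distributions, assuming $Q$ has full support: $D_f(P\|Q)=\sum_x Q(x)\,f(P(x)/Q(x))$ for convex $f$ with $f(1)=0$, and $D_α(P\|Q)=\frac{1}{α-1}\log\sum_x P(x)^α Q(x)^{1-α}$ for $α>0$, $α\neq 1$. The function names and example distributions are illustrative assumptions.

```python
import math
from typing import Callable, Sequence


def f_divergence(p: Sequence[float], q: Sequence[float],
                 f: Callable[[float], float]) -> float:
    """D_f(P||Q) = sum_x q(x) * f(p(x) / q(x)); assumes q(x) > 0 for every x."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t: float) -> float:
    """f(t) = t log t (with 0 log 0 = 0) turns D_f into KL(P||Q)."""
    return t * math.log(t) if t > 0 else 0.0


def tv_generator(t: float) -> float:
    """f(t) = |t - 1| / 2 turns D_f into the total variation distance."""
    return 0.5 * abs(t - 1.0)


def renyi_divergence(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """D_alpha(P||Q) = log(sum_x p^alpha q^(1-alpha)) / (alpha - 1), alpha > 0, alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1)


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
kl = f_divergence(p, q, kl_generator)
# The Renyi divergence is non-decreasing in alpha and tends to KL(P||Q) as alpha -> 1.
print(kl, renyi_divergence(p, q, 0.5), renyi_divergence(p, q, 0.999999))
```

Taking $α \to 1$ numerically recovers the KL divergence, which is one reason the two families are often discussed together.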
-
Generalization Error of Graph Neural Networks in the Mean-field Regime
Authors:
Gholamali Aminian,
Yixuan He,
Gesine Reinert,
Łukasz Szpruch,
Samuel N. Cohen
Abstract:
This work provides a theoretical framework for assessing the generalization error of graph neural networks in the over-parameterized regime, where the number of parameters exceeds the number of data points. We study two widely used types of graph neural networks: graph convolutional neural networks and message passing graph neural networks. Prior to this study, existing bounds on the generalization error in the over-parameterized regime were uninformative, limiting our understanding of the performance of over-parameterized networks. Our approach derives upper bounds on the generalization error of these graph neural networks in the mean-field regime. We establish upper bounds with a convergence rate of $O(1/n)$, where $n$ is the number of graph samples. These bounds provide theoretical guarantees on the networks' performance on unseen data in the challenging over-parameterized regime and contribute to our understanding of their performance.
Submitted 1 July, 2024; v1 submitted 10 February, 2024;
originally announced February 2024.
-
Mean-field Analysis of Generalization Errors
Authors:
Gholamali Aminian,
Samuel N. Cohen,
Łukasz Szpruch
Abstract:
We propose a novel framework for studying weak and $L_2$ generalization errors of algorithms through the lens of differential calculus on the space of probability measures. Specifically, we consider the KL-regularized empirical risk minimization problem and establish generic conditions under which the generalization error converges at rate $\mathcal{O}(1/n)$ when training on a sample of size $n$. For supervised learning with a one-hidden-layer neural network in the mean-field regime, these conditions translate into suitable integrability and regularity assumptions on the loss and activation functions.
Submitted 20 June, 2023;
originally announced June 2023.
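In the mean-field regime, a one-hidden-layer network with $N$ neurons is commonly parameterized as an average, $f(x)=\frac{1}{N}\sum_{i=1}^{N} a_i\,σ(w_i x + b_i)$, i.e., the integral of a single-neuron function against the empirical measure of the parameters; this is what makes calculus on the space of probability measures applicable. A minimal Python sketch, assuming a scalar input and a tanh activation (both illustrative choices, as are the parameter values), shows that the output depends on the parameters only through that empirical measure:

```python
import math
from typing import List, Tuple

# (a, w, b): output weight, input weight, and bias of one hidden neuron.
Neuron = Tuple[float, float, float]


def mean_field_net(params: List[Neuron], x: float) -> float:
    """f(x) = (1/N) * sum_i a_i * tanh(w_i * x + b_i).

    The 1/N factor makes f an average over the empirical parameter measure.
    """
    return sum(a * math.tanh(w * x + b) for a, w, b in params) / len(params)


params = [(1.0, 0.5, 0.0), (-2.0, 1.5, 0.1), (0.3, -1.0, 0.2)]
x = 0.7
# Duplicating every neuron or permuting the neurons leaves the empirical
# measure unchanged, so the network output is unchanged as well.
print(mean_field_net(params, x),
      mean_field_net(params * 2, x),
      mean_field_net(params[::-1], x))
```

Under the more common $1/\sqrt{N}$ or unscaled parameterizations, duplicating neurons would change the output, so this invariance is specific to the mean-field scaling.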
-
Learning Algorithm Generalization Error Bounds via Auxiliary Distributions
Authors:
Gholamali Aminian,
Saeed Masiha,
Laura Toni,
Miguel R. D. Rodrigues
Abstract:
Generalization error bounds are essential for understanding how well machine learning models generalize. In this work, we propose a novel method, the Auxiliary Distribution Method, which yields new upper bounds on the expected generalization error in supervised learning. We show that our general upper bounds can be specialized under some conditions to new bounds involving the $α$-Jensen-Shannon and $α$-Rényi ($0< α< 1$) information between a random variable modeling the set of training samples and another random variable modeling the set of hypotheses. Moreover, our upper bounds based on $α$-Jensen-Shannon information are finite. Additionally, we demonstrate how the auxiliary distribution method can be used to derive upper bounds on the excess risk of some learning algorithms in supervised learning, and on the generalization error under distribution mismatch, where the mismatch is modeled as the $α$-Jensen-Shannon or $α$-Rényi divergence between the test and training data distributions. We also outline conditions under which our proposed upper bounds may be tighter than earlier upper bounds.
Submitted 16 April, 2024; v1 submitted 2 October, 2022;
originally announced October 2022.
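The $α$-Jensen-Shannon quantities mentioned above stay finite even when two distributions have disjoint supports, where the KL divergence is infinite. Assuming the common definition $\mathrm{JS}_α(P\|Q)=α\,\mathrm{KL}(P\|M_α)+(1-α)\,\mathrm{KL}(Q\|M_α)$ with $M_α=αP+(1-α)Q$ (the paper's exact normalization may differ), a minimal Python sketch for discrete distributions illustrates that it is bounded by the binary entropy $h(α)$, with equality for disjoint supports. Function names and the example distributions are illustrative.

```python
import math
from typing import Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(P||Q) in nats; terms with p(x) = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def alpha_js(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """alpha-JS divergence, assuming the mixture M = alpha*P + (1 - alpha)*Q."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    m = [alpha * pi + (1 - alpha) * qi for pi, qi in zip(p, q)]
    # M is positive wherever P or Q is, so both KL terms are finite.
    return alpha * kl(p, m) + (1 - alpha) * kl(q, m)


def binary_entropy(alpha: float) -> float:
    """h(alpha) = -alpha log alpha - (1 - alpha) log(1 - alpha), in nats."""
    return -alpha * math.log(alpha) - (1 - alpha) * math.log(1 - alpha)


# Disjoint supports: KL(P||Q) would be infinite, yet JS_alpha equals h(alpha).
p_disjoint, q_disjoint = [0.6, 0.4, 0.0], [0.0, 0.0, 1.0]
print(alpha_js(p_disjoint, q_disjoint, 0.3), binary_entropy(0.3))
```

The two printed values coincide; for overlapping distributions the divergence is strictly smaller than $h(α)$.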
-
An Information-theoretical Approach to Semi-supervised Learning under Covariate-shift
Authors:
Gholamali Aminian,
Mahed Abroshan,
Mohammad Mahdi Khalili,
Laura Toni,
Miguel R. D. Rodrigues
Abstract:
A common assumption in semi-supervised learning is that the labeled, unlabeled, and test data are drawn from the same distribution. However, this assumption is violated in many applications. In many scenarios (e.g., healthcare), data are collected sequentially, and their distribution may change over time, often exhibiting so-called covariate shift. In this paper, we propose a semi-supervised learning approach that addresses this issue. Our framework also recovers some popular methods, including entropy minimization and pseudo-labeling. We derive new information-theoretic generalization error upper bounds inspired by this framework. Our bounds apply both to general semi-supervised learning and to the covariate-shift scenario. Finally, we show numerically that our method outperforms previous approaches to semi-supervised learning under covariate shift.
Submitted 24 February, 2022;
originally announced February 2022.
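The abstract states that the framework recovers entropy minimization and pseudo-labeling. For reference only, a minimal Python sketch of the standard forms of these two unlabeled-data losses, computed from a model's predicted class probabilities; it is not the paper's covariate-shift framework, and the confidence threshold is a common but hypothetical addition.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (nats) of one predictive distribution."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)


def entropy_min_loss(probs: List[Sequence[float]]) -> float:
    """Entropy minimization: average predictive entropy over unlabeled points."""
    return sum(entropy(p) for p in probs) / len(probs)


def pseudo_label_loss(probs: List[Sequence[float]], threshold: float = 0.0) -> float:
    """Pseudo-labeling: cross-entropy against the arg-max class (hard pseudo-label).

    Cross-entropy with a one-hot arg-max target reduces to -log(max_k p_k).
    Only points whose top probability reaches the (hypothetical) threshold are kept.
    """
    kept = [max(p) for p in probs if max(p) >= threshold]
    if not kept:
        return 0.0
    return -sum(math.log(c) for c in kept) / len(kept)


unlabeled_probs = [[0.9, 0.1], [0.5, 0.5]]
print(entropy_min_loss(unlabeled_probs), pseudo_label_loss(unlabeled_probs, threshold=0.8))
```

Both losses push predictions on unlabeled data toward confident outputs; if the pseudo-labels are wrong, that confidence is misplaced, which is the noisy-pseudo-label issue discussed in the first entry of this listing.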
-
Characterizing and Understanding the Generalization Error of Transfer Learning with Gibbs Algorithm
Authors:
Yuheng Bu,
Gholamali Aminian,
Laura Toni,
Miguel Rodrigues,
Gregory Wornell
Abstract:
We provide an information-theoretic analysis of the generalization ability of Gibbs-based transfer learning algorithms, focusing on two popular transfer learning approaches, $α$-weighted-ERM and two-stage-ERM. Our key result is an exact characterization of the generalization behaviour using the conditional symmetrized KL information between the output hypothesis and the target training samples given the source samples. Our results also yield novel distribution-free generalization error upper bounds for these two Gibbs algorithms. Our approach is versatile: it also characterizes the generalization errors and excess risks of these two Gibbs algorithms in the asymptotic regime, where they converge to $α$-weighted-ERM and two-stage-ERM, respectively. Based on our theoretical results, we show that the benefits of transfer learning can be viewed as a bias-variance trade-off, with the bias induced by the source distribution and the variance induced by the lack of target samples. We believe this viewpoint can guide the choice of transfer learning algorithms in practice.
Submitted 2 November, 2021;
originally announced November 2021.
-
Characterizing the Generalization Error of Gibbs Algorithm with Symmetrized KL information
Authors:
Gholamali Aminian,
Yuheng Bu,
Laura Toni,
Miguel R. D. Rodrigues,
Gregory Wornell
Abstract:
Bounding the generalization error of a supervised learning algorithm is one of the most important problems in learning theory, and various approaches have been developed. However, existing bounds are often loose and lack guarantees. As a result, they may fail to characterize the exact generalization ability of a learning algorithm. Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm in terms of the symmetrized KL information between the input training samples and the output hypothesis. This result can be applied to tighten existing expected generalization error bounds. Our analysis provides more insight into the fundamental role that the symmetrized KL information plays in controlling the generalization error of the Gibbs algorithm.
Submitted 28 July, 2021;
originally announced July 2021.
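For the Gibbs algorithm with prior $π$ and inverse temperature $γ$, $P_{W|S}(w|s)\propto π(w)\exp(-γ L_E(w,s))$, the exact characterization can be checked numerically. In $\log(P_{W|S}/P_W)$, the log-prior and log-partition terms depend only on $w$ or only on $s$, so they cancel in $I_{\mathrm{SKL}}(W;S)=\mathrm{KL}(P_{W,S}\|P_W\otimes P_S)+\mathrm{KL}(P_W\otimes P_S\|P_{W,S})$, leaving $\mathbb{E}[L_P(W)-L_E(W,S)] = I_{\mathrm{SKL}}(W;S)/γ$. The minimal Python sketch below enumerates every training set of a small finite problem; the loss table, data distribution, prior, and function name are hypothetical, chosen only for illustration.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def gibbs_check(loss: List[List[float]], p_z: Sequence[float], prior: Sequence[float],
                n: int, gamma: float) -> Tuple[float, float]:
    """Return (gamma * expected generalization error, symmetrized KL information).

    loss[w][z]: loss of hypothesis w on data point z (toy table)
    p_z: data distribution on a finite alphabet; prior: prior over hypotheses
    """
    K, M = len(prior), len(p_z)
    pop_risk = [sum(p_z[z] * loss[w][z] for z in range(M)) for w in range(K)]
    joint, post, p_s, emp = {}, {}, {}, {}
    for s in itertools.product(range(M), repeat=n):     # every training set of size n
        p_s[s] = math.prod(p_z[z] for z in s)            # i.i.d. samples
        weights = []
        for w in range(K):
            emp[(w, s)] = sum(loss[w][z] for z in s) / n  # empirical risk L_E(w, s)
            weights.append(prior[w] * math.exp(-gamma * emp[(w, s)]))
        total = sum(weights)                              # partition function
        for w in range(K):
            post[(w, s)] = weights[w] / total             # Gibbs posterior P(w | s)
            joint[(w, s)] = p_s[s] * post[(w, s)]
    p_w = [sum(joint[(w, s)] for s in p_s) for w in range(K)]
    # Expected generalization error: population risk minus empirical risk.
    gen = sum(pr * (pop_risk[w] - emp[(w, s)]) for (w, s), pr in joint.items())
    # KL(P_WS || P_W P_S) + KL(P_W P_S || P_WS) = sum (P_WS - P_W P_S) log(P_W|S / P_W).
    skl = sum((pr - p_w[w] * p_s[s]) * math.log(post[(w, s)] / p_w[w])
              for (w, s), pr in joint.items())
    return gamma * gen, skl


toy_loss = [[0.0, 1.0, 0.5], [1.0, 0.0, 0.5], [0.3, 0.3, 0.3], [0.8, 0.2, 0.9]]
lhs, rhs = gibbs_check(toy_loss, [0.5, 0.3, 0.2], [0.25] * 4, n=3, gamma=2.0)
print(lhs, rhs)
```

Changing `gamma`, `n`, or the loss table should leave the two returned values equal up to floating-point error, which is what "exact characterization" means here.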
-
Information-Theoretic Bounds on the Moments of the Generalization Error of Learning Algorithms
Authors:
Gholamali Aminian,
Laura Toni,
Miguel R. D. Rodrigues
Abstract:
Generalization error bounds are critical to understanding the performance of machine learning models. In this work, building upon a new bound on the expected value of an arbitrary function of the population and empirical risks of a learning algorithm, we offer a more refined analysis of the generalization behaviour of machine learning models based on bounds on their generalization error moments. We discuss how the proposed bounds, which also include new bounds on the expected generalization error, relate to existing bounds in the literature. We also discuss how the proposed generalization error moment bounds can be used to construct new high-probability generalization error bounds.
Submitted 5 May, 2021; v1 submitted 3 February, 2021;
originally announced February 2021.
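One standard route from moment bounds to high-probability bounds (not necessarily the construction used in the paper) is Markov's inequality: if $\mathbb{E}[|\mathrm{gen}|^k]\le M_k$, then $P(|\mathrm{gen}|\ge ε)\le M_k/ε^k$, so with probability at least $1-δ$, $|\mathrm{gen}|\le (M_k/δ)^{1/k}$, and one may keep the best $k$. A minimal Python sketch; the moment values and function name are hypothetical.

```python
from typing import Dict


def high_prob_bound(moment_bounds: Dict[int, float], delta: float) -> float:
    """Smallest eps over k with P(|gen| >= eps) <= delta, via Markov's inequality.

    moment_bounds[k] is an assumed upper bound M_k on E[|gen|^k].
    """
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return min((m_k / delta) ** (1.0 / k) for k, m_k in moment_bounds.items())


# Hypothetical moment bounds: E|gen| <= 0.1 and E[gen^2] <= 0.01.
print(high_prob_bound({1: 0.1, 2: 0.01}, delta=0.1))
```

With these inputs the second moment gives the tighter bound, $\sqrt{0.1}$ versus $1$, illustrating why bounds on higher moments can sharpen the dependence on $δ$.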
-
Jensen-Shannon Information Based Characterization of the Generalization Error of Learning Algorithms
Authors:
Gholamali Aminian,
Laura Toni,
Miguel R. D. Rodrigues
Abstract:
Generalization error bounds are critical to understanding the performance of machine learning models. In this work, we propose a new information-theoretic generalization error upper bound applicable to supervised learning scenarios. We show that our general bound can be specialized to recover various previous bounds. We also show that, under some conditions, our general bound can be specialized to a new bound involving the Jensen-Shannon information between a random variable modelling the set of training samples and another random variable modelling the hypothesis. We also prove that our bound can be tighter than mutual information-based bounds under some conditions.
Submitted 8 January, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.