-
Orthogonal AMP for Problems with Multiple Measurement Vectors and/or Multiple Transforms
Authors:
Yiyao Cheng,
Lei Liu,
Shansuo Liang,
Jonathan H. Manton,
Li Ping
Abstract:
Approximate message passing (AMP) algorithms break a (high-dimensional) statistical problem into parts and then repeatedly solve each part in turn, akin to alternating projections. A distinguishing feature is that their asymptotic behaviour can be accurately predicted via their associated state evolution equations. Orthogonal AMP (OAMP) was recently developed to avoid the need for computing the so-called Onsager term in traditional AMP algorithms, providing two clear benefits: the derivation of an OAMP algorithm is both straightforward and more broadly applicable. OAMP was originally demonstrated for statistical problems with a single measurement vector and single transform. This paper extends OAMP to statistical problems with multiple measurement vectors (MMVs) and multiple transforms (MTs). We name the resulting algorithms OAMP-MMV and OAMP-MT, respectively, and their combination augmented OAMP (A-OAMP). Whereas the extension of traditional AMP algorithms to such problems would be challenging, the orthogonal principle underpinning OAMP makes these extensions straightforward.
The MMV and MT models are widely applicable to signal processing and communications. We present an example of a MIMO relay system with correlated source data and signal clipping, which can be modelled as a joint MMV-MT system. While existing methods meet with difficulties in this example, OAMP offers an efficient solution with excellent performance.
Submitted 17 April, 2023;
originally announced April 2023.
-
On the tightness of information-theoretic bounds on generalization error of learning algorithms
Authors:
Xuetong Wu,
Jonathan H. Manton,
Uwe Aickelin,
Jingge Zhu
Abstract:
A recent line of work, initiated by Russo and Xu, has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of the relevant works, the convergence rate of the expected generalization error is of the form $O(\sqrt{λ/n})$, where $λ$ is an information-theoretic quantity such as the mutual information or conditional mutual information between the data and the learned hypothesis. However, such a learning rate is typically considered to be "slow", compared to a "fast rate" of $O(λ/n)$ in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate, and a fast rate result can still be obtained using this bound under appropriate assumptions. Furthermore, we identify the critical conditions needed for the fast rate generalization error, which we call the $(η,c)$-central condition. Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a fast convergence rate for specific learning algorithms such as empirical risk minimization and its regularized version. Finally, several analytical examples are given to show the effectiveness of the bounds.
Submitted 26 March, 2023;
originally announced March 2023.
-
An Information-Theoretic Analysis for Transfer Learning: Error Bounds and Applications
Authors:
Xuetong Wu,
Jonathan H. Manton,
Uwe Aickelin,
Jingge Zhu
Abstract:
Transfer learning, or domain adaptation, is concerned with machine learning problems in which training and testing data come from possibly different probability distributions. In this work, we give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms, following a line of work initiated by Russo and Xu. Our results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence $D(μ||μ')$ plays an important role in the characterizations, where $μ$ and $μ'$ denote the distributions of the training data and the testing data, respectively. Specifically, we provide generalization error upper bounds for the empirical risk minimization (ERM) algorithm where data from both distributions are available in the training phase. We further apply the analysis to approximated ERM methods such as the Gibbs algorithm and the stochastic gradient descent method. We then generalize the mutual information bound with $φ$-divergence and Wasserstein distance. These generalizations lead to tighter bounds and can handle the case when $μ$ is not absolutely continuous with respect to $μ'$. Furthermore, we apply a new set of techniques to obtain an alternative upper bound which gives a fast (and optimal) learning rate for some learning problems. Finally, inspired by the derived bounds, we propose the InfoBoost algorithm, in which the importance weights for source and target data are adjusted adaptively in accordance with information measures. The empirical results show the effectiveness of the proposed algorithm.
Submitted 12 July, 2022;
originally announced July 2022.
-
On Causality in Domain Adaptation and Semi-Supervised Learning: an Information-Theoretic Analysis
Authors:
Xuetong Wu,
Mingming Gong,
Jonathan H. Manton,
Uwe Aickelin,
Jingge Zhu
Abstract:
The establishment of the link between causality and unsupervised domain adaptation (UDA)/semi-supervised learning (SSL) has led to methodological advances in these learning problems in recent years. However, a formal theory that explains the role of causality in the generalization performance of UDA/SSL is still lacking. In this paper, we consider the UDA/SSL setting where we access m labeled source data and n unlabeled target data as training instances under a parametric probabilistic model. We study the learning performance (e.g., excess risk) of prediction in the target domain. Specifically, we distinguish two scenarios: the learning problem is called causal learning if the feature is the cause and the label is the effect, and is called anti-causal learning otherwise. We show that in causal learning, the excess risk depends on the size of the source sample at a rate of O(1/m) only if the labelling distribution between the source and target domains remains unchanged. In anti-causal learning, we show that the unlabeled data dominate the performance at a rate of typically O(1/n). Our analysis is based on the notion of potential outcome random variables and information theory. These results bring out the relationship between the data sample size and the hardness of the learning problem with different causal mechanisms.
Submitted 9 May, 2022;
originally announced May 2022.
-
Fast Rate Generalization Error Bounds: Variations on a Theme
Authors:
Xuetong Wu,
Jonathan H. Manton,
Uwe Aickelin,
Jingge Zhu
Abstract:
A recent line of work, initiated by Russo and Xu, has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of the relevant works, the convergence rate of the expected generalization error is of the form $O(\sqrt{λ/n})$, where $λ$ is an information-theoretic quantity such as the mutual information between the data sample and the learned hypothesis. However, such a learning rate is typically considered to be "slow", compared to a "fast rate" of $O(1/n)$ in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate, and a fast rate ($O(1/n)$) result can still be obtained using this bound under appropriate assumptions. Furthermore, we identify the key conditions needed for the fast rate generalization error, which we call the $(η,c)$-central condition. Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a convergence rate of $O(λ/n)$ for specific learning algorithms such as empirical risk minimization. Finally, analytical examples are given to show the effectiveness of the bounds.
Submitted 13 May, 2022; v1 submitted 6 May, 2022;
originally announced May 2022.
-
On Orthogonal Approximate Message Passing
Authors:
Lei Liu,
Yiyao Cheng,
Shansuo Liang,
Jonathan H. Manton,
Li Ping
Abstract:
Approximate Message Passing (AMP) is an efficient iterative parameter-estimation technique for certain high-dimensional linear systems with non-Gaussian distributions, such as sparse systems. In AMP, a so-called Onsager term is added to keep estimation errors approximately Gaussian. Orthogonal AMP (OAMP) does not require this Onsager term, relying instead on an orthogonalization procedure to keep the current errors uncorrelated with (i.e., orthogonal to) past errors. In this paper, we show the generality and significance of the orthogonality in ensuring that errors are "asymptotically independently and identically distributed Gaussian" (AIIDG). This AIIDG property, which is essential for the attractive performance of OAMP, holds for separable functions. We present a simple and versatile procedure to establish the orthogonality through Gram-Schmidt (GS) orthogonalization, which is applicable to any prototype. We show that different AMP-type algorithms, such as expectation propagation (EP), turbo, AMP and OAMP, can be unified under the orthogonal principle. The simplicity and generality of OAMP provide efficient solutions for estimation problems beyond the classical linear models. As an example, we study the optimization of OAMP via the GS model and GS orthogonalization. More related applications will be discussed in a companion paper where new algorithms are developed for problems with multiple constraints and multiple measurement variables.
Submitted 13 January, 2023; v1 submitted 28 February, 2022;
originally announced March 2022.
-
A Bayesian Approach to (Online) Transfer Learning: Theory and Algorithms
Authors:
Xuetong Wu,
Jonathan H. Manton,
Uwe Aickelin,
Jingge Zhu
Abstract:
Transfer learning is a machine learning paradigm where knowledge from one problem is utilized to solve a new but related problem. While it is conceivable that knowledge from one task could be useful for solving a related task, transfer learning algorithms that are not executed properly can impair the learning performance instead of improving it, a phenomenon commonly known as negative transfer. In this paper, we study transfer learning from a Bayesian perspective, where a parametric statistical model is used. Specifically, we study three variants of the transfer learning problem: instantaneous, online, and time-variant transfer learning. For each problem, we define an appropriate objective function, and provide either exact expressions or upper bounds on the learning performance using information-theoretic quantities, which allow simple and explicit characterizations when the sample size becomes large. Furthermore, examples show that the derived bounds are accurate even for small sample sizes. The obtained bounds give valuable insights into the effect of prior knowledge for transfer learning, at least with respect to our Bayesian formulation of the transfer learning problem. In particular, we formally characterize the conditions under which negative transfer occurs. Lastly, we devise two (online) transfer learning algorithms that are amenable to practical implementations, one of which does not require the parametric assumption. We demonstrate the effectiveness of our algorithms with real data sets, focusing primarily on when the source and target data have strong similarities.
Submitted 30 September, 2021; v1 submitted 3 September, 2021;
originally announced September 2021.
-
Online Transfer Learning: Negative Transfer and Effect of Prior Knowledge
Authors:
Xuetong Wu,
Jonathan H. Manton,
Uwe Aickelin,
Jingge Zhu
Abstract:
Transfer learning is a machine learning paradigm where the knowledge from one task is utilized to solve the problem in a related task. On the one hand, it is conceivable that knowledge from one task could be useful for solving a related problem. On the other hand, it is also recognized that if not executed properly, transfer learning algorithms could in fact impair the learning performance instead of improving it, a phenomenon commonly known as "negative transfer". In this paper, we study the online transfer learning problem, in which the source samples are given offline while the target samples arrive sequentially. We define the expected regret of the online transfer learning problem and provide upper bounds on the regret using information-theoretic quantities. We also obtain exact expressions for the bounds when the sample size becomes large. Examples show that the derived bounds are accurate even for small sample sizes. Furthermore, the obtained bounds give valuable insight into the effect of prior knowledge for transfer learning in our formulation. In particular, we formally characterize the conditions under which negative transfer occurs.
Submitted 4 May, 2021;
originally announced May 2021.
-
New Insights on Learning Rules for Hopfield Networks: Memory and Objective Function Minimisation
Authors:
Pavel Tolmachev,
Jonathan H. Manton
Abstract:
Hopfield neural networks are a possible basis for modelling associative memory in living organisms. After summarising previous studies in the field, we take a new look at learning rules, exhibiting them as descent-type algorithms for various cost functions. We also propose several new cost functions suitable for learning. We discuss the role of biases (the external inputs) in the learning process in Hopfield networks. Furthermore, we apply Newton's method for learning memories, and experimentally compare the performances of various learning rules. Finally, to add to the debate whether allowing connections of a neuron to itself enhances memory capacity, we numerically investigate the effects of self-coupling.
Keywords: Hopfield Networks, associative memory, content addressable memory, learning rules, gradient descent, attractor networks
Submitted 3 October, 2020;
originally announced October 2020.
-
Information-theoretic analysis for transfer learning
Authors:
Xuetong Wu,
Jonathan H. Manton,
Uwe Aickelin,
Jingge Zhu
Abstract:
Transfer learning, or domain adaptation, is concerned with machine learning problems in which training and testing data come from possibly different distributions (denoted as $μ$ and $μ'$, respectively). In this work, we give an information-theoretic analysis on the generalization error and the excess risk of transfer learning algorithms, following a line of work initiated by Russo and Zhou. Our results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence $D(μ||μ')$ plays an important role in characterizing the generalization error in the settings of domain adaptation. Specifically, we provide generalization error upper bounds for general transfer learning algorithms and extend the results to a specific empirical risk minimization (ERM) algorithm where data from both distributions are available in the training phase. We further apply the method to iterative, noisy gradient descent algorithms, and obtain upper bounds which can be easily calculated, only using parameters from the learning algorithms. A few illustrative examples are provided to demonstrate the usefulness of the results. In particular, our bound is tighter in specific classification problems than the bound derived using Rademacher complexity.
Submitted 18 May, 2020; v1 submitted 18 May, 2020;
originally announced May 2020.
-
Isotropic Multiple Scattering Processes on Hyperspheres
Authors:
Nicolas Le Bihan,
Florent Chatelain,
Jonathan H. Manton
Abstract:
This paper presents several results about isotropic random walks and multiple scattering processes on hyperspheres ${\mathbb S}^{p-1}$. It allows one to derive the Fourier expansions on ${\mathbb S}^{p-1}$ of these processes. A result of unimodality for the multiconvolution of symmetrical probability density functions (pdf) on ${\mathbb S}^{p-1}$ is also introduced. Such processes are then studied in the case where the scattering distribution is von Mises-Fisher (vMF). Asymptotic distributions for the multiconvolution of vMFs on ${\mathbb S}^{p-1}$ are obtained. Both the Fourier expansion and the asymptotic approximation allow us to compute estimation bounds for the parameters of Compound Cox Processes (CCP) on ${\mathbb S}^{p-1}$.
Submitted 13 December, 2015; v1 submitted 12 August, 2014;
originally announced August 2014.
-
An Introductory Review of Information Theory in the Context of Computational Neuroscience
Authors:
Mark D. McDonnell,
Shiro Ikeda,
Jonathan H. Manton
Abstract:
This paper introduces several fundamental concepts in information theory from the perspective of their origins in engineering. Understanding such concepts is important in neuroscience for two reasons. Simply applying formulae from information theory without understanding the assumptions behind their definitions can lead to erroneous results and conclusions. Furthermore, this century will see a convergence of information theory and neuroscience; information theory will expand its foundations to incorporate biological processes more comprehensively, thereby helping reveal how neuronal networks achieve their remarkable information processing abilities.
Submitted 14 July, 2011;
originally announced July 2011.
-
Decompounding on compact Lie groups
Authors:
Salem Said,
Christian Lageman,
Nicolas Le Bihan,
Jonathan H. Manton
Abstract:
Noncommutative harmonic analysis is used to solve a nonparametric estimation problem stated in terms of compound Poisson processes on compact Lie groups. This problem of decompounding is a generalization of a similar classical problem. The proposed solution is based on a characteristic function method. The treated problem is important to recent models of the physical inverse problem of multiple scattering.
Submitted 15 July, 2009;
originally announced July 2009.