-
A note on generalization bounds for losses with finite moments
Authors:
Borja Rodríguez-Gálvez,
Omar Rivasplata,
Ragnar Thobaben,
Mikael Skoglund
Abstract:
This paper studies the truncation method from Alquier [1] to derive high-probability PAC-Bayes bounds for unbounded losses with heavy tails. Assuming that the $p$-th moment is bounded, the resulting bounds interpolate between a slow rate $1 / \sqrt{n}$ when $p=2$, and a fast rate $1 / n$ when $p \to \infty$ and the loss is essentially bounded. Moreover, the paper derives a high-probability PAC-Bayes bound for losses with a bounded variance. This bound has an exponentially better dependence on the confidence parameter and the dependency measure than previous bounds in the literature. Finally, the paper extends all results to guarantees in expectation and single-draw PAC-Bayes. In order to do so, it obtains analogues of the PAC-Bayes fast-rate bound for bounded losses from [2] in these settings.
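A truncation argument splits a nonnegative loss $L$ into a bounded part $\min(L, t)$ and a remainder; a bounded $p$-th moment is what typically controls that remainder. The sketch below is only an illustration of the elementary tail inequality $\mathbb{E}[L \mathbf{1}\{L > t\}] \le \mathbb{E}[L^p] / t^{p-1}$, not the paper's derivation; the Lomax samples, $p = 3$, $t = 2$, and the function name `truncation_remainder_bound` are hypothetical choices.

```python
import numpy as np

def truncation_remainder_bound(pth_moment: float, p: float, t: float) -> float:
    """Bound on E[L * 1{L > t}] for a nonnegative loss L with E[L^p] <= pth_moment.

    Uses the pointwise inequality L * 1{L > t} <= L * (L / t)**(p - 1), valid for p >= 1, t > 0.
    """
    return pth_moment / t ** (p - 1)

rng = np.random.default_rng(0)
# Hypothetical heavy-tailed loss: Lomax (Pareto II) samples with tail index 3.5,
# so moments of order p < 3.5 are finite.
losses = rng.pareto(3.5, size=100_000)

p, t = 3.0, 2.0
tail_part = np.mean(losses * (losses > t))   # empirical E[L 1{L > t}]
pth_moment = np.mean(losses ** p)            # empirical E[L^p]
bound = truncation_remainder_bound(pth_moment, p, t)
print(f"tail part {tail_part:.4f} <= bound {bound:.4f}")
```

Because the truncated part lies in $[0, t]$, its contribution grows with $t$ while this remainder shrinks like $t^{1-p}$; balancing the two is, roughly, why rates of this kind depend on $p$.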
Submitted 25 March, 2024;
originally announced March 2024.
-
More PAC-Bayes bounds: From bounded losses, to losses with general tail behaviors, to anytime validity
Authors:
Borja Rodríguez-Gálvez,
Ragnar Thobaben,
Mikael Skoglund
Abstract:
In this paper, we present new high-probability PAC-Bayes bounds for different types of losses. Firstly, for losses with a bounded range, we recover a strengthened version of Catoni's bound that holds uniformly for all parameter values. This leads to new fast-rate and mixed-rate bounds that are interpretable and tighter than previous bounds in the literature. In particular, the fast-rate bound is equivalent to the Seeger–Langford bound. Secondly, for losses with more general tail behaviors, we introduce two new parameter-free bounds: a PAC-Bayes Chernoff analogue when the loss' cumulant generating function is bounded, and a bound when the loss' second moment is bounded. These two bounds are obtained using a new technique based on a discretization of the space of possible events for the "in probability" parameter optimization problem. This technique is both simpler and more general than previous approaches optimizing over a grid on the parameters' space. Finally, using a simple technique that is applicable to any existing bound, we extend all previous results to anytime-valid bounds.
Submitted 4 June, 2024; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization
Authors:
Mahdi Haghifam,
Borja Rodríguez-Gálvez,
Ragnar Thobaben,
Mikael Skoglund,
Daniel M. Roy,
Gintare Karolina Dziugaite
Abstract:
To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
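The noisy "surrogate" tactic described above takes the final gradient-descent iterate and adds Gaussian noise to it. A minimal sketch on a hypothetical convex least-squares problem; the data, step size, and noise level `sigma` are illustrative assumptions, and the sketch says nothing about the information-theoretic analysis itself.

```python
import numpy as np

def gradient_descent(grad, w0, lr, steps):
    """Plain full-batch gradient descent; returns the final iterate."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def noisy_surrogate(w_final, sigma, rng):
    """Surrogate output: the final iterate plus isotropic Gaussian noise N(0, sigma^2 I)."""
    return w_final + sigma * rng.standard_normal(w_final.shape)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))               # hypothetical training inputs
y = X @ np.array([1.0, -2.0, 0.5])             # noiseless hypothetical targets

def empirical_grad(w):
    # gradient of the convex empirical loss ||X w - y||^2 / (2 n)
    return X.T @ (X @ w - y) / len(y)

w_gd = gradient_descent(empirical_grad, np.zeros(3), lr=0.1, steps=500)
w_surrogate = noisy_surrogate(w_gd, sigma=0.05, rng=rng)
```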
Submitted 13 July, 2023; v1 submitted 27 December, 2022;
originally announced December 2022.
-
Quadratic Signaling Games with Channel Combining Ratio
Authors:
Serkan Sarıtaş,
Photios A. Stavrou,
Ragnar Thobaben,
Mikael Skoglund
Abstract:
In this study, Nash and Stackelberg equilibria of single-stage and multi-stage quadratic signaling games between an encoder and a decoder are investigated. In the considered setup, the objective functions of the encoder and the decoder are misaligned, there is a noisy channel between the encoder and the decoder, the encoder has a soft power constraint, and the decoder also has a noisy observation of the source to be estimated. We show that only linear encoding and decoding strategies exist at the Stackelberg equilibrium, and we derive the equilibrium strategies and costs. Regarding the Nash equilibrium, we explicitly characterize affine equilibria for the single-stage setup and show that, for the multi-stage setup, the optimal encoder (resp. decoder) is affine for an affine decoder (resp. encoder). On the decoder side, our results describe what the combining ratio should be between the two channels: the information coming from the encoder and the noisy observation of the source. Regarding the encoder, we derive the conditions under which it is meaningful to transmit a message.
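To make the "combining ratio" concrete: assuming a linear encoder, zero-mean uncorrelated source and noises, and a decoder that minimizes mean-squared error, the decoder linearly combines the channel output $Y_1 = aX + W_1$ with its side observation $Y_2 = X + W_2$. This is a generic LMMSE sketch, not the paper's equilibrium characterization; the gain `a` and the noise variances are hypothetical.

```python
import numpy as np

def lmmse_weights(var_x, a, n1, n2):
    """LMMSE weights (c1, c2) for estimating X from Y1 = a*X + W1 and Y2 = X + W2.

    X, W1, W2 are zero-mean and uncorrelated with variances var_x, n1, n2.
    The weights solve the normal equations Cov(Y) c = Cov(Y, X).
    """
    cov_yx = np.array([a * var_x, var_x])
    cov_yy = np.array([[a * a * var_x + n1, a * var_x],
                       [a * var_x, var_x + n2]])
    return np.linalg.solve(cov_yy, cov_yx)

# Hypothetical numbers: encoder gain a = 2, channel noise n1 = 0.5, side-observation noise n2 = 0.25.
c1, c2 = lmmse_weights(var_x=1.0, a=2.0, n1=0.5, n2=0.25)
print(c1 / c2)   # combining ratio; equals a * n2 / n1 for this model
```

In this simplified model the ratio $c_1/c_2 = a n_2 / n_1$ follows from the information form of the estimator, $\hat{X} = (a Y_1/n_1 + Y_2/n_2)/(1/\sigma_X^2 + a^2/n_1 + 1/n_2)$.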
Submitted 3 February, 2021;
originally announced February 2021.
-
Tighter expected generalization error bounds via Wasserstein distance
Authors:
Borja Rodríguez-Gálvez,
Germán Bassi,
Ragnar Thobaben,
Mikael Skoglund
Abstract:
This work presents several expected generalization error bounds based on the Wasserstein distance. More specifically, it introduces full-dataset, single-letter, and random-subset bounds, and their analogues in the randomized subsample setting from Steinke and Zakynthinou [1]. Moreover, when the loss function is bounded and the geometry of the space is ignored by the choice of the metric in the Wasserstein distance, these bounds recover from below (and thus are tighter than) current bounds based on the relative entropy. In particular, they generate new, non-vacuous bounds based on the relative entropy. Therefore, these results can be seen as a bridge between works that account for the geometry of the hypothesis space and those based on the relative entropy, which is agnostic to such geometry. Furthermore, it is shown how these bounds yield various new bounds based on other information measures (e.g., the lautum information or several $f$-divergences), and how to derive similar bounds with respect to the backward channel using the presented proof techniques.
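A toy illustration of the bridge described above: under the discrete metric $d(x, y) = \mathbf{1}\{x \ne y\}$, which ignores geometry, the Wasserstein-1 distance equals the total variation distance, and Pinsker's inequality $\mathrm{TV} \le \sqrt{\mathrm{KL}/2}$ then connects it to the relative entropy. This is a sketch of those standard facts, not one of the paper's bounds; the distributions are hypothetical.

```python
import numpy as np

def total_variation(p, q):
    return 0.5 * float(np.abs(p - q).sum())

def wasserstein_discrete_metric(p, q):
    # Under d(x, y) = 1{x != y}, the optimal coupling leaves min(p_i, q_i) in place,
    # so W1 = 1 - sum_i min(p_i, q_i), which equals TV(p, q).
    return 1.0 - float(np.minimum(p, q).sum())

def wasserstein_1d(p, q, support):
    """W1 on the real line with cost |x - y|: the integral of |F_p - F_q|."""
    order = np.argsort(support)
    x, p, q = support[order], p[order], q[order]
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return float(np.sum(cdf_gap * np.diff(x)))

def kl_divergence(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
support = np.array([0.0, 1.0, 2.0])
print(wasserstein_discrete_metric(p, q), total_variation(p, q),
      np.sqrt(kl_divergence(p, q) / 2), wasserstein_1d(p, q, support))
```

With the metric $|x - y|$ instead, the same pair of distributions is farther apart, which is the geometric information that relative-entropy bounds cannot see.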
Submitted 25 March, 2022; v1 submitted 22 January, 2021;
originally announced January 2021.
-
On Random Subset Generalization Error Bounds and the Stochastic Gradient Langevin Dynamics Algorithm
Authors:
Borja Rodríguez-Gálvez,
Germán Bassi,
Ragnar Thobaben,
Mikael Skoglund
Abstract:
In this work, we unify several expected generalization error bounds based on random subsets using the framework developed by Hellström and Durisi [1]. First, we recover the bounds based on the individual sample mutual information from Bu et al. [2] and on a random subset of the dataset from Negrea et al. [3]. Then, we introduce new analogues of these bounds in the randomized subsample setting from Steinke and Zakynthinou [4], and we identify some limitations of the framework. Finally, we extend the bounds from Haghifam et al. [5] for Langevin dynamics to stochastic gradient Langevin dynamics and we refine them for loss functions with potentially large gradient norms.
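For reference, stochastic gradient Langevin dynamics replaces the full gradient in Langevin dynamics with a minibatch gradient and adds Gaussian noise at every step. A minimal sketch, assuming the common parameterization with step size `lr` and inverse temperature `inv_temp`; the quadratic loss and data are hypothetical, and the paper's exact algorithmic conventions may differ.

```python
import numpy as np

def sgld(grad_fn, data, w0, lr, inv_temp, batch_size, steps, rng):
    """Stochastic gradient Langevin dynamics.

    Each step: w <- w - lr * (minibatch gradient) + sqrt(2 * lr / inv_temp) * N(0, I).
    """
    w = np.asarray(w0, dtype=float)
    n = len(data)
    noise_scale = np.sqrt(2.0 * lr / inv_temp)
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)  # random subset of the dataset
        w = w - lr * grad_fn(w, data[idx]) + noise_scale * rng.standard_normal(w.shape)
    return w

def grad_squared_distance(w, batch):
    # gradient of the minibatch loss mean_i ||w - z_i||^2 / 2
    return w - batch.mean(axis=0)

rng = np.random.default_rng(0)
data = rng.normal(loc=[1.0, -1.0], scale=1.0, size=(200, 2))   # hypothetical dataset
w_sgld = sgld(grad_squared_distance, data, np.zeros(2), lr=0.05, inv_temp=100.0,
              batch_size=20, steps=1000, rng=rng)
```

Setting `batch_size=len(data)` recovers full-batch Langevin dynamics, and `inv_temp=np.inf` removes the injected noise entirely.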
Submitted 16 January, 2021; v1 submitted 21 October, 2020;
originally announced October 2020.
-
Region-based Energy Neural Network for Approximate Inference
Authors:
Dong Liu,
Ragnar Thobaben,
Lars K. Rasmussen
Abstract:
Region-based free energy was originally proposed for generalized belief propagation (GBP) to improve loopy belief propagation (loopy BP). In this paper, we propose a neural network based energy model for inference in general Markov random fields (MRFs), which directly minimizes the region-based free energy defined on region graphs. We term our model Region-based Energy Neural Network (RENN). Unlike message-passing algorithms, RENN avoids iterative message propagation and is faster. Also different from recent deep neural network based models, inference by RENN does not require sampling, and RENN works on general MRFs. RENN can also be employed for MRF learning. Our experiments on marginal distribution estimation, partition function estimation, and learning of MRFs show that RENN outperforms the mean field method, loopy BP, GBP, and the state-of-the-art neural network based model.
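As a much simpler illustration than RENN (no neural network is involved), the sketch below evaluates the Bethe free energy, the region-based free energy whose regions are the edges and the nodes (node counting number $1 - \deg(i)$), on a small pairwise MRF. On a tree, plugging in the exact marginals gives exactly $-\log Z$. The 3-node chain and its potentials are hypothetical.

```python
import itertools
import numpy as np

def exact_marginals_and_logz(unary, pairwise, edges):
    """Brute-force joint over binary variables; returns node/edge marginals and log Z."""
    n = len(unary)
    states = list(itertools.product([0, 1], repeat=n))
    weights = np.array([
        np.prod([unary[i][s[i]] for i in range(n)])
        * np.prod([pairwise[e][s[i], s[j]] for e, (i, j) in enumerate(edges)])
        for s in states
    ])
    z = weights.sum()
    node_marg = np.zeros((n, 2))
    edge_marg = np.zeros((len(edges), 2, 2))
    for s, prob in zip(states, weights / z):
        for i in range(n):
            node_marg[i, s[i]] += prob
        for e, (i, j) in enumerate(edges):
            edge_marg[e, s[i], s[j]] += prob
    return node_marg, edge_marg, np.log(z)

def bethe_free_energy(unary, pairwise, edges, node_marg, edge_marg):
    """Region-based (Bethe) free energy: edge regions with counting number 1,
    node regions with counting number 1 - deg(i)."""
    deg = np.zeros(len(unary))
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    f = 0.0
    for e, (i, j) in enumerate(edges):
        b = edge_marg[e]
        psi = pairwise[e] * np.outer(unary[i], unary[j])   # edge factor times both unaries
        f += np.sum(b * (np.log(b) - np.log(psi)))
    for i, b in enumerate(node_marg):
        f += (1 - deg[i]) * np.sum(b * (np.log(b) - np.log(unary[i])))
    return f

edges = [(0, 1), (1, 2)]   # a 3-node chain, which is a tree
unary = [np.array([1.0, 2.0]), np.array([0.5, 1.5]), np.array([1.2, 0.8])]
pairwise = [np.array([[2.0, 1.0], [1.0, 2.0]]), np.array([[1.0, 3.0], [3.0, 1.0]])]
node_marg, edge_marg, log_z = exact_marginals_and_logz(unary, pairwise, edges)
f_bethe = bethe_free_energy(unary, pairwise, edges, node_marg, edge_marg)
print(f_bethe, -log_z)
```

On graphs with loops the Bethe value is only an approximation, which is where larger regions, as in generalized belief propagation, come in.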
Submitted 17 June, 2020;
originally announced June 2020.
-
A Variational Approach to Privacy and Fairness
Authors:
Borja Rodríguez-Gálvez,
Ragnar Thobaben,
Mikael Skoglund
Abstract:
In this article, we propose a new variational approach to learn private and/or fair representations. This approach is based on the Lagrangians of a new formulation of the privacy and fairness optimization problems that we propose. In this formulation, we aim to generate representations of the data that keep a prescribed level of the relevant information that is not shared by the private or sensitive data, while minimizing the remaining information they keep. The proposed approach (i) exhibits the similarities of the privacy and fairness problems, (ii) allows us to control the trade-off between utility and privacy or fairness through the Lagrange multiplier parameter, and (iii) can be easily incorporated into common representation learning algorithms such as the VAE, the $β$-VAE, the VIB, or the nonlinear IB.
Submitted 6 September, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
-
The Convex Information Bottleneck Lagrangian
Authors:
Borja Rodríguez-Gálvez,
Ragnar Thobaben,
Mikael Skoglund
Abstract:
The information bottleneck (IB) problem tackles the issue of obtaining relevant compressed representations $T$ of some random variable $X$ for the task of predicting $Y$. It is defined as a constrained optimization problem which maximizes the information the representation has about the task, $I(T;Y)$, while ensuring that a certain level of compression $r$ is achieved (i.e., $I(X;T) \leq r$). For practical reasons, the problem is usually solved by maximizing the IB Lagrangian (i.e., $\mathcal{L}_{\text{IB}}(T;β) = I(T;Y) - βI(X;T)$) for many values of $β \in [0,1]$. Then, the curve of maximal $I(T;Y)$ for a given $I(X;T)$ is drawn and a representation with the desired predictability and compression is selected. It is known that when $Y$ is a deterministic function of $X$, the IB curve cannot be explored with the IB Lagrangian, and another Lagrangian has been proposed to tackle this problem: the squared IB Lagrangian, $\mathcal{L}_{\text{sq-IB}}(T;β_{\text{sq}})=I(T;Y)-β_{\text{sq}}I(X;T)^2$. In this paper, we (i) present a general family of Lagrangians which allow for the exploration of the IB curve in all scenarios; (ii) provide the exact one-to-one mapping between the Lagrange multiplier and the desired compression rate $r$ for known IB curve shapes; and (iii) show that we can approximately obtain a specific compression level with the convex IB Lagrangian for both known and unknown IB curve shapes. This eliminates the burden of solving the optimization problem for many values of the Lagrange multiplier. That is, we prove that we can solve the original constrained problem with a single optimization.
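The deterministic-$Y$ failure mode can be checked directly on the known piecewise-linear IB curve $I(T;Y) = \min(I(X;T), H(Y))$. With the linear penalty, every $β \in (0, 1)$ selects $r = H(Y)$, so the curve cannot be explored; with the squared penalty the stationary condition $1 - 2βr = 0$ gives $r = 1/(2β)$ whenever that is below $H(Y)$. A minimal sketch over a grid of compression levels; $H(Y) = 1$ nat and the grid are hypothetical, and this is not the paper's general convex family.

```python
import numpy as np

def ib_curve_deterministic(r, h_y):
    """IB curve when Y is a deterministic function of X: I(T;Y) = min(I(X;T), H(Y))."""
    return np.minimum(r, h_y)

def best_compression(beta, penalty, h_y, grid):
    """Grid point r = I(X;T) maximizing I(T;Y) - beta * penalty(I(X;T)) along the IB curve."""
    objective = ib_curve_deterministic(grid, h_y) - beta * penalty(grid)
    return grid[np.argmax(objective)]

def linear(r):      # IB Lagrangian penalty
    return r

def squared(r):     # squared-IB Lagrangian penalty
    return r ** 2

h_y = 1.0                                   # hypothetical H(Y), in nats
grid = np.linspace(0.0, 2.0 * h_y, 2001)    # candidate values of I(X;T)
for beta in (0.2, 0.5, 0.8):
    print(beta, best_compression(beta, linear, h_y, grid), best_compression(beta, squared, h_y, grid))
```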
Submitted 10 January, 2020; v1 submitted 25 November, 2019;
originally announced November 2019.
-
Multi-tone Signal Optimization for Wireless Power Transfer in the Presence of Wireless Communication Links
Authors:
Boules A. Mouris,
Hadi Ghauch,
Ragnar Thobaben,
B. L. G. Jonsson
Abstract:
In this paper, we study optimization of multi-tone signals for wireless power transfer (WPT) systems. We investigate different non-linear energy harvesting models. Two of them are adopted to optimize the multi-tone signal according to the channel state information available at the transmitter. We show that a second-order polynomial curve-fitting model can be utilized to optimize the multi-tone signal for any RF energy harvester design. We consider both single-antenna and multi-antenna WPT systems. In-band co-existing communication links are also considered in this work by imposing a constraint on the received power at the nearby information receiver to prevent its RF front end from saturation. We emphasize the importance of imposing such a constraint by explaining how inter-modulation products, due to saturation, can cause high interference at the information receiver in the case of multi-tone signals. The multi-tone optimization problem is formulated as a non-convex linearly constrained quadratic program. Two globally optimal solution approaches using mixed-integer linear programming and finite branch-and-bound techniques are proposed to solve the problem. The achieved improvement resulting from applying both solution methods to the multi-tone optimization problem is highlighted through simulations and comparisons with other solutions existing in the literature.
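The inter-modulation mechanism above can be seen with a generic memoryless cubic nonlinearity $y = x + a_3 x^3$ driven by two tones: it creates third-order products at $2f_1 - f_2$ and $2f_2 - f_1$, right next to the tones themselves. This is an illustration only, not the paper's harvester or receiver model; the sample rate, tone frequencies, and $a_3$ are hypothetical.

```python
import numpy as np

fs = 1000                     # sample rate in Hz (hypothetical)
t = np.arange(fs) / fs        # one second, so FFT bins fall on integer Hz
f1, f2 = 100, 110             # two hypothetical tone frequencies in Hz

def tone_amplitudes(a3):
    """Apply y = x + a3 * x**3 to a two-tone signal; return single-sided amplitudes per Hz."""
    x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)
    y = x + a3 * x ** 3
    return 2 * np.abs(np.fft.rfft(y)) / len(y)

amps = tone_amplitudes(0.1)
im3_low, im3_high = 2 * f1 - f2, 2 * f2 - f1     # 90 Hz and 120 Hz
print(amps[im3_low], amps[im3_high])
```

Expanding $(\cos\alpha + \cos\beta)^3$ shows each of these products has amplitude $\tfrac{3}{4} a_3$, so they vanish only when the front end is linear.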
Submitted 5 March, 2019;
originally announced March 2019.
-
Physical Layer Authentication in Mission-Critical MTC Networks: A Security and Delay Performance Analysis
Authors:
Henrik Forssell,
Ragnar Thobaben,
Hussein Al-Zubaidy,
James Gross
Abstract:
We study the detection and delay performance impacts of a feature-based physical layer authentication (PLA) protocol in mission-critical machine-type communication (MTC) networks. The PLA protocol uses generalized likelihood-ratio testing based on the line-of-sight (LOS), single-input multiple-output channel-state information in order to mitigate impersonation attempts from an adversary node. We study the detection performance, develop a queueing model that captures the delay impacts of erroneous decisions in the PLA (i.e., the false alarms and missed detections), and model three different adversary strategies: data injection, disassociation, and Sybil attacks. Our main contribution is the derivation of analytical delay performance bounds that allow us to quantify the delay introduced by PLA that potentially can degrade the performance in mission-critical MTC networks. For the delay analysis, we utilize tools from stochastic network calculus. Our results show that with a sufficient number of receive antennas (approximately 4–8) and sufficiently strong LOS components from legitimate devices, PLA is a viable option for securing mission-critical MTC systems, despite the low latency requirements associated with the corresponding use cases. Furthermore, we find that PLA can be very effective in detecting the considered attacks, and in particular, it can significantly reduce the delay impacts of disassociation and Sybil attacks.
Submitted 27 June, 2018;
originally announced June 2018.
-
Multi-Phase Smart Relaying and Cooperative Jamming in Secure Cognitive Radio Networks
Authors:
Pin-Hsun Lin,
Frédéric Gabry,
Ragnar Thobaben,
Eduard A. Jorswieck,
Mikael Skoglund
Abstract:
In this paper we investigate cooperative secure communications in a four-node cognitive radio network where the secondary receiver is treated as a potential eavesdropper with respect to the primary transmission. The secondary user is allowed to transmit his own signals under the condition that the primary user's secrecy rate and transmission scheme are intact. In this setting we derive the secondary user's achievable rates and the related constraints to guarantee the primary user's weak secrecy rate, when Gelfand-Pinsker coding is used at the secondary transmitter. In addition, we propose a multi-phase transmission scheme to include 1) the phases of the clean relaying with cooperative jamming and 2) the latency to successfully decode the primary message at the secondary transmitter. A capacity upper bound for the secondary user is also derived. Numerical results show that: 1) the proposed scheme can outperform the traditional ones by properly selecting the secondary user's parameters of different transmission schemes according to the relative positions of the nodes; 2) the derived capacity upper bound is within 0.3 bits/channel use of the secondary user's achievable rate, especially when the secondary transmitter/receiver is far/close enough to the primary receiver/transmitter, respectively. Thereby, a smart secondary transmitter is able to adapt its relaying and cooperative jamming to guarantee primary secrecy rates and to transmit its own data at the same time from relevant geometric positions.
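Why cooperative jamming helps the primary user's secrecy rate can be seen from the standard Gaussian wiretap expression $[\log_2(1+\mathrm{SNR}_M) - \log_2(1+\mathrm{SNR}_E)]^+$. A minimal sketch, assuming (hypothetically) that the jamming signal degrades only the eavesdropper; the paper's multi-phase, Gelfand-Pinsker-based scheme is not modeled, and all powers and gains are illustrative.

```python
import math

def secrecy_rate(snr_main: float, snr_eve: float) -> float:
    """Gaussian wiretap secrecy rate [log2(1 + SNR_main) - log2(1 + SNR_eve)]^+ in bits/channel use."""
    return max(0.0, math.log2(1 + snr_main) - math.log2(1 + snr_eve))

def eve_sinr_with_jamming(p_tx: float, g_eve: float, p_jam: float, g_jam_eve: float,
                          noise: float = 1.0) -> float:
    """Eavesdropper SINR when a helper's jamming signal adds interference at the eavesdropper only."""
    return p_tx * g_eve / (noise + p_jam * g_jam_eve)

snr_main = 10.0
without_jam = secrecy_rate(snr_main, eve_sinr_with_jamming(5.0, 1.0, 0.0, 0.5))
with_jam = secrecy_rate(snr_main, eve_sinr_with_jamming(5.0, 1.0, 10.0, 0.5))
print(without_jam, with_jam)
```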
Submitted 13 May, 2016;
originally announced May 2016.
-
Communication and Interference Coordination
Authors:
Ricardo Blasco-Serrano,
Ragnar Thobaben,
Mikael Skoglund
Abstract:
We study the problem of controlling the interference created to an external observer by a communication process. We model the interference in terms of its type (empirical distribution), and we analyze the consequences of placing constraints on the admissible type. Considering a single interfering link, we characterize the communication-interference capacity region. Then, we look at a scenario where the interference is jointly created by two users allowed to coordinate their actions prior to transmission. In this case, the trade-off involves communication and interference as well as coordination. We establish an achievable communication-interference region and show that efficiency is significantly improved by coordination.
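The "type" of a sequence is simply its empirical distribution over a finite alphabet. The sketch below computes it exactly and checks a hypothetical admissibility constraint (total variation to a target type at most `tol`); the paper's actual constraint on the admissible type may be formulated differently.

```python
from collections import Counter
from fractions import Fraction

def empirical_type(seq, alphabet):
    """Type (empirical distribution) of a sequence over a finite alphabet, as exact fractions."""
    counts = Counter(seq)
    n = len(seq)
    return {a: Fraction(counts[a], n) for a in alphabet}

def type_within_tolerance(seq, target, tol):
    """Hypothetical admissibility check: total variation between the sequence's type and target <= tol."""
    p = empirical_type(seq, target.keys())
    tv = sum(abs(p[a] - target[a]) for a in target) / 2
    return tv <= tol

interference = [0, 1, 1, 0, 1]     # hypothetical interference symbols seen by the observer
print(empirical_type(interference, [0, 1]))
```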
Submitted 18 February, 2014;
originally announced February 2014.
-
Bilayer LDPC Convolutional Codes for Half-Duplex Relay Channels
Authors:
Zhongwei Si,
Ragnar Thobaben,
Mikael Skoglund
Abstract:
In this paper we present regular bilayer LDPC convolutional codes for half-duplex relay channels. For the binary erasure relay channel, we prove that the proposed code construction achieves the capacities for the source-relay link and the source-destination link provided that the channel conditions are known when designing the code. Meanwhile, this code enables the highest transmission rate with decode-and-forward relaying. In addition, its regular degree distributions can easily be computed from the channel parameters, which significantly simplifies the code optimization. Numerical results are provided for both binary erasure channels (BEC) and AWGN channels. In BECs, we can observe that the gaps between the decoding thresholds and the Shannon limits are impressively small. In AWGN channels, the bilayer LDPC convolutional code clearly outperforms its block code counterpart in terms of bit error rate.
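Decoding thresholds and gaps to the Shannon limit on the BEC are usually computed with density evolution. The sketch below does this for an ordinary regular $(d_v, d_c)$ LDPC block ensemble, not the bilayer convolutional construction of the paper; for the $(3, 6)$ ensemble the threshold should land near the well-known value 0.4294, against a Shannon limit of $1 - R = 0.5$.

```python
def de_converges(eps, dv, dc, max_iters=10_000, target=1e-10):
    """Density evolution for a regular (dv, dc) LDPC ensemble on BEC(eps).

    x is the erasure probability of variable-to-check messages:
    x <- eps * (1 - (1 - x)**(dc - 1))**(dv - 1). Returns True if x reaches ~0.
    """
    x = eps
    for _ in range(max_iters):
        x = eps * (1 - (1 - x) ** (dc - 1)) ** (dv - 1)
        if x < target:
            return True
    return False

def bp_threshold(dv, dc, steps=30):
    """Largest erasure probability for which density evolution converges, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if de_converges(mid, dv, dc):
            lo = mid
        else:
            hi = mid
    return lo

threshold = bp_threshold(3, 6)
design_rate = 1 - 3 / 6
print(threshold, (1 - design_rate) - threshold)   # threshold and gap to the Shannon limit eps = 1 - R
```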
Submitted 25 February, 2011;
originally announced February 2011.
-
Performance Analysis and Design of Two Edge Type LDPC Codes for the BEC Wiretap Channel
Authors:
Vishwambhar Rathi,
Mattias Andersson,
Ragnar Thobaben,
Joerg Kliewer,
Mikael Skoglund
Abstract:
We consider transmission over a wiretap channel where both the main channel and the wiretapper's channel are Binary Erasure Channels (BEC). We propose a code construction method using two edge type LDPC codes based on the coset encoding scheme. Using a standard LDPC ensemble with a given threshold over the BEC, we give a construction for a two edge type LDPC ensemble with the same threshold. If the given standard LDPC ensemble has degree two variable nodes, our construction gives rise to degree one variable nodes in the code used over the main channel. This results in zero threshold over the main channel. In order to circumvent this problem, we numerically optimize the degree distribution of the two edge type LDPC ensemble. We find that the resulting ensembles are able to perform close to the boundary of the rate-equivocation region of the wiretap channel.
There are two performance criteria for a coding scheme used over a wiretap channel: reliability and secrecy. The reliability measure corresponds to the probability of decoding error for the intended receiver. This can be easily measured using the density evolution recursion. However, it is more challenging to characterize secrecy, corresponding to the equivocation of the message for the wiretapper. Méasson, Montanari, and Urbanke have shown how the equivocation can be measured for a broad range of standard LDPC ensembles for transmission over the BEC under the point-to-point setup. By generalizing the method of Méasson, Montanari, and Urbanke to two edge type LDPC ensembles, we show how the equivocation for the wiretapper can be computed. We find that relatively simple constructions give very good secrecy performance and are close to the secrecy capacity. However, finding explicit sequences of two edge type LDPC ensembles which achieve secrecy capacity is a more difficult problem. We pose it as an interesting open problem.
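The boundary of the rate-equivocation region that the optimized ensembles are compared against has a closed form when both channels are BECs and the wiretapper's channel is degraded ($\epsilon_m \le \epsilon_w$): Wyner's region is $R_e \le R \le C_m$ and $R_e \le C_m - C_w$, with $C = 1 - \epsilon$. A minimal sketch; the erasure probabilities in the usage line are hypothetical.

```python
def bec_capacity(eps: float) -> float:
    return 1.0 - eps

def max_equivocation_rate(rate: float, eps_main: float, eps_wire: float) -> float:
    """Upper boundary of the rate-equivocation region of a degraded BEC wiretap channel:
    R_e <= R, R <= C_main, R_e <= C_main - C_wire."""
    if not 0 <= eps_main <= eps_wire <= 1:
        raise ValueError("assumes the wiretapper's BEC is degraded: eps_main <= eps_wire")
    c_main, c_wire = bec_capacity(eps_main), bec_capacity(eps_wire)
    if rate > c_main:
        raise ValueError("rate exceeds main-channel capacity")
    return min(rate, c_main - c_wire)

print(max_equivocation_rate(0.3, 0.2, 0.6), max_equivocation_rate(0.7, 0.2, 0.6))
```

The secrecy capacity is the largest rate on this boundary with $R_e = R$, here $\epsilon_w - \epsilon_m$.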
Submitted 30 November, 2011; v1 submitted 23 September, 2010;
originally announced September 2010.
-
Nested Polar Codes for Wiretap and Relay Channels
Authors:
Mattias Andersson,
Vishwambhar Rathi,
Ragnar Thobaben,
Joerg Kliewer,
Mikael Skoglund
Abstract:
We show that polar codes asymptotically achieve the whole capacity-equivocation region for the wiretap channel when the wiretapper's channel is degraded with respect to the main channel, and the weak secrecy notion is used. Our coding scheme also achieves the capacity of the physically degraded receiver-orthogonal relay channel. We show simulation results for moderate block lengths on the binary erasure wiretap channel, comparing polar codes and two edge type LDPC codes.
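On the BEC, channel polarization has an exact recursion: one step maps an erasure probability $z$ to $2z - z^2$ (worse channel) and $z^2$ (better channel). The sketch below uses it to partition synthetic channels in the spirit of a nested wiretap construction: secret bits where the channel is good for the main receiver but bad for the wiretapper, random bits where it is good for both, frozen bits elsewhere. The cutoff `delta` and the index ordering are hypothetical choices, not necessarily those of the paper.

```python
import numpy as np

def polarized_erasure_probs(eps, n_levels):
    """Erasure probabilities of the 2**n_levels synthetic channels of a polar code on BEC(eps)."""
    z = np.array([eps])
    for _ in range(n_levels):
        # each channel splits into a worse (2z - z^2) and a better (z^2) synthetic channel
        z = np.stack([2 * z - z ** 2, z ** 2], axis=1).ravel()
    return z

def nested_index_sets(eps_main, eps_wire, n_levels, delta=1e-3):
    """Partition indices into secret, random, and frozen sets (hypothetical reliability cutoff delta)."""
    z_main = polarized_erasure_probs(eps_main, n_levels)
    z_wire = polarized_erasure_probs(eps_wire, n_levels)
    good_main = z_main < delta
    bad_wire = z_wire > 1 - delta
    secret = np.flatnonzero(good_main & bad_wire)
    random_bits = np.flatnonzero(good_main & ~bad_wire)
    frozen = np.flatnonzero(~good_main)
    return secret, random_bits, frozen

secret, random_bits, frozen = nested_index_sets(0.2, 0.6, n_levels=10)
print(len(secret) / 1024, 0.6 - 0.2)   # secret-bit fraction vs. secrecy capacity eps_w - eps_m
```

At this short length the secret-bit fraction remains below $\epsilon_w - \epsilon_m$; the asymptotic claim above concerns growing block lengths.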
Submitted 17 June, 2010;
originally announced June 2010.