Search | arXiv e-print repository

arXiv:2012.04692 [pdf, other]

Locally optimal detection of stochastic targeted universal adversarial perturbations

Abstract: Deep learning image classifiers are known to be vulnerable to small adversarial perturbations of input images. In this paper, we derive the locally optimal generalized likelihood ratio test (LO-GLRT) based detector for detecting stochastic targeted universal adversarial perturbations (UAPs) of the classifier inputs. We also describe a supervised training method to learn the detector's parameters, and demonstrate better performance of the detector compared to other detection methods on several popular image classification datasets.

Submitted 8 December, 2020; originally announced December 2020.

Comments: Submitted to ICASSP 2021

arXiv:2007.11693 [pdf, other]

Robust Machine Learning via Privacy/Rate-Distortion Theory

Authors: Ye Wang, Shuchin Aeron, Adnan Siraj Rakin, Toshiaki Koike-Akino, Pierre Moulin

Abstract: Robust machine learning formulations have emerged to address the prevalent vulnerability of deep neural networks to adversarial examples. Our work draws the connection between optimal robust learning and the privacy-utility tradeoff problem, which is a generalization of the rate-distortion problem. The saddle point of the game between a robust classifier and an adversarial perturbation can be found via the solution of a maximum conditional entropy problem. This information-theoretic perspective sheds light on the fundamental tradeoff between robustness and clean data performance, which ultimately arises from the geometric structure of the underlying data distribution and perturbation constraints.

Submitted 18 May, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

Comments: 9 pages, 2 figures, accepted at 2021 IEEE International Symposium on Information Theory

arXiv:2006.01906 [pdf, other]

Detecting Audio Attacks on ASR Systems with Dropout Uncertainty

Authors: Tejas Jayashankar, Jonathan Le Roux, Pierre Moulin

Abstract: Various adversarial audio attacks have recently been developed to fool automatic speech recognition (ASR) systems. We here propose a defense against such attacks based on the uncertainty introduced by dropout in neural networks. We show that our defense is able to detect attacks created through optimized perturbations and frequency masking on a state-of-the-art end-to-end ASR system. Furthermore, the defense can be made robust against attacks that are immune to noise reduction. We test our defense on Mozilla's CommonVoice dataset, the UrbanSound dataset, and an excerpt of the LibriSpeech dataset, showing that it achieves high detection accuracy in a wide range of scenarios.

Submitted 14 September, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

Comments: Accepted for publication at Interspeech 2020

arXiv:1704.00196 [pdf, ps, other]

doi 10.1007/s10107-018-01361-0

Faster Subgradient Methods for Functions with Hölderian Growth

Authors: Patrick R. Johnstone, Pierre Moulin

Abstract: The purpose of this manuscript is to derive new convergence results for several subgradient methods applied to minimizing nonsmooth convex functions with Hölderian growth. The growth condition is satisfied in many applications and includes functions with quadratic growth and weakly sharp minima as special cases. To this end there are three main contributions. First, for a constant and sufficiently small stepsize, we show that the subgradient method achieves linear convergence up to a certain region including the optimal set, with error of the order of the stepsize. Second, if appropriate problem parameters are known, we derive a decaying stepsize which obtains a much faster convergence rate than is suggested by the classical $O(1/\sqrt{k})$ result for the subgradient method. Third, we develop a novel "descending stairs" stepsize which obtains this faster convergence rate and also obtains linear convergence for the special case of weakly sharp functions. We also develop an adaptive variant of the "descending stairs" stepsize which achieves the same convergence rate without requiring an error bound constant, which is difficult to estimate in practice.

Submitted 30 April, 2018; v1 submitted 1 April, 2017; originally announced April 2017.

Comments: 50 pages. First revised version (under submission to Math Programming)

Journal ref: Math. Program. 180, 417-450 (2020)
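
The first contribution in the abstract above (a constant stepsize leaves an error of the order of the stepsize) can be illustrated with a minimal sketch. The test function f(x) = ||x||_1, the starting point, and the function names below are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not reproduce the paper's decaying or "descending stairs" stepsizes.

```python
def l1_subgradient(x):
    """Return one subgradient of f(x) = ||x||_1 (0 is chosen at a kink)."""
    return [(v > 0) - (v < 0) for v in x]


def constant_step_subgradient(x0, step, iters):
    """Run the plain subgradient method with a fixed stepsize."""
    x = list(x0)
    for _ in range(iters):
        g = l1_subgradient(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical problem data: the minimizer of ||x||_1 is the origin.
step = 0.01
x_final = constant_step_subgradient([3.0, -2.0], step, 1000)

# Largest coordinate distance to the minimizer after the run.
dist = max(abs(v) for v in x_final)
print(dist <= step + 1e-9)
```

In this toy case the iterates enter a neighborhood of the minimizer and then oscillate inside it, so the final error is bounded by the stepsize rather than going to zero.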

arXiv:1609.03626 [pdf, other]

Convergence Rates of Inertial Splitting Schemes for Nonconvex Composite Optimization

Authors: Patrick R. Johnstone, Pierre Moulin

Abstract: We study the convergence properties of a general inertial first-order proximal splitting algorithm for solving nonconvex nonsmooth optimization problems. Using the Kurdyka-Łojasiewicz (KL) inequality we establish new convergence rates which apply to several inertial algorithms in the literature. Our basic assumption is that the objective function is semialgebraic, which lends our results broad applicability in the fields of signal processing and machine learning. The convergence rates depend on the exponent of the "desingularizing function" arising in the KL inequality. Depending on this exponent, convergence may be finite, linear, or sublinear and of the form $O(k^{-p})$ for $p>1$.

Submitted 12 September, 2016; originally announced September 2016.

arXiv:1608.03697 [pdf, other]

On Information-Theoretic Characterizations of Markov Random Fields and Subfields

Authors: Raymond W. Yeung, Ali Al-Bashabsheh, Chao Chen, Qi Chen, Pierre Moulin

Abstract: Let $X_i, i \in V$ form a Markov random field (MRF) represented by an undirected graph $G = (V,E)$, and $V'$ be a subset of $V$. We determine the smallest graph that can always represent the subfield $X_i, i \in V'$ as an MRF. Based on this result, we obtain a necessary and sufficient condition for a subfield of a Markov tree to be also a Markov tree. When $G$ is a path so that $X_i, i \in V$ form a Markov chain, it is known that the $I$-Measure is always nonnegative and the information diagram assumes a very special structure (Kawabata and Yeung, 1992). We prove that the Markov chain is essentially the only MRF such that the $I$-Measure is always nonnegative. By applying our characterization of the smallest graph representation of a subfield of an MRF, we develop a recursive approach for constructing information diagrams for MRFs. Our work is built on the set-theoretic characterization of an MRF in Yeung, Lee, and Ye (2002).

Submitted 17 January, 2018; v1 submitted 12 August, 2016; originally announced August 2016.

arXiv:1603.05414 [pdf, ps, other]

Variable-Length Hashing

Authors: Honghai Yu, Pierre Moulin, Hong Wei Ng, Xiaoli Li

Abstract: Hashing has emerged as a popular technique for large-scale similarity search. Most learning-based hashing methods generate compact yet correlated hash codes. However, this redundancy is storage-inefficient. Hence we propose a lossless variable-length hashing (VLH) method that is both storage- and search-efficient. Storage efficiency is achieved by converting the fixed-length hash code into a variable-length code. Search efficiency is obtained by using a multiple hash table structure. With VLH, we are able to deliberately add redundancy into hash codes to improve retrieval performance with little sacrifice in storage efficiency or search complexity. In particular, we propose a block K-means hashing (B-KMH) method to obtain significantly improved retrieval performance with no increase in storage and marginal increase in computational cost.

Submitted 17 March, 2016; originally announced March 2016.

Comments: 10 pages, 6 figures

arXiv:1602.02726 [pdf, ps, other]

doi 10.1007/s10589-017-9896-7

Local and Global Convergence of a General Inertial Proximal Splitting Scheme

Authors: Patrick R. Johnstone, Pierre Moulin

Abstract: This paper is concerned with convex composite minimization problems in a Hilbert space. In these problems, the objective is the sum of two closed, proper, and convex functions where one is smooth and the other admits a computationally inexpensive proximal operator. We analyze a general family of inertial proximal splitting algorithms (GIPSA) for solving such problems. We establish finiteness of the sum of squared increments of the iterates and optimality of the accumulation points. Weak convergence of the entire sequence then follows if the minimum is attained. Our analysis unifies and extends several previous results. We then focus on $\ell_1$-regularized optimization, which is the ubiquitous special case where the nonsmooth term is the $\ell_1$-norm. For certain parameter choices, GIPSA is amenable to a local analysis for this problem. For these choices we show that GIPSA achieves finite "active manifold identification", i.e. convergence in a finite number of iterations to the optimal support and sign, after which GIPSA reduces to minimizing a local smooth function. Local linear convergence then holds under certain conditions. We determine the rate in terms of the inertia, stepsize, and local curvature. Our local analysis is applicable to certain recent variants of the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), for which we establish active manifold identification and local linear convergence. Our analysis motivates the use of a momentum restart scheme in these FISTA variants to obtain the optimal local linear convergence rate.

Submitted 8 February, 2016; originally announced February 2016.

Comments: 33 pages, 1 figure

Journal ref: Comput Optim Appl 67, 259-292 (2017)
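
For orientation, the kind of iteration this abstract describes for $\ell_1$-regularized problems can be sketched as a generic inertial forward-backward step: extrapolate with an inertia term, take a gradient step on the smooth part, then apply the proximal operator (soft-thresholding) of the $\ell_1$ term. This is a minimal sketch under stated assumptions, not the paper's GIPSA parameterization: the diagonal quadratic smooth term, the fixed inertia `beta`, and all names and numbers below are hypothetical choices made so the result can be checked against a closed form.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| evaluated at the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def inertial_forward_backward(d, b, lam, step, beta, iters):
    """Minimize sum_i 0.5*d[i]*(x[i]-b[i])**2 + lam*|x[i]| (assumed toy problem)."""
    n = len(d)
    x_prev = [0.0] * n
    x = [0.0] * n
    for _ in range(iters):
        # Inertial extrapolation using the previous two iterates.
        y = [xi + beta * (xi - xp) for xi, xp in zip(x, x_prev)]
        # Gradient of the smooth quadratic term at the extrapolated point.
        grad = [di * (yi - bi) for di, yi, bi in zip(d, y, b)]
        # Forward (gradient) step followed by backward (proximal) step.
        x_next = [soft_threshold(yi - step * gi, step * lam)
                  for yi, gi in zip(y, grad)]
        x_prev, x = x, x_next
    return x


# Hypothetical data; the stepsize is 1/L with L = max(d).
d = [1.0, 2.0, 4.0]
b = [3.0, -0.2, 1.0]
lam = 0.5
x_star = inertial_forward_backward(d, b, lam, step=1.0 / max(d), beta=0.3, iters=300)

# For a diagonal quadratic the minimizer is available in closed form.
closed_form = [soft_threshold(bi, lam / di) for di, bi in zip(d, b)]
print(x_star, closed_form)
```

In this toy run the second coordinate is set to zero on the first iteration and stays there, a small-scale instance of the support identification the abstract refers to.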

arXiv:1504.06029 [pdf, ps, other]

On MMSE estimation from quantized observations in the nonasymptotic regime

Authors: Jaeho Lee, Maxim Raginsky, Pierre Moulin

Abstract: This paper studies MMSE estimation on the basis of quantized noisy observations. It presents nonasymptotic bounds on MMSE regret due to quantization for two settings: (1) estimation of a scalar random variable given a quantized vector of $n$ conditionally independent observations, and (2) estimation of a $p$-dimensional random vector given a quantized vector of $n$ observations (not necessarily independent) when the full MMSE estimator has a subgaussian concentration property.

Submitted 22 April, 2015; originally announced April 2015.

Comments: 5 pages; to be presented at ISIT 2015

arXiv:1502.02281 [pdf, other]

Local and Global Convergence of an Inertial Version of Forward-Backward Splitting

Authors: Patrick R. Johnstone, Pierre Moulin

Abstract: A problem of great interest in optimization is to minimize a sum of two closed, proper, and convex functions where one is smooth and the other has a computationally inexpensive proximal operator. In this paper we analyze a family of Inertial Forward-Backward Splitting (I-FBS) algorithms for solving this problem. We first apply a global Lyapunov analysis to I-FBS and prove weak convergence of the iterates to a minimizer in a real Hilbert space. We then show that the algorithms achieve local linear convergence for "sparse optimization", which is the important special case where the nonsmooth term is the $\ell_1$-norm. This result holds under either a restricted strong convexity or a strict complementarity condition and we do not require the objective to be strictly convex. For certain parameter choices we determine an upper bound on the number of iterations until the iterates are confined on a manifold containing the solution set and linear convergence holds. The local linear convergence result for sparse optimization holds for the Fast Iterative Shrinkage and Soft Thresholding Algorithm (FISTA) due to Beck and Teboulle, which is a particular parameter choice for I-FBS. In spite of its optimal global objective function convergence rate, we show that FISTA is not optimal for sparse optimization with respect to the local convergence rate. We determine the locally optimal parameter choice for the I-FBS family. Finally we propose a method which inherits the excellent global rate of FISTA but also has excellent local rate.

Submitted 23 January, 2017; v1 submitted 8 February, 2015; originally announced February 2015.

Comments: The proofs of Thms. 4.1, 5.1, 5.2, and 5.6 of this manuscript contain several errors. These errors have been fixed in a revised and rewritten manuscript entitled "Local and Global Convergence of a General Inertial Proximal Splitting Scheme" arxiv id. 1602.02726. We recommend reading this updated manuscript, available at arXiv:1602.02726

arXiv:1402.4881 [pdf, other]

Fixed Error Asymptotics For Erasure and List Decoding

Authors: Vincent Y. F. Tan, Pierre Moulin

Abstract: We derive the optimum second-order coding rates, known as second-order capacities, for erasure and list decoding. For erasure decoding for discrete memoryless channels, we show that second-order capacity is $\sqrt{V}Φ^{-1}(ε_t)$ where $V$ is the channel dispersion and $ε_t$ is the total error probability, i.e., the sum of the erasure and undetected errors. We show numerically that the expected rate at finite blocklength for erasure decoding can exceed the finite blocklength channel coding rate. We also show that the analogous result holds for lossless source coding with decoder side information, i.e., Slepian-Wolf coding. For list decoding, we consider list codes of deterministic size that scales as $\exp(\sqrt{n}l)$ and show that the second-order capacity is $l+\sqrt{V}Φ^{-1}(ε)$ where $ε$ is the permissible error probability. We also consider lists of polynomial size $n^α$ and derive bounds on the third-order coding rate in terms of the order of the polynomial $α$. These bounds are tight for symmetric and singular channels. The direct parts of the coding theorems leverage the simple threshold decoder and converses are proved using variants of the hypothesis testing converse.

Submitted 21 April, 2014; v1 submitted 19 February, 2014; originally announced February 2014.

Comments: 18 pages, 1 figure; Submitted to IEEE Transactions on Information Theory; Shorter version to be presented at ISIT 2014

arXiv:1311.0181 [pdf, ps, other]

The Log-Volume of Optimal Codes for Memoryless Channels, Asymptotically Within A Few Nats

Authors: Pierre Moulin

Abstract: Shannon's analysis of the fundamental capacity limits for memoryless communication channels has been refined over time. In this paper, the maximum volume $M_{\mathrm{avg}}^*(n,ε)$ of length-$n$ codes subject to an average decoding error probability $ε$ is shown to satisfy the following tight asymptotic lower and upper bounds as $n \to \infty$: \[ \underline{A}_ε + o(1) \le \log M_{\mathrm{avg}}^*(n,ε) - \left[ nC - \sqrt{nV_ε} \, Q^{-1}(ε) + \frac{1}{2} \log n \right] \le \overline{A}_ε + o(1) \] where $C$ is the Shannon capacity, $V_ε$ the $ε$-channel dispersion, or second-order coding rate, $Q$ the tail probability of the normal distribution, and the constants $\underline{A}_ε$ and $\overline{A}_ε$ are explicitly identified. This expression holds under mild regularity assumptions on the channel, including nonsingularity. The gap $\overline{A}_ε - \underline{A}_ε$ is one nat for weakly symmetric channels in the Cover-Thomas sense, and typically a few nats for other symmetric channels, for the binary symmetric channel, and for the $Z$ channel. The derivation is based on strong large-deviations analysis and refined central limit asymptotics. A random coding scheme that achieves the lower bound is presented. The codewords are drawn from a capacity-achieving input distribution modified by an $O(1/\sqrt{n})$ correction term.

Submitted 26 December, 2016; v1 submitted 1 November, 2013; originally announced November 2013.

Comments: 75 pages, 8 figures. This is the final version to appear in the IEEE Transactions on Information Theory, 2017

Journal ref: IEEE Transactions on Information Theory, 2017
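
To give a feel for the magnitudes in the bracketed expression $nC - \sqrt{nV_ε}\,Q^{-1}(ε) + \frac{1}{2}\log n$ from this abstract, the sketch below evaluates it for a binary symmetric channel. The capacity and dispersion formulas for the BSC used here are the standard textbook ones and are not quoted from this listing; the crossover probability, blocklength, and error level are hypothetical. The bounded constants $\underline{A}_ε$ and $\overline{A}_ε$ are not computed, so this is only the leading part of the approximation.

```python
import math
from statistics import NormalDist


def bsc_capacity_and_dispersion(p):
    """Capacity C and dispersion V (in nats) of a BSC with crossover p in (0, 0.5)."""
    capacity = math.log(2) + p * math.log(p) + (1 - p) * math.log(1 - p)
    dispersion = p * (1 - p) * math.log((1 - p) / p) ** 2
    return capacity, dispersion


def leading_log_volume(n, eps, p):
    """Evaluate n*C - sqrt(n*V)*Qinv(eps) + 0.5*log(n), in nats."""
    capacity, dispersion = bsc_capacity_and_dispersion(p)
    # Q is the standard normal tail probability, so Qinv(eps) = Phi^{-1}(1 - eps).
    q_inv = NormalDist().inv_cdf(1 - eps)
    return n * capacity - math.sqrt(n * dispersion) * q_inv + 0.5 * math.log(n)


# Hypothetical operating point.
n, eps, p = 1000, 1e-3, 0.11
capacity, dispersion = bsc_capacity_and_dispersion(p)
approx = leading_log_volume(n, eps, p)
print(n * capacity, approx)
```

At this operating point the second-order term $\sqrt{nV}\,Q^{-1}(ε)$ dominates the $\frac{1}{2}\log n$ correction, so the approximation sits well below $nC$.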

arXiv:1210.0954 [pdf, ps, other]

Learning from Collective Intelligence in Groups

Authors: Guo-Jun Qi, Charu Aggarwal, Pierre Moulin, Thomas Huang

Abstract: Collective intelligence, which aggregates the shared information from large crowds, is often negatively impacted by unreliable information sources with low-quality data. This becomes a barrier to the effective use of collective intelligence in a variety of applications. In order to address this issue, we propose a probabilistic model to jointly assess the reliability of sources and find the true data. We observe that different sources are often not independent of each other. Instead, sources are prone to be mutually influenced, which makes them dependent when sharing information with each other. High dependency between sources makes collective intelligence vulnerable to the overuse of redundant (and possibly incorrect) information from the dependent sources. Thus, we reveal the latent group structure among dependent sources, and aggregate the information at the group level rather than from individual sources directly. This can prevent the collective intelligence from being inappropriately dominated by dependent sources. We will also explicitly reveal the reliability of groups, and minimize the negative impacts of unreliable groups. Experimental results on real-world data sets show the effectiveness of the proposed approach with respect to existing algorithms.

Submitted 2 October, 2012; originally announced October 2012.

arXiv:1011.1261 [pdf, ps, other]

doi 10.1109/TIFS.2011.2168212

On the Saddle-point Solution and the Large-Coalition Asymptotics of Fingerprinting Games

Authors: Yen-Wei Huang, Pierre Moulin

Abstract: We study a fingerprinting game in which the number of colluders and the collusion channel are unknown. The encoder embeds fingerprints into a host sequence and provides the decoder with the capability to trace back pirated copies to the colluders. Fingerprinting capacity has recently been derived as the limit value of a sequence of maximin games with mutual information as their payoff functions. However, these games generally do not admit saddle-point solutions and are very hard to solve numerically. Here under the so-called Boneh-Shaw marking assumption, we reformulate the capacity as the value of a single two-person zero-sum game, and show that it is achieved by a saddle-point solution. If the maximal coalition size is $k$ and the fingerprinting alphabet is binary, we show that capacity decays quadratically with $k$. Furthermore, we prove rigorously that the asymptotic capacity is $1/(2 k^2 \ln 2)$ and we confirm our earlier conjecture that Tardos' choice of the arcsine distribution asymptotically maximizes the mutual information payoff function while the interleaving attack minimizes it. Along with the asymptotic behavior, numerical solutions to the game for small $k$ are also presented.

Submitted 19 April, 2011; v1 submitted 4 November, 2010; originally announced November 2010.

Comments: submitted to IEEE Trans. on Information Forensics and Security

arXiv:0905.1375 [pdf, ps, other]

doi 10.1109/ISIT.2009.5205882

Saddle-point Solution of the Fingerprinting Capacity Game Under the Marking Assumption

Authors: Yen-Wei Huang, Pierre Moulin

Abstract: We study a fingerprinting game in which the collusion channel is unknown. The encoder embeds fingerprints into a host sequence and provides the decoder with the capability to trace back pirated copies to the colluders. Fingerprinting capacity has recently been derived as the limit value of a sequence of maxmin games with mutual information as the payoff function. However, these games generally do not admit saddle-point solutions and are very hard to solve numerically. Here under the so-called Boneh-Shaw marking assumption, we reformulate the capacity as the value of a single two-person zero-sum game, and show that it is achieved by a saddle-point solution. If the maximal coalition size is $k$ and the fingerprint alphabet is binary, we derive equations that can numerically solve the capacity game for arbitrary $k$. We also provide tight upper and lower bounds on the capacity. Finally, we discuss the asymptotic behavior of the fingerprinting game for large $k$ and practical implementation issues.

Submitted 9 May, 2009; originally announced May 2009.

Comments: 5 pages, to appear in 2009 IEEE International Symposium on Information Theory (ISIT 2009), Seoul, Korea, June 2009

arXiv:0803.0265 [pdf, ps, other]

Blind Fingerprinting

Authors: Ying Wang, Pierre Moulin

Abstract: We study blind fingerprinting, where the host sequence into which fingerprints are embedded is partially or completely unknown to the decoder. This problem relates to a multiuser version of the Gel'fand-Pinsker problem. The number of colluders and the collusion channel are unknown, and the colluders and the fingerprint embedder are subject to distortion constraints. We propose a conditionally constant-composition random binning scheme and a universal decoding rule and derive the corresponding false-positive and false-negative error exponents. The encoder is a stacked binning scheme and makes use of an auxiliary random sequence. The decoder is a maximum doubly-penalized mutual information decoder, where the significance of each candidate coalition is assessed relative to a threshold that trades off false-positive and false-negative error exponents. The penalty is proportional to coalition size and is a function of the conditional type of host sequence. Positive exponents are obtained at all rates below a certain value, which is therefore a lower bound on public fingerprinting capacity. We conjecture that this value is the public fingerprinting capacity. A simpler threshold decoder is also given, which has similar universality properties but also lower achievable rates. An upper bound on public fingerprinting capacity is also derived.

Submitted 3 March, 2008; originally announced March 2008.

Comments: 36 pages, submitted for publication

arXiv:0801.4544 [pdf, ps, other]

doi 10.1109/ISIT.2008.4594948

A Neyman-Pearson Approach to Universal Erasure and List Decoding

Authors: Pierre Moulin

Abstract: When information is to be transmitted over an unknown, possibly unreliable channel, an erasure option at the decoder is desirable. Using constant-composition random codes, we propose a generalization of Csiszár and Körner's Maximum Mutual Information decoder with erasure option for discrete memoryless channels. The new decoder is parameterized by a weighting function that is designed to optimize the fundamental tradeoff between undetected-error and erasure exponents for a compound class of channels. The class of weighting functions may be further enlarged to optimize a similar tradeoff for list decoders; in that case, undetected-error probability is replaced with average number of incorrect messages in the list. Explicit solutions are identified. The optimal exponents admit simple expressions in terms of the sphere-packing exponent, at all rates below capacity. For small erasure exponents, these expressions coincide with those derived by Forney (1968) for symmetric channels, using Maximum a Posteriori decoding. Thus for those channels at least, ignorance of the channel law is inconsequential. Conditions for optimality of the Csiszár-Körner rule and of the simpler empirical-mutual-information thresholding rule are identified. The error exponents are evaluated numerically for the binary symmetric channel.

Submitted 29 January, 2008; originally announced January 2008.

Comments: 31 pages, submitted to IEEE Transactions on Information Theory

arXiv:0801.3837 [pdf, ps, other]

Universal Fingerprinting: Capacity and Random-Coding Exponents

Authors: Pierre Moulin

Abstract: This paper studies fingerprinting (traitor tracing) games in which the number of colluders and the collusion channel are unknown. The fingerprints are embedded into host sequences representing signals to be protected and provide the receiver with the capability to trace back pirated copies to the colluders. The colluders and the fingerprint embedder are subject to signal fidelity constraints. Our problem setup unifies the signal-distortion and Boneh-Shaw formulations of fingerprinting. The fundamental tradeoffs between fingerprint codelength, number of users, number of colluders, fidelity constraints, and decoding reliability are then determined. Several bounds on fingerprinting capacity have been presented in recent literature. This paper derives exact capacity formulas and presents a new randomized fingerprinting scheme with the following properties: (1) the encoder and receiver assume a nominal coalition size but do not need to know the actual coalition size and the collusion channel; (2) a tunable parameter $Δ$ trades off false-positive and false-negative error exponents; (3) the receiver provides a reliability metric for its decision; and (4) the scheme is capacity-achieving when the false-positive exponent $Δ$ tends to zero and the nominal coalition size coincides with the actual coalition size. A fundamental component of the new scheme is the use of a "time-sharing" randomized sequence. The decoder is a maximum penalized mutual information decoder, where the significance of each candidate coalition is assessed relative to a threshold, and the penalty is proportional to the coalition size. A much simpler threshold decoder that satisfies properties (1)-(3) above but not (4) is also given.

Submitted 24 May, 2011; v1 submitted 24 January, 2008; originally announced January 2008.

Comments: 69 pages, revised

arXiv:cs/0702161 [pdf, ps, other]

doi 10.1109/TIT.2008.921684

Perfectly Secure Steganography: Capacity, Error Exponents, and Code Constructions

Authors: Ying Wang, Pierre Moulin

Abstract: An analysis of steganographic systems subject to the following perfect undetectability condition is presented in this paper. Following embedding of the message into the covertext, the resulting stegotext is required to have exactly the same probability distribution as the covertext. Then no statistical test can reliably detect the presence of the hidden message. We refer to such steganographic schemes as perfectly secure. A few such schemes have been proposed in recent literature, but they have vanishing rate. We prove that communication performance can potentially be vastly improved; specifically, our basic setup assumes independently and identically distributed (i.i.d.) covertext, and we construct perfectly secure steganographic codes from public watermarking codes using binning methods and randomized permutations of the code. The permutation is a secret key shared between encoder and decoder. We derive (positive) capacity and random-coding exponents for perfectly-secure steganographic systems. The error exponents provide estimates of the code length required to achieve a target low error probability. We address the potential loss in communication performance due to the perfect-security requirement. This loss is the same as the loss obtained under a weaker order-1 steganographic requirement that would just require matching of first-order marginals of the covertext and stegotext distributions. Furthermore, no loss occurs if the covertext distribution is uniform and the distortion metric is cyclically symmetric; steganographic capacity is then achieved by randomized linear codes. Our framework may also be useful for developing computationally secure steganographic systems that have near-optimal communication performance.

Submitted 25 December, 2007; v1 submitted 27 February, 2007; originally announced February 2007.

Comments: To appear in IEEE Trans. on Information Theory, June 2008; ignore Version 2 as the file was corrupted

arXiv:cs/0506013 [pdf, ps, other]

On the existence and characterization of the maxent distribution under general moment inequality constraints

Authors: Prakash Ishwar, Pierre Moulin

Abstract: A broad set of sufficient conditions that guarantees the existence of the maximum entropy (maxent) distribution consistent with specified bounds on certain generalized moments is derived. Most results in the literature are either focused on the minimum cross-entropy distribution or apply only to distributions with a bounded-volume support or address only equality constraints. The results of this work hold for general moment inequality constraints for probability distributions with possibly unbounded support, and the technical conditions are explicitly on the underlying generalized moment functions. An analytical characterization of the maxent distribution is also derived using results from the theory of constrained optimization in infinite-dimensional normed linear spaces. Several auxiliary results of independent interest pertaining to certain properties of convex coercive functions are also presented.

Submitted 5 June, 2005; originally announced June 2005.

Comments: 13 pages; accepted for publication in the IEEE Transactions on Information Theory

arXiv:cs/0410003 [pdf, ps, other]

Capacity and Random-Coding Exponents for Channel Coding with Side Information

Authors: Pierre Moulin, Ying Wang

Abstract: Capacity formulas and random-coding exponents are derived for a generalized family of Gel'fand-Pinsker coding problems. These exponents yield asymptotic upper bounds on the achievable log probability of error. In our model, information is to be reliably transmitted through a noisy channel with finite input and output alphabets and random state sequence, and the channel is selected by a hypothetical adversary. Partial information about the state sequence is available to the encoder, adversary, and decoder. The design of the transmitter is subject to a cost constraint. Two families of channels are considered: 1) compound discrete memoryless channels (CDMC), and 2) channels with arbitrary memory, subject to an additive cost constraint, or more generally to a hard constraint on the conditional type of the channel output given the input. Both problems are closely connected. The random-coding exponent is achieved using a stacked binning scheme and a maximum penalized mutual information decoder, which may be thought of as an empirical generalized Maximum a Posteriori decoder. For channels with arbitrary memory, the random-coding exponents are larger than their CDMC counterparts. Applications of this study include watermarking, data hiding, communication in presence of partially known interferers, and problems such as broadcast channels, all of which involve the fundamental idea of binning.

Submitted 28 December, 2006; v1 submitted 1 October, 2004; originally announced October 2004.

Comments: to appear in IEEE Transactions on Information Theory, without Appendices G and H

Showing 1–21 of 21 results for author: Moulin, P