
Relative entropy bounds for sampling with and without replacement

Authors: Oliver Johnson, Lampros Gavalakis, Ioannis Kontoyiannis

Abstract: Sharp, nonasymptotic bounds are obtained for the relative entropy between the distributions of sampling with and without replacement from an urn with balls of $c\geq 2$ colors. Our bounds are asymptotically tight in certain regimes and, unlike previous results, they depend on the number of balls of each colour in the urn. The connection of these results with finite de Finetti-style theorems is explored, and it is observed that a sampling bound due to Stam (1978) combined with the convexity of relative entropy yield a new finite de Finetti bound in relative entropy, which achieves the optimal asymptotic convergence rate.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 17 pages, 1 figure

MSC Class: 60E05 (Primary) 60G09 (Secondary)

arXiv:2403.07501 [pdf, other]

doi 10.1145/3643796.3648464

Detecting Security-Relevant Methods using Multi-label Machine Learning

Authors: Oshando Johnson, Goran Piskachev, Ranjith Krishnamurthy, Eric Bodden

Abstract: To detect security vulnerabilities, static analysis tools need to be configured with security-relevant methods. Current approaches can automatically identify such methods using binary relevance machine learning approaches. However, they ignore dependencies among security-relevant methods, over-generalize and perform poorly in practice. Additionally, users have to nevertheless manually configure static analysis tools using the detected methods. Based on feedback from users and our observations, the excessive manual steps can often be tedious, error-prone and counter-intuitive. In this paper, we present Dev-Assist, an IntelliJ IDEA plugin that detects security-relevant methods using a multi-label machine learning approach that considers dependencies among labels. The plugin can automatically generate configurations for static analysis tools, run the static analysis, and show the results in IntelliJ IDEA. Our experiments reveal that Dev-Assist's machine learning approach has a higher F1-Measure than related approaches. Moreover, the plugin reduces and simplifies the manual effort required when configuring and using static analysis tools.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 6 pages, 3 figures, The IDE Workshop

arXiv:2309.07264 [pdf, other]

Small error algorithms for tropical group testing

Authors: Vivekanand Paligadu, Oliver Johnson, Matthew Aldridge

Abstract: We consider a version of the classical group testing problem motivated by PCR testing for COVID-19. In the so-called tropical group testing model, the outcome of a test is the lowest cycle threshold (Ct) level of the individuals pooled within it, rather than a simple binary indicator variable. We introduce the tropical counterparts of three classical non-adaptive algorithms (COMP, DD and SCOMP), and analyse their behaviour through both simulations and bounds on error probabilities. By comparing the results of the tropical and classical algorithms, we gain insight into the extra information provided by learning the outcomes (Ct levels) of the tests. We show that in a limiting regime the tropical COMP algorithm requires as many tests as its classical counterpart, but that for sufficiently dense problems tropical DD can recover more information with fewer tests, and can be viewed as essentially optimal in certain regimes.

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2212.08196 [pdf, other]

Saved You A Click: Automatically Answering Clickbait Titles

Authors: Oliver Johnson, Beicheng Lou, Janet Zhong, Andrey Kurenkov

Abstract: Often clickbait articles have a title that is phrased as a question or vague teaser that entices the user to click on the link and read the article to find the explanation. We developed a system that will automatically find the answer or explanation of the clickbait hook from the website text so that the user does not need to read through the text themselves. We fine-tune an extractive question and answering model (RoBERTa) and an abstractive one (T5), using data scraped from the 'StopClickbait' Facebook pages and Reddit's 'SavedYouAClick' subforum. We find that both extractive and abstractive models improve significantly after finetuning. We find that the extractive model performs slightly better according to ROUGE scores, while the abstractive one has a slight edge in terms of BERTscores.

Submitted 15 December, 2022; originally announced December 2022.

arXiv:2209.15318 [pdf, other]

Extensions on `A Convex Scheme for the Secrecy Capacity of a MIMO Wiretap Channel with a Single Antenna Eavesdropper'

Authors: Jennifer Chakravarty, Oliver Johnson, Robert Piechocki

Abstract: One key metric for physical layer security is the secrecy capacity. This is the maximum rate that a system can transmit with perfect secrecy. For a Multiple Input Multiple Output (MIMO) system (a newer technology for 5G, 6G and beyond) the secrecy capacity is not fully understood. For a Gaussian MIMO channel, the secrecy capacity is a non-convex optimisation problem for which a general solution is not available. Previous work by the authors showed that the secrecy capacity of a MIMO system with a single eavesdrop antenna is concave to a cut off point. In this work, which extends the previous paper, results are given for the region beyond this cut off point. It is shown that, for certain parameters, the presented scheme is concave to a point, and convex beyond it, and can therefore be solved efficiently using existing convex optimisation software.

Submitted 30 September, 2022; originally announced September 2022.

arXiv:2203.07803 [pdf, other]

doi 10.1017/S026996482200033X

A negative binomial approximation in group testing

Authors: Letian Yu, Fraser Daly, Oliver Johnson

Abstract: We consider the problem of group testing (pooled testing), first introduced by Dorfman. For non-adaptive testing strategies, we refer to a non-defective item as `intruding' if it only appears in positive tests. Such items cause mis-classification errors in the well-known COMP algorithm, and can make other algorithms produce an error. It is therefore of interest to understand the distribution of the number of intruding items. We show that, under Bernoulli matrix designs, this distribution is well approximated in a variety of senses by a negative binomial distribution, allowing us to understand the performance of the two-stage conservative group testing algorithm of Aldridge.

Submitted 26 August, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

Journal ref: Probability in the Engineering and Informational Sciences, vol 37/4, 2023, pages 973-996

arXiv:2107.03765 [pdf, other]

Bounds on Eavesdropper Performance for a MIMO-NOMA Downlink Scheme

Authors: Jennifer Chakravarty, Oliver Johnson, Robert Piechocki

Abstract: Non-Orthogonal Multiple Access (NOMA) is a multiplexing technique for future wireless, which when combined with Multiple-Input Multiple-Output (MIMO) unlocks higher capacities for systems where users have varying channel strength. NOMA utilises the channel differences to increase the throughput, while MIMO exploits the additional degrees of freedom (DoF) to enhance this. This work analyses the secrecy capacity, demonstrating the robustness of a combined MIMO-NOMA scheme at physical layer, when in the presence of a passive eavesdropper. We present bounds on the eavesdropper performance and show heuristically that, as the number of users and antennas increases, the eavesdropper's SINR becomes small, regardless of how `lucky' they may be with their channel.

Submitted 8 July, 2021; originally announced July 2021.

Comments: 9 pages, 5 figures

arXiv:2104.06575 [pdf, other]

doi 10.1016/j.commatsci.2021.110756

Five Degree-of-Freedom Property Interpolation of Arbitrary Grain Boundaries via Voronoi Fundamental Zone Octonion Framework

Authors: Sterling G. Baird, Eric R. Homer, David T. Fullwood, Oliver K. Johnson

Abstract: We introduce the Voronoi fundamental zone octonion interpolation framework for grain boundary (GB) structure-property models and surrogates. The VFZO framework offers an advantage over other five degree-of-freedom based property interpolation methods because it is constructed as a point set in a manifold. This means that directly computed Euclidean distances approximate the original octonion distance with significantly reduced computation runtime (~7 CPU minutes vs. 153 CPU days for a 50000x50000 pairwise-distance matrix). This increased efficiency facilitates lower interpolation error through the use of significantly more input data. We demonstrate grain boundary energy interpolation results for a non-smooth validation function and simulated bi-crystal datasets for Fe and Ni using four interpolation methods: barycentric interpolation, Gaussian process regression (GPR), inverse-distance weighting, and nearest-neighbor interpolation. These are evaluated for 50000 random input GBs and 10 000 random prediction GBs. The best performance was achieved with GPR, which resulted in a reduction of the root mean square error (RMSE) by 83.0% relative to RMSE of a constant, average model. Likewise, interpolation on a large, noisy, molecular statics Fe simulation dataset improves performance by 34.4% compared to 21.2% in prior work. Interpolation on a small, low-noise MS Ni simulation dataset is similar to interpolation results for the original octonion metric (57.6% vs. 56.4%).
A vectorized, parallelized, MATLAB interpolation function (interp5DOF.m) and related routines are available in our VFZO repository (github.com/sgbaird-5dof/interp) which can be applied to other crystallographic point groups. The VFZO framework offers advantages for computing distances between GBs, estimating property values for arbitrary GBs, and modeling surrogates of computationally expensive 5DOF functions and simulations.

Submitted 13 April, 2021; originally announced April 2021.

Comments: main: 22 pages, 10 figures; appendices: 5 pages, 3 figures; supp: 13 pages, 12 figures

arXiv:2007.03569 [pdf, ps, other]

Information-theoretic convergence of extreme values to the Gumbel distribution

Authors: Oliver Johnson

Abstract: We show how convergence to the Gumbel distribution in an extreme value setting can be understood in an information-theoretic sense. We introduce a new type of score function which behaves well under the maximum operation, and which implies simple expressions for entropy and relative entropy. We show that, assuming certain properties of the von Mises representation, convergence to the Gumbel can be proved in the strong sense of relative entropy.

Submitted 3 August, 2022; v1 submitted 7 July, 2020; originally announced July 2020.

Comments: 13 pages

arXiv:2007.01376 [pdf, other]

doi 10.1109/TIT.2021.3138489

Improved bounds for noisy group testing with constant tests per item

Authors: Oliver Gebhard, Oliver Johnson, Philipp Loick, Maurice Rolvien

Abstract: The group testing problem is concerned with identifying a small set of infected individuals in a large population. At our disposal is a testing procedure that allows us to test several individuals together. In an idealized setting, a test is positive if and only if at least one infected individual is included and negative otherwise. Significant progress was made in recent years towards understanding the information-theoretic and algorithmic properties in this noiseless setting. In this paper, we consider a noisy variant of group testing where test results are flipped with certain probability, including the realistic scenario where sensitivity and specificity can take arbitrary values. Using a test design where each individual is assigned to a fixed number of tests, we derive explicit algorithmic bounds for two commonly considered inference algorithms and thereby naturally extend the results of Scarlett \& Cevher (2016) and Scarlett \& Johnson (2020). We provide improved performance guarantees for the efficient algorithms in these noisy group testing models -- indeed, for a large set of parameter choices the bounds provided in the paper are the strongest currently proved.

Submitted 21 December, 2021; v1 submitted 2 July, 2020; originally announced July 2020.

Journal ref: IEEE Transactions on Information Theory, vol 68/4, 2022, pages 2604-2621

arXiv:1905.11913 [pdf, ps, other]

doi 10.1109/TIT.2020.2985957

Maximal correlation and the rate of Fisher information convergence in the Central Limit Theorem

Authors: Oliver Johnson

Abstract: We consider the behaviour of the Fisher information of scaled sums of independent and identically distributed random variables in the Central Limit Theorem regime. We show how this behaviour can be related to the second-largest non-trivial eigenvalue associated with the Hirschfeld--Gebelein--Rényi maximal correlation. We prove that assuming this eigenvalue satisfies a strict inequality, an $O(1/n)$ rate of convergence and a strengthened form of monotonicity hold.

Submitted 6 December, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

Journal ref: IEEE Transactions on Information Theory, vol 66/8, 2020, pages 4992-5002

arXiv:1902.06002 [pdf, other]

doi 10.1561/0100000099

Group Testing: An Information Theory Perspective

Authors: Matthew Aldridge, Oliver Johnson, Jonathan Scarlett

Abstract: The group testing problem concerns discovering a small number of defective items within a large population by performing tests on pools of items. A test is positive if the pool contains at least one defective, and negative if it contains no defectives. This is a sparse inference problem with a combinatorial flavour, with applications in medical testing, biology, telecommunications, information technology, data science, and more. In this monograph, we survey recent developments in the group testing problem from an information-theoretic perspective. We cover several related developments: efficient algorithms with practical storage and computation requirements, achievability bounds for optimal decoding methods, and algorithm-independent converse bounds. We assess the theoretical guarantees not only in terms of scaling laws, but also in terms of the constant factors, leading to the notion of the {\em rate} of group testing, indicating the amount of information learned per test. Considering both noiseless and noisy settings, we identify several regimes where existing algorithms are provably optimal or near-optimal, as well as regimes where there remains greater potential for improvement. In addition, we survey results concerning a number of variations on the standard group testing problem, including partial recovery criteria, adaptive algorithms with a limited number of stages, constrained test designs, and sublinear-time algorithms.

Submitted 17 August, 2020; v1 submitted 15 February, 2019; originally announced February 2019.

Comments: Survey paper, 140 pages, 19 figures. To be published in Foundations and Trends in Communications and Information Theory

Journal ref: Foundations and Trends in Communications and Information Theory: Vol. 15: No. 3-4, pp 196-392, 2019

arXiv:1810.09791 [pdf, ps, other]

doi 10.1214/19-EJP380

A proof of the Shepp-Olkin entropy monotonicity conjecture

Authors: Erwan Hillion, Oliver Johnson

Abstract: Consider tossing a collection of coins, each fair or biased towards heads, and take the distribution of the total number of heads that result. It is natural to conjecture that this distribution should be 'more random' when each coin is fairer. Indeed, Shepp and Olkin conjectured that the Shannon entropy of this distribution is monotonically increasing in this case. We resolve this conjecture, by proving that this intuition is correct. Our proof uses a construction which was previously developed by the authors to prove a related conjecture of Shepp and Olkin concerning concavity of entropy. We discuss whether this result can be generalized to $q$-Rényi and $q$-Tsallis entropies, for a range of values of $q$.

Submitted 23 October, 2018; originally announced October 2018.

Comments: 16 pages

Journal ref: Electronic Journal of Probability, vol 24/126, 2019, pages 1-14

arXiv:1808.09143 [pdf, other]

doi 10.1109/TIT.2020.2970184

Noisy Non-Adaptive Group Testing: A (Near-)Definite Defectives Approach

Authors: Jonathan Scarlett, Oliver Johnson

Abstract: The group testing problem consists of determining a small set of defective items from a larger set of items based on a number of possibly-noisy tests, and is relevant in applications such as medical testing, communication protocols, pattern matching, and more. We study the noisy version of this problem, where the outcome of each standard noiseless group test is subject to independent noise, corresponding to passing the noiseless result through a binary channel. We introduce a class of algorithms that we refer to as Near-Definite Defectives (NDD), and study bounds on the required number of tests for asymptotically vanishing error probability under Bernoulli random test designs. In addition, we study algorithm-independent converse results, giving lower bounds on the required number of tests under Bernoulli test designs. Under reverse Z-channel noise, the achievable rates and converse results match in a broad range of sparsity regimes, and under Z-channel noise, the two match in a narrower range of dense/low-noise regimes. We observe that although these two channels have the same Shannon capacity when viewed as a communication channel, they can behave quite differently when it comes to group testing. Finally, we extend our analysis of these noise models to a general binary noise model (including symmetric noise), and show improvements over known existing bounds in broad scaling regimes.

Submitted 28 December, 2021; v1 submitted 28 August, 2018; originally announced August 2018.

Comments: IEEE Transactions on Information Theory, Volume 66, Issue 6, pp. 3775-3797, June 2020

Journal ref: IEEE Transactions on Information Theory, Volume 66, Issue 6, pp. 3775-3797, June 2020

arXiv:1706.04410 [pdf, ps, other]

doi 10.1214/18-EJS1419

A strong converse bound for multiple hypothesis testing, with applications to high-dimensional estimation

Authors: Ramji Venkataramanan, Oliver Johnson

Abstract: In statistical inference problems, we wish to obtain lower bounds on the minimax risk, that is to bound the performance of any possible estimator. A standard technique to obtain risk lower bounds involves the use of Fano's inequality. In an information-theoretic setting, it is known that Fano's inequality typically does not give a sharp converse result (error lower bound) for channel coding problems. Moreover, recent work has shown that an argument based on binary hypothesis testing gives tighter results. We adapt this technique to the statistical setting, and argue that Fano's inequality can always be replaced by this approach to obtain tighter lower bounds that can be easily computed and are asymptotically sharp. We illustrate our technique in three applications: density estimation, active learning of a binary classifier, and compressed sensing, obtaining tighter risk lower bounds in each case.

Submitted 4 April, 2018; v1 submitted 14 June, 2017; originally announced June 2017.

Comments: In the latest version, the value of $λ$ in the statements of Lemma 4.1 and Proposition 4.2 is restricted to the interval $(0,1]$. This is the correct condition, rather than $λ>0$ stated in the journal version below

Journal ref: Electronic Journal of Statistics, Vol. 12, No. 1, pp. 1126-1149, 2018

arXiv:1705.09473 [pdf, ps, other]

doi 10.1109/TVT.2018.2790436

Reliability of Broadcast Communications Under Sparse Random Linear Network Coding

Authors: Suzie Brown, Oliver Johnson, Andrea Tassi

Abstract: Ultra-reliable Point-to-Multipoint (PtM) communications are expected to become pivotal in networks offering future dependable services for smart cities. In this regard, sparse Random Linear Network Coding (RLNC) techniques have been widely employed to provide an efficient way to improve the reliability of broadcast and multicast data streams. This paper addresses the pressing concern of providing a tight approximation to the probability of a user recovering a data stream protected by this kind of coding technique. In particular, by exploiting the Stein--Chen method, we provide a novel and general performance framework applicable to any combination of system and service parameters, such as finite field sizes, lengths of the data stream and level of sparsity. The deviation of the proposed approximation from Monte Carlo simulations is negligible, improving significantly on the state of the art performance bounds.

Submitted 18 October, 2018; v1 submitted 26 May, 2017; originally announced May 2017.

Comments: Accepted for publication on IEEE Transactions on Vehicular Technology

Journal ref: IEEE Transactions on Vehicular Technology, vol 67/5, 2018, pages 4677-4682

arXiv:1701.07089 [pdf, ps, other]

doi 10.1109/ISIT.2017.8006658

A de Bruijn identity for discrete random variables

Authors: Oliver Johnson, Saikat Guha

Abstract: We discuss properties of the "beamsplitter addition" operation, which provides a non-standard scaled convolution of random variables supported on the non-negative integers. We give a simple expression for the action of beamsplitter addition using generating functions. We use this to give a self-contained and purely classical proof of a heat equation and de Bruijn identity, satisfied when one of the variables is geometric.

Submitted 23 January, 2017; originally announced January 2017.

Comments: 9 pages, shorter version submitted to ISIT 2017

Journal ref: Proceedings of the International Symposium on Information Theory, 2017, p898-902

arXiv:1612.07122 [pdf, other]

doi 10.1109/TIT.2018.2861772

Performance of group testing algorithms with near-constant tests-per-item

Authors: Oliver Johnson, Matthew Aldridge, Jonathan Scarlett

Abstract: We consider the nonadaptive group testing with N items, of which $K = Θ(N^θ)$ are defective. We study a test design in which each item appears in nearly the same number of tests. For each item, we independently pick L tests uniformly at random with replacement, and place the item in those tests. We analyse the performance of these designs with simple and practical decoding algorithms in a range of sparsity regimes, and show that the performance is consistently improved in comparison with standard Bernoulli designs. We show that our new design requires 23% fewer tests than a Bernoulli design when paired with the simple decoding algorithms known as COMP and DD. This gives the best known nonadaptive group testing performance for $θ> 0.43$, and the best proven performance with a practical decoding algorithm for all $θ\in (0,1)$. We also give a converse result showing that the DD algorithm is optimal for these designs when $θ> 1/2$.

Submitted 30 May, 2018; v1 submitted 21 November, 2016; originally announced December 2016.

Comments: 16 pages, 2 figures. This work was presented in part at the 2016 IEEE International Symposium on Information Theory: arXiv:1602.03471

Journal ref: IEEE Transactions on Information Theory, 2018
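
Illustrative sketch (not taken from the paper): the abstract above describes a design in which each item is placed in L tests drawn uniformly at random with replacement. The Python below is a minimal sketch of that design in the noiseless setting, paired with the standard COMP rule (declare an item non-defective if it appears in any negative test). The function names, parameter names and the toy sizes are assumptions chosen for illustration, not the authors' code.

```python
import random


def near_constant_design(num_items, num_tests, tests_per_item, rng):
    """For each item, draw `tests_per_item` test indices uniformly at random
    WITH replacement. Repeats collapse in the set, so an item appears in at
    most (and usually exactly) `tests_per_item` distinct tests."""
    return [
        {rng.randrange(num_tests) for _ in range(tests_per_item)}
        for _ in range(num_items)
    ]


def run_tests(design, defectives, num_tests):
    """Noiseless outcomes: a test is positive iff it contains a defective."""
    outcomes = [False] * num_tests
    for item in defectives:
        for test in design[item]:
            outcomes[test] = True
    return outcomes


def comp_decode(design, outcomes):
    """COMP: any item seen in a negative test is declared non-defective;
    every remaining item is declared defective."""
    return {
        item
        for item, tests in enumerate(design)
        if all(outcomes[test] for test in tests)
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the toy run is reproducible
    design = near_constant_design(num_items=50, num_tests=30,
                                  tests_per_item=4, rng=rng)
    true_defectives = {3, 17, 42}  # hypothetical defective set
    outcomes = run_tests(design, true_defectives, num_tests=30)
    estimate = comp_decode(design, outcomes)
    print("false positives:", sorted(estimate - true_defectives))
```

In the noiseless setting COMP never misses a true defective, so its only errors are false positives: non-defective items that happen to appear solely in positive tests.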

arXiv:1602.03471 [pdf, other]

doi 10.1109/ISIT.2016.7541525

Improved group testing rates with constant column weight designs

Authors: Matthew Aldridge, Oliver Johnson, Jonathan Scarlett

Abstract: We consider nonadaptive group testing where each item is placed in a constant number of tests. The tests are chosen uniformly at random with replacement, so the testing matrix has (almost) constant column weights. We show that performance is improved compared to Bernoulli designs, where each item is placed in each test independently with a fixed probability. In particular, we show that the rate of the practical COMP detection algorithm is increased by 31% in all sparsity regimes. In dense cases, this beats the best possible algorithm with Bernoulli tests, and in sparse cases is the best proven performance of any practical algorithm. We also give an algorithm-independent upper bound for the constant column weight case; for dense cases this is again a 31% increase over the analogous Bernoulli result.

Submitted 13 May, 2016; v1 submitted 10 February, 2016; originally announced February 2016.

Comments: 5 pages, 2 figures; to be presented at ISIT 2016

Journal ref: Proceedings of the International Symposium on Information Theory (ISIT 2016), p1381-1385

arXiv:1601.08132 [pdf, other]

Interference Management in Heterogeneous Networks with Blind Transmitters

Authors: Vaia Kalokidou, Oliver Johnson, Robert Piechocki

Abstract: Future multi-tier communication networks will require enhanced network capacity and reduced overhead. In the absence of Channel State Information (CSI) at the transmitters, Blind Interference Alignment (BIA) and Topological Interference Management (TIM) can achieve optimal Degrees of Freedom (DoF), minimising network's overhead. In addition, Non-Orthogonal Multiple Access (NOMA) can increase the sum rate of the network, compared to orthogonal radio access techniques currently adopted by 4G networks. Our contribution is two interference management schemes, BIA and a hybrid TIM-NOMA scheme, employed in heterogeneous networks by applying user-pairing and Kronecker Product representation. BIA manages inter- and intra-cell interference by antenna selection and appropriate message scheduling. The hybrid scheme manages intra-cell interference based on NOMA and inter-cell interference based on TIM. We show that both schemes achieve at least double the rate of TDMA. The hybrid scheme always outperforms TDMA and BIA in terms of Degrees of Freedom (DoF). Comparing the two proposed schemes, BIA achieves more DoF than TDMA under certain restrictions, and provides better Bit-Error-Rate (BER) and sum rate performance to macrocell users, whereas the hybrid scheme improves the performance of femtocell users.

Submitted 29 January, 2016; originally announced January 2016.

Comments: 30 pages, 18 figures

arXiv:1510.05390 [pdf, ps, other]

doi 10.1007/978-1-4939-7005-6_2

Entropy and thinning of discrete random variables

Authors: Oliver Johnson

Abstract: We describe five types of results concerning information and concentration of discrete random variables, and relationships between them, motivated by their counterparts in the continuous case. The results we consider are information theoretic approaches to Poisson approximation, the maximum entropy property of the Poisson distribution, discrete concentration (Poincaré and logarithmic Sobolev) inequalities, monotonicity of entropy and concavity of entropy in the Shepp--Olkin regime.

Submitted 7 June, 2016; v1 submitted 19 October, 2015; originally announced October 2015.

Comments: Draft for Proceedings of IMA theme year in Discrete Structures

Journal ref: Pages 33-53 in: Carlen E., Madiman M., Werner E. (eds) Convexity and Concentration. The IMA Volumes in Mathematics and its Applications, vol 161, 2017

arXiv:1509.06188 [pdf, other]

doi 10.1109/TIT.2017.2697358

Strong converses for group testing in the finite blocklength regime

Authors: Oliver Johnson

Abstract: We prove new strong converse results in a variety of group testing settings, generalizing a result of Baldassini, Johnson and Aldridge. These results are proved by two distinct approaches, corresponding to the non-adaptive and adaptive cases. In the non-adaptive case, we mimic the hypothesis testing argument introduced in the finite blocklength channel coding regime by Polyanskiy, Poor and Verdú. In the adaptive case, we combine a formulation based on directed information theory with ideas of Kemperman, Kesten and Wolfowitz from the problem of channel coding with feedback. In both cases, we prove results which are valid for finite sized problems, and imply capacity results in the asymptotic regime. These results are illustrated graphically for a range of models.

Submitted 21 September, 2015; originally announced September 2015.

Journal ref: IEEE Transactions on Information Theory, vol 63/9, 2017, pages 5923-5933

arXiv:1508.03658 [pdf]

A hybrid TIM-NOMA scheme for the Broadcast Channel

Authors: V. Kalokidou, O. Johnson, R. Piechocki

Abstract: Future mobile communication networks will require enhanced network efficiency and reduced system overhead. Research on Blind Interference Alignment and Topological Interference Management (TIM) has shown that optimal Degrees of Freedom can be achieved, in the absence of Channel State Information at the transmitters. Moreover, the recently emerged Non-Orthogonal Multiple Access (NOMA) scheme suggests a different multiple access approach, compared to the orthogonal methods employed in 4G, resulting in high capacity gains. Our contribution is a hybrid TIM-NOMA scheme in K-user cells, where users are divided into T groups. By superimposing users in the power domain, we introduce a two-stage decoding process, managing inter-group interference based on the TIM principles, and intra-group interference based on Successful Interference Cancellation, as proposed by NOMA. We show that the hybrid scheme can improve the sum rate by at least 100% compared to Time Division Multiple Access, for high SNR values.

Submitted 4 August, 2015; originally announced August 2015.

Comments: 11 pages, Published at "EAI Endorsed Transactions on Wireless Spectrum"

arXiv:1507.06268 [pdf, ps, other]

doi 10.1214/16-AIHP778

A discrete log-Sobolev inequality under a Bakry-Emery type condition

Authors: Oliver Johnson

Abstract: We consider probability mass functions $V$ supported on the positive integers using arguments introduced by Caputo, Dai Pra and Posta, based on a Bakry--Émery condition for a Markov birth and death operator with invariant measure $V$. Under this condition, we prove a modified logarithmic Sobolev inequality, generalizing and strengthening results of Wu, Bobkov and Ledoux, and Caputo, Dai Pra and Posta. We show how this inequality implies results including concentration of measure and hypercontractivity, and discuss how it may extend to higher dimensions.

Submitted 6 July, 2016; v1 submitted 22 July, 2015; originally announced July 2015.

Journal ref: Annales de l'Institut Henri Poincare B, vol 53/4, 2017, pages 1952-1970

arXiv:1506.07436 [pdf, other]

Distributed Wideband Spectrum Sensing

Authors: Thomas Kealy, Oliver Johnson, Robert Piechocki

Abstract: We consider the problem of reconstructing wideband frequency spectra from distributed, compressive measurements. The measurements are made by a network of nodes, each independently mixing the ambient spectra with low frequency, random signals. The reconstruction takes place via local transmissions between nodes, each performing simple statistical operations such as ridge regression and shrinkage.

Submitted 24 June, 2015; originally announced June 2015.

Comments: 6 pages, 6 figures, submitted to ieee Globecom 2015

arXiv:1503.01570 [pdf, ps, other]

doi 10.3150/16-BEJ860

A proof of the Shepp-Olkin entropy concavity conjecture

Authors: Erwan Hillion, Oliver Johnson

Abstract: We prove the Shepp--Olkin conjecture, which states that the entropy of the sum of independent Bernoulli random variables is concave in the parameters of the individual random variables. Our proof is a refinement of an argument previously presented by the same authors, which resolved the conjecture in the monotonic case (where all the parameters are simultaneously increasing). In fact, we show that the monotonic case is the worst case, using a careful analysis of concavity properties of the derivatives of the probability mass function. We propose a generalization of Shepp and Olkin's original conjecture, to consider Renyi and Tsallis entropies.

Submitted 5 March, 2015; originally announced March 2015.

Journal ref: Bernoulli 2017, Vol. 23, No. 4B, 3638-3649
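
The concavity statement in this abstract can be spot-checked numerically. The sketch below builds the probability mass function of a sum of independent Bernoulli variables by repeated convolution, computes its Shannon entropy in nats, and compares the entropy at the midpoint of two parameter vectors with the average of the entropies at the endpoints. The function names and the two parameter vectors are hypothetical choices for illustration; a single midpoint comparison illustrates the result and does not prove it.

```python
import math

def bernoulli_sum_pmf(params):
    """PMF of X_1 + ... + X_n for independent X_i ~ Bernoulli(params[i])."""
    pmf = [1.0]  # distribution of the empty sum: mass 1 at 0
    for p in params:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # X_i = 0 keeps the sum at k
            nxt[k + 1] += mass * p     # X_i = 1 moves the sum to k + 1
        pmf = nxt
    return pmf

def entropy(pmf):
    """Shannon entropy in nats, skipping zero-mass points."""
    return -sum(m * math.log(m) for m in pmf if m > 0)

# Two arbitrary parameter vectors and their midpoint (illustrative values).
p = [0.1, 0.5, 0.9]
q = [0.4, 0.2, 0.6]
mid = [(a + b) / 2 for a, b in zip(p, q)]

h_p = entropy(bernoulli_sum_pmf(p))
h_q = entropy(bernoulli_sum_pmf(q))
h_mid = entropy(bernoulli_sum_pmf(mid))

# Concavity along the segment from p to q implies this midpoint inequality.
midpoint_ok = h_mid >= 0.5 * (h_p + h_q) - 1e-12
```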

arXiv:1501.07723 [pdf, other]

doi 10.1109/ICCW.2015.7247210

A hybrid TIM-NOMA scheme for the SISO Broadcast Channel

Authors: Vaia Kalokidou, Oliver Johnson, Robert Piechocki

Abstract: Future mobile communication networks will require enhanced network efficiency and reduced system overhead due to their user density and high data rate demanding applications of the mobile devices. Research on Blind Interference Alignment (BIA) and Topological Interference Management (TIM) has shown that optimal Degrees of Freedom (DoF) can be achieved, in the absence of Channel State Information (CSI) at the transmitters, reducing the network's overhead. Moreover, the recently emerged Non-Orthogonal Multiple Access (NOMA) scheme suggests a different multiple access approach, compared to the current orthogonal methods employed in 4G networks, resulting in high capacity gains. Our contribution is a hybrid TIM-NOMA scheme in Single-Input-Single-Output (SISO) K-user cells, in which users are divided into T groups, and 1/T DoF is achieved for each user. By superimposing users in the power domain, we introduce a two-stage decoding process, managing 'inter-group' interference based on the TIM principles, and 'intra-group' interference based on Successful Interference Cancellation (SIC), as proposed by NOMA. We show that for high SNR values the hybrid scheme can improve the sum rate by at least 100% when compared to Time Division Multiple Access (TDMA).

Submitted 30 January, 2015; originally announced January 2015.

Comments: 6 pages, 6 figures, submitted to IEEE ICC'15 - IEEE SCAN Workshop

arXiv:1409.8653 [pdf, other]

doi 10.1109/ALLERTON.2014.7028442

The capacity of non-identical adaptive group testing

Authors: Tom Kealy, Oliver Johnson, Robert Piechocki

Abstract: We consider the group testing problem, in the case where the items are defective independently but with non-constant probability. We introduce and analyse an algorithm to solve this problem by grouping items together appropriately. We give conditions under which the algorithm performs essentially optimally in the sense of information-theoretic capacity. We use concentration of measure results to bound the probability that this algorithm requires many more tests than the expected number. This has applications to the allocation of spectrum to cognitive radios, in the case where a database gives prior information that a particular band will be occupied.

Submitted 30 September, 2014; originally announced September 2014.

Comments: To be presented at Allerton 2014

Journal ref: 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), p101-108

arXiv:1407.2391 [pdf, other]

doi 10.1109/PIMRC.2014.7136277

Blind Interference Alignment in General Heterogeneous Networks

Authors: Vaia Kalokidou, Oliver Johnson, Robert Piechocki

Abstract: Heterogeneous networks have a key role in the design of future mobile communication networks, since the employment of small cells around a macrocell enhances the network's efficiency and decreases complexity and power demand. Moreover, research on Blind Interference Alignment (BIA) has shown that optimal Degrees of Freedom (DoF) can be achieved in certain network architectures, with no requirement of Channel State Information (CSI) at the transmitters. Our contribution is a generalised model of BIA in a heterogeneous network with one macrocell with K users and K femtocells each with one user, by using Kronecker (Tensor) Product representation. We introduce a solution on how to vary beamforming vectors under power constraints to maximize the sum rate of the network and how optimal DoF can be achieved over K+1 time slots.

Submitted 9 July, 2014; originally announced July 2014.

Comments: 5 pages, 7 figures, accepted to IEEE PIMRC'14

Journal ref: Proceedings of IEEE PIMRC 2014, pages 816-820

arXiv:1402.4380 [pdf]

A Comparative Study of Machine Learning Methods for Verbal Autopsy Text Classification

Authors: Samuel Danso, Eric Atwell, Owen Johnson

Abstract: A Verbal Autopsy is the record of an interview about the circumstances of an uncertified death. In developing countries, if a death occurs away from health facilities, a field-worker interviews a relative of the deceased about the circumstances of the death; this Verbal Autopsy can be reviewed off-site. We report on a comparative study of the processes involved in Text Classification applied to classifying Cause of Death: feature value representation; machine learning classification algorithms; and feature reduction strategies in order to identify the suitable approaches applicable to the classification of Verbal Autopsy text. We demonstrate that normalised term frequency and the standard TFiDF achieve comparable performance across a number of classifiers. The results also show Support Vector Machine is superior to other classification algorithms employed in this research. Finally, we demonstrate the effectiveness of employing a "locally-semi-supervised" feature reduction strategy in order to increase performance accuracy.

Submitted 18 February, 2014; originally announced February 2014.

Comments: 10 pages

Journal ref: International Journal of Computer Science Issues, Volume 10, Issue 6, No 2, November 2013

arXiv:1401.2051 [pdf]

Enhancement performance of road recognition system of autonomous robots in shadow scenario

Authors: Olusanya Y. Agunbiade, Tranos Zuva, Awosejo O. Johnson, Keneilwe Zuva

Abstract: Road region recognition is a main feature that is gaining increasing attention from intellectuals because it helps autonomous vehicle to achieve a successful navigation without accident. However, different techniques based on camera sensor have been used by various researchers and outstanding results have been achieved. Despite their success, environmental noise like shadow leads to inaccurate recognition of road region which eventually leads to accident for autonomous vehicle. In this research, we conducted an investigation on shadow and its effects, optimized the road region recognition system of autonomous vehicle by introducing an algorithm capable of detecting and eliminating the effects of shadow. The experimental performance of our system was tested and compared using the following schemes: Total Positive Rate (TPR), False Negative Rate (FNR), Total Negative Rate (TNR), Error Rate (ERR) and False Positive Rate (FPR). The performance result of the system improved on road recognition in shadow scenario and this advancement has added tremendously to successful navigation approaches for autonomous vehicle.

Submitted 9 January, 2014; originally announced January 2014.

Comments: Signal & Image Processing : An International Journal (SIPIJ) Vol.4, No.6, December 2013

arXiv:1310.2045 [pdf, ps, other]

A de Bruijn identity for symmetric stable laws

Authors: Oliver Johnson

Abstract: We show how some attractive information--theoretic properties of Gaussians pass over to more general families of stable densities. We define a new score function for symmetric stable laws, and use it to give a stable version of the heat equation. Using this, we derive a version of the de Bruijn identity, allowing us to write the derivative of relative entropy as an inner product of score functions. We discuss maximum entropy properties of symmetric stable densities.

Submitted 8 October, 2013; originally announced October 2013.

arXiv:1306.6438 [pdf, other]

doi 10.1109/TIT.2014.2314472

Group testing algorithms: bounds and simulations

Authors: Matthew Aldridge, Leonardo Baldassini, Oliver Johnson

Abstract: We consider the problem of non-adaptive noiseless group testing of $N$ items of which $K$ are defective. We describe four detection algorithms: the COMP algorithm of Chan et al.; two new algorithms, DD and SCOMP, which require stronger evidence to declare an item defective; and an essentially optimal but computationally difficult algorithm called SSS. By considering the asymptotic rate of these algorithms with Bernoulli designs we see that DD outperforms COMP, that DD is essentially optimal in regimes where $K \geq \sqrt N$, and that no algorithm with a nonadaptive Bernoulli design can perform as well as the best non-random adaptive designs when $K > N^{0.35}$. In simulations, we see that DD and SCOMP far outperform COMP, with SCOMP very close to the optimal SSS, especially in cases with larger $K$.

Submitted 12 December, 2013; v1 submitted 27 June, 2013; originally announced June 2013.

Journal ref: IEEE Transactions on Information Theory, 60:6, 3671-3687, 2014
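
To make the phrase "require stronger evidence to declare an item defective" concrete, here is a minimal sketch of the DD (Definite Defectives) rule as it is usually described for the noiseless case: first clear every item that appears in a negative test, then declare an item defective only if it is the sole remaining candidate in some positive test. The function names and the tiny hand-built test design are hypothetical and serve only as an illustration.

```python
def possible_defectives(tests, outcomes, n_items):
    """Items not cleared by any negative test (this is also the COMP estimate)."""
    cleared = set()
    for pool, positive in zip(tests, outcomes):
        if not positive:
            cleared |= pool
    return set(range(n_items)) - cleared

def dd_decode(tests, outcomes, n_items):
    """DD: keep only items that are the unique possible defective in a positive test."""
    possible = possible_defectives(tests, outcomes, n_items)
    definite = set()
    for pool, positive in zip(tests, outcomes):
        if positive:
            candidates = pool & possible
            if len(candidates) == 1:
                definite |= candidates
    return definite

# Hypothetical example: 5 items, true defective set {0, 1}, four pooled tests.
tests = [{0, 2}, {0, 1, 4}, {2, 3}, {1, 4}]
true_defectives = {0, 1}
outcomes = [bool(pool & true_defectives) for pool in tests]  # [True, True, False, True]

comp_estimate = possible_defectives(tests, outcomes, 5)  # {0, 1, 4}
dd_estimate = dd_decode(tests, outcomes, 5)              # {0}
```

In this example the COMP estimate over-declares (item 4 is a false positive) while the DD estimate under-declares (item 1 is missed), which reflects the general noiseless-case pattern: the DD output is contained in the true defective set, which is contained in the COMP output.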

arXiv:1303.3381 [pdf, ps, other]

doi 10.1214/14-AOP973

Discrete versions of the transport equation and the Shepp-Olkin conjecture

Authors: Erwan Hillion, Oliver Johnson

Abstract: We introduce a framework to consider transport problems for integer-valued random variables. We introduce weighting coefficients which allow us to characterize transport problems in a gradient flow setting, and form the basis of our introduction of a discrete version of the Benamou-Brenier formula. Further, we use these coefficients to state a new form of weighted log-concavity. These results are applied to prove the monotone case of the Shepp-Olkin entropy concavity conjecture.

Submitted 22 February, 2016; v1 submitted 14 March, 2013; originally announced March 2013.

Comments: Published at http://dx.doi.org/10.1214/14-AOP973 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOP-AOP973

Journal ref: Annals of Probability 2016, Vol. 44, No. 1, 276-306

arXiv:1301.7023 [pdf, other]

doi 10.1109/ISIT.2013.6620712

The Capacity of Adaptive Group Testing

Authors: Leonardo Baldassini, Oliver Johnson, Matthew Aldridge

Abstract: We define capacity for group testing problems and deduce bounds for the capacity of a variety of noisy models, based on the capacity of equivalent noisy communication channels. For noiseless adaptive group testing we prove an information-theoretic lower bound which tightens a bound of Chan et al. This can be combined with a performance analysis of a version of Hwang's adaptive group testing algorithm, in order to deduce the capacity of noiseless and erasure group testing models.

Submitted 18 July, 2013; v1 submitted 29 January, 2013; originally announced January 2013.

Comments: 5 pages

Journal ref: Proceedings of the International Symposium on Information Theory (ISIT) 2013, pages 2676-2680

arXiv:1106.5714 [pdf, other]

doi 10.1007/s11009-013-9359-2

Non-parametric change-point detection using string matching algorithms

Authors: Oliver Johnson, Dino Sejdinovic, James Cruise, Ayalvadi Ganesh, Robert Piechocki

Abstract: Given the output of a data source taking values in a finite alphabet, we wish to detect change-points, that is times when the statistical properties of the source change. Motivated by ideas of match lengths in information theory, we introduce a novel non-parametric estimator which we call CRECHE (CRossings Enumeration CHange Estimator). We present simulation evidence that this estimator performs well, both for simulated sources and for real data formed by concatenating text sources. For example, we show that we can accurately detect the point at which a source changes from a Markov chain to an IID source with the same stationary distribution. Our estimator requires no assumptions about the form of the source distribution, and avoids the need to estimate its probabilities. Further, we establish consistency of the CRECHE estimator under a related toy model, by establishing a fluid limit and using martingale arguments.

Submitted 28 July, 2011; v1 submitted 28 June, 2011; originally announced June 2011.

Journal ref: Methodology and Computing in Applied Probability. 16(4) p. 987-1008 (2014)

arXiv:1010.2441 [pdf, ps, other]

doi 10.1109/ALLERTON.2010.5707018

Note on Noisy Group Testing: Asymptotic Bounds and Belief Propagation Reconstruction

Authors: Dino Sejdinovic, Oliver Johnson

Abstract: An information theoretic perspective on group testing problems has recently been proposed by Atia and Saligrama, in order to characterise the optimal number of tests. Their results hold in the noiseless case, where only false positives occur, and where only false negatives occur. We extend their results to a model containing both false positives and false negatives, developing simple information theoretic bounds on the number of tests required. Based on these bounds, we obtain an improved order of convergence in the case of false negatives only. Since these results are based on (computationally infeasible) joint typicality decoding, we propose a belief propagation algorithm for the detection of defective items and compare its actual performance to the theoretical bounds.

Submitted 12 October, 2010; originally announced October 2010.

Comments: 5 pages, 3 figures, presented at the Forty-Eighth Annual Allerton Conference on Communication, Control, and Computing, September 29 - October 1, 2010, Monticello, IL, USA

arXiv:1004.3692 [pdf, ps, other]

doi 10.1214/EJP.v15-799

Compound Poisson Approximation via Information Functionals

Authors: A. D. Barbour, Oliver Johnson, Ioannis Kontoyiannis, Mokshay Madiman

Abstract: An information-theoretic development is given for the problem of compound Poisson approximation, which parallels earlier treatments for Gaussian and Poisson approximation. Let $P_{S_n}$ be the distribution of a sum $S_n=\sum_{i=1}^n Y_i$ of independent integer-valued random variables $Y_i$. Nonasymptotic bounds are derived for the distance between $P_{S_n}$ and an appropriately chosen compound Poisson law. In the case where all $Y_i$ have the same conditional distribution given $\{Y_i\neq 0\}$, a bound on the relative entropy distance between $P_{S_n}$ and the compound Poisson distribution is derived, based on the data-processing property of relative entropy and earlier Poisson approximation results. When the $Y_i$ have arbitrary distributions, corresponding bounds are derived in terms of the total variation distance. The main technical ingredient is the introduction of two "information functionals," and the analysis of their properties. These information functionals play a role analogous to that of the classical Fisher information in normal approximation. Detailed comparisons are made between the resulting inequalities and related bounds.

Submitted 21 April, 2010; originally announced April 2010.

Comments: 27 pages

Journal ref: Electronic Journal of Probability, Vol 15, Paper no. 42, pages 1344-1369, 2010

arXiv:1004.0208 [pdf, other]

doi 10.1109/ISIT.2012.6283994

Delay-rate tradeoff in ergodic interference alignment

Authors: Oliver Johnson, Matthew Aldridge, Robert Piechocki

Abstract: Ergodic interference alignment, as introduced by Nazer et al (NGJV), is a technique that allows high-rate communication in n-user interference networks with fast fading. It works by splitting communication across a pair of fading matrices. However, it comes with the overhead of a long time delay until matchable matrices occur: the delay is q^n^2 for field size q. In this paper, we outline two new families of schemes, called JAP and JAP-B, that reduce the expected delay, sometimes at the cost of a reduction in rate from the NGJV scheme. In particular, we give examples of good schemes for networks with few users, and show that in large n-user networks, the delay scales like q^T, where T is quadratic in n for a constant per-user rate and T is constant for a constant sum-rate. We also show that half the single-user rate can be achieved while reducing NGJV's delay from q^n^2 to q^(n-1)(n-2). This extended version includes complete proofs and more details of good schemes for small n.

Submitted 16 May, 2012; v1 submitted 1 April, 2010; originally announced April 2010.

Comments: Extended version of a paper presented at the 2012 International Symposium on Information Theory. 7 pages, 1 figure

Journal ref: 2012 IEEE International Symposium on Information Theory Proceedings, 2626-2630 (shorter conference version)

arXiv:1002.0235 [pdf, ps, other]

doi 10.1109/ISIT.2010.5513390

Asymptotic Sum-Capacity of Random Gaussian Interference Networks Using Interference Alignment

Authors: Matthew Aldridge, Oliver Johnson, Robert Piechocki

Abstract: We consider a dense n-user Gaussian interference network formed by paired transmitters and receivers placed independently at random in Euclidean space. Under natural conditions on the node position distributions and signal attenuation, we prove convergence in probability of the average per-user capacity C_Sigma/n to 1/2 E log(1 + 2SNR). The achievability result follows directly from results based on an interference alignment scheme presented in recent work of Nazer et al. Our main contribution comes through the converse result, motivated by ideas of 'bottleneck links' developed in recent work of Jafar. An information theoretic argument gives a capacity bound on such bottleneck links, and probabilistic counting arguments show there are sufficiently many such links to tightly bound the sum-capacity of the whole network.

Submitted 2 June, 2010; v1 submitted 1 February, 2010; originally announced February 2010.

Comments: 5 pages; to appear at ISIT 2010

ACM Class: E.4

Journal ref: 2010 IEEE International Symposium on Information Theory, Austin, Texas, June 2010, pages 410-414

arXiv:0912.0581 [pdf, ps, other]

doi 10.1016/j.dam.2011.08.025

Log-concavity, ultra-log-concavity, and a maximum entropy property of discrete compound Poisson measures

Authors: Oliver Johnson, Ioannis Kontoyiannis, Mokshay Madiman

Abstract: Sufficient conditions are developed, under which the compound Poisson distribution has maximal entropy within a natural class of probability measures on the nonnegative integers. Recently, one of the authors [O. Johnson, Stoch. Proc. Appl., 2007] used a semigroup approach to show that the Poisson has maximal entropy among all ultra-log-concave distributions with fixed mean. We show via a non-trivial extension of this semigroup approach that the natural analog of the Poisson maximum entropy property remains valid if the compound Poisson distributions under consideration are log-concave, but that it fails in general. A parallel maximum entropy result is established for the family of compound binomial measures. Sufficient conditions for compound distributions to be log-concave are discussed and applications to combinatorics are examined; new bounds are derived on the entropy of the cardinality of a random independent set in a claw-free graph, and a connection is drawn to Mason's conjecture for matroids. The present results are primarily motivated by the desire to provide an information-theoretic foundation for compound Poisson approximation and associated limit theorems, analogous to the corresponding developments for the central limit theorem and for Poisson approximation. Our results also demonstrate new links between some probabilistic methods and the combinatorial notions of log-concavity and ultra-log-concavity, and they add to the growing body of work exploring the applications of maximum entropy characterizations to problems in discrete mathematics.

Submitted 27 September, 2011; v1 submitted 3 December, 2009; originally announced December 2009.

Comments: 30 pages. This submission supersedes arXiv:0805.4112v1. Changes in v2: Updated references, typos corrected

MSC Class: 94A17; 60E07; 60E15

Journal ref: Discrete Applied Mathematics, vol 161/9, pages 1232-1250, 2013

arXiv:0909.0641 [pdf, ps, other]

doi 10.1109/TIT.2010.2070570

Monotonicity, thinning and discrete versions of the Entropy Power Inequality

Authors: Oliver Johnson, Yaming Yu

Abstract: We consider the entropy of sums of independent discrete random variables, in analogy with Shannon's Entropy Power Inequality, where equality holds for normals. In our case, infinite divisibility suggests that equality should hold for Poisson variables. We show that some natural analogues of the Entropy Power Inequality do not in fact hold, but propose an alternative formulation which does always hold. The key to many proofs of Shannon's Entropy Power Inequality is the behaviour of entropy on scaling of continuous random variables. We believe that Rényi's operation of thinning discrete random variables plays a similar role to scaling, and give a sharp bound on how the entropy of ultra log-concave random variables behaves on thinning. In the spirit of the monotonicity results established by Artstein, Ball, Barthe and Naor, we prove a stronger version of concavity of entropy, which implies a strengthened form of our discrete Entropy Power Inequality.

Submitted 15 July, 2010; v1 submitted 3 September, 2009; originally announced September 2009.

Comments: 9 pages (revised to take account of referees' comments)

Journal ref: IEEE Transactions on Information Theory, Vol 56/11, 2010, pages 5387-5395

arXiv:0907.5165 [pdf, ps, other]

doi 10.1109/TIT.2010.2090242

Interference alignment-based sum capacity bounds for random dense Gaussian interference networks

Authors: Oliver Johnson, Matthew Aldridge, Robert Piechocki

Abstract: We consider a dense $K$-user Gaussian interference network formed by paired transmitters and receivers placed independently at random in a fixed spatial region. Under natural conditions on the node position distributions and signal attenuation, we prove convergence in probability of the average per-user capacity $C_\Sigma/K$ to $\frac{1}{2} \mathbb{E} \log(1 + 2\,\mathrm{SNR})$. The achievability result follows directly from results based on an interference alignment scheme presented in recent work of Nazer et al. Our main contribution comes through an upper bound, motivated by ideas of 'bottleneck capacity' developed in recent work of Jafar. By controlling the physical location of transmitter-receiver pairs, we can match a large proportion of these pairs to form so-called $\epsilon$-bottleneck links, with consequent control of the sum capacity.

Submitted 29 July, 2009; originally announced July 2009.

Comments: 23 pages

Journal ref: IEEE Transactions on Information Theory, 57:1, 282-290, 2011

arXiv:0906.0690 [pdf, ps, other]

doi 10.1109/TIT.2010.2053893

Thinning, Entropy and the Law of Thin Numbers

Authors: Peter Harremoes, Oliver Johnson, Ioannis Kontoyiannis

Abstract: Rényi's "thinning" operation on a discrete random variable is a natural discrete analog of the scaling operation for continuous random variables. The properties of thinning are investigated in an information-theoretic context, especially in connection with information-theoretic inequalities related to Poisson approximation results. The classical Binomial-to-Poisson convergence (sometimes referred to as the "law of small numbers") is seen to be a special case of a thinning limit theorem for convolutions of discrete distributions. A rate of convergence is provided for this limit, and nonasymptotic bounds are also established. This development parallels, in part, the development of Gaussian inequalities leading to the information-theoretic version of the central limit theorem. In particular, a "thinning Markov chain" is introduced, and it is shown to play a role analogous to that of the Ornstein-Uhlenbeck process in connection to the entropy power inequality.

Submitted 3 June, 2009; originally announced June 2009.

Journal ref: IEEE Transactions on Information Theory, Vol 56/9, 2010, pages 4228-4244

arXiv:0904.1446 [pdf, ps, other]

doi 10.1109/ISIT.2009.5205880

Concavity of entropy under thinning

Authors: Yaming Yu, Oliver Johnson

Abstract: Building on the recent work of Johnson (2007) and Yu (2008), we prove that entropy is a concave function with respect to the thinning operation T_a. That is, if X and Y are independent random variables on Z_+ with ultra-log-concave probability mass functions, then H(T_a X + T_{1-a} Y) >= a H(X) + (1-a) H(Y), 0 <= a <= 1, where H denotes the discrete entropy. This is a discrete analogue of the inequality (h denotes the differential entropy) h(sqrt(a) X + sqrt(1-a) Y) >= a h(X) + (1-a) h(Y), 0 <= a <= 1, which holds for continuous X and Y with finite variances and is equivalent to Shannon's entropy power inequality. As a consequence we establish a special case of a conjecture of Shepp and Olkin (1981).

Submitted 8 April, 2009; originally announced April 2009.

Comments: To be presented at ISIT09

Journal ref: IEEE International Symposium on Information Theory, June 28-July 3, 2009, pp. 144-148

arXiv:0805.4112 [pdf, ps, other]

On the entropy and log-concavity of compound Poisson measures

Authors: Oliver Johnson, Ioannis Kontoyiannis, Mokshay Madiman

Abstract: Motivated, in part, by the desire to develop an information-theoretic foundation for compound Poisson approximation limit theorems (analogous to the corresponding developments for the central limit theorem and for simple Poisson approximation), this work examines sufficient conditions under which the compound Poisson distribution has maximal entropy within a natural class of probability measures on the nonnegative integers. We show that the natural analog of the Poisson maximum entropy property remains valid if the measures under consideration are log-concave, but that it fails in general. A parallel maximum entropy result is established for the family of compound binomial measures. The proofs are largely based on ideas related to the semigroup approach introduced in recent work by Johnson for the Poisson family. Sufficient conditions are given for compound distributions to be log-concave, and specific examples are presented illustrating all the above results.

Submitted 27 May, 2008; originally announced May 2008.

Report number: Superseded by arXiv:0912.0581

MSC Class: 62B10; 94A17

Showing 1–46 of 46 results for author: Johnson, O