Search | arXiv e-print repository

Statistical and Computational Phase Transitions in Group Testing

Authors: Amin Coja-Oghlan, Oliver Gebhard, Max Hahn-Klimroth, Alexander S. Wein, Ilias Zadik

Abstract: We study the group testing problem where the goal is to identify a set of k infected individuals carrying a rare disease within a population of size n, based on the outcomes of pooled tests which return positive whenever there is at least one infected individual in the tested group. We consider two different simple random procedures for assigning individuals to tests: the constant-column design and Bernoulli design. Our first set of results concerns the fundamental statistical limits. For the constant-column design, we give a new information-theoretic lower bound which implies that the proportion of correctly identifiable infected individuals undergoes a sharp "all-or-nothing" phase transition when the number of tests crosses a particular threshold. For the Bernoulli design, we determine the precise number of tests required to solve the associated detection problem (where the goal is to distinguish between a group testing instance and pure noise), improving both the upper and lower bounds of Truong, Aldridge, and Scarlett (2020). For both group testing models, we also study the power of computationally efficient (polynomial-time) inference procedures. We determine the precise number of tests required for the class of low-degree polynomial algorithms to solve the detection problem. This provides evidence for an inherent computational-statistical gap in both the detection and recovery problems at small sparsity levels. Notably, our evidence is contrary to that of Iliopoulos and Zadik (2021), who predicted the absence of a computational-statistical gap in the Bernoulli design.
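
As a rough illustration of the two random test designs named in the abstract above, the following minimal sketch builds each design and evaluates the pooled outcomes. The function names, the parameters `p` and `delta`, and the choice to draw the `delta` tests without replacement in the constant-column design are assumptions made for this example; the paper's exact definitions may differ.

```python
import random


def bernoulli_design(n, m, p, rng):
    """Bernoulli design: each individual joins each test independently with probability p."""
    return [[i for i in range(n) if rng.random() < p] for _ in range(m)]


def constant_column_design(n, m, delta, rng):
    """Constant-column design: each individual joins exactly delta tests.

    Assumption: the delta tests are drawn uniformly without replacement.
    """
    tests = [[] for _ in range(m)]
    for i in range(n):
        for t in rng.sample(range(m), delta):
            tests[t].append(i)
    return tests


def pooled_outcomes(tests, infected):
    """A test is positive iff it contains at least one infected individual."""
    return [any(i in infected for i in pool) for pool in tests]
```

For example, `pooled_outcomes(constant_column_design(20, 10, 3, random.Random(0)), {4, 7})` returns one boolean outcome per test.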

Submitted 15 June, 2022; originally announced June 2022.

Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2022

arXiv:2205.08203 [pdf, other]

On the Hierarchy of Distributed Majority Protocols

Authors: Petra Berenbrink, Amin Coja-Oghlan, Oliver Gebhard, Max Hahn-Klimroth, Dominik Kaaser, Malin Rau

Abstract: We study the Consensus problem among $n$ agents, defined as follows. Initially, each agent holds one of two possible opinions. The goal is to reach a consensus configuration in which every agent shares the same opinion. To this end, agents randomly sample other agents and update their opinion according to a simple update function depending on the sampled opinions. We consider two communication models: the gossip model and a variant of the population model. In the gossip model, agents are activated in parallel, synchronous rounds. In the population model, one agent is activated after the other in a sequence of discrete time steps. For both models we analyze the following natural family of majority processes called $j$-Majority: when activated, every agent samples $j$ other agents uniformly at random (with replacement) and adopts the majority opinion among the sample (breaking ties uniformly at random). As our main result we show a hierarchy among majority protocols: $(j+1)$-Majority (for $j > 1$) converges stochastically faster than $j$-Majority for any initial opinion configuration. In our analysis we use Strassen's Theorem to prove the existence of a coupling. This gives an affirmative answer for the case of two opinions to an open question asked by Berenbrink et al. [2017].
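
The $j$-Majority update rule described in the abstract above can be sketched as follows for two opinions encoded as 0 and 1. All function names are hypothetical, and the population-model step assumes that the activated agent is chosen uniformly at random, which the abstract does not specify.

```python
import random


def sample_majority(opinions, agent, j, rng):
    """Sample j other agents uniformly with replacement and return the majority
    opinion among the sample, breaking ties uniformly at random."""
    n = len(opinions)
    ones = 0
    for _ in range(j):
        k = rng.randrange(n - 1)
        if k >= agent:  # skip the sampling agent itself
            k += 1
        ones += opinions[k]
    if 2 * ones > j:
        return 1
    if 2 * ones < j:
        return 0
    return rng.randrange(2)


def gossip_round(opinions, j, rng):
    """Gossip model: all agents update in parallel from the same configuration."""
    return [sample_majority(opinions, a, j, rng) for a in range(len(opinions))]


def population_step(opinions, j, rng):
    """Population-model variant: one agent is activated and updates in place.
    Assumption: the activated agent is chosen uniformly at random."""
    a = rng.randrange(len(opinions))
    opinions[a] = sample_majority(opinions, a, j, rng)
```

Repeating `gossip_round` or `population_step` until all entries agree gives a simple way to compare convergence times for different values of `j` empirically.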

Submitted 17 May, 2022; originally announced May 2022.

arXiv:2103.13039 [pdf, ps, other]

Note on the offspring distribution for group testing in the linear regime

Authors: Oliver Gebhard, Philipp Loick

Abstract: The group testing problem is concerned with identifying a small set of $k$ infected individuals in a large population of $n$ people. At our disposal is a testing scheme that can test groups of individuals. A test comes back positive if and only if at least one individual is infected. In this note, we lay groundwork for analysing belief propagation for group testing when $k$ scales linearly in $n$. To this end, we derive the offspring distribution for different types of individuals. With these distributions at hand, one can employ the population dynamics algorithm to simulate the posterior marginal distribution resulting from belief propagation.

Submitted 24 March, 2021; originally announced March 2021.

arXiv:2007.01376 [pdf, other]

doi 10.1109/TIT.2021.3138489

Improved bounds for noisy group testing with constant tests per item

Authors: Oliver Gebhard, Oliver Johnson, Philipp Loick, Maurice Rolvien

Abstract: The group testing problem is concerned with identifying a small set of infected individuals in a large population. At our disposal is a testing procedure that allows us to test several individuals together. In an idealized setting, a test is positive if and only if at least one infected individual is included and negative otherwise. Significant progress was made in recent years towards understanding the information-theoretic and algorithmic properties in this noiseless setting. In this paper, we consider a noisy variant of group testing where test results are flipped with certain probability, including the realistic scenario where sensitivity and specificity can take arbitrary values. Using a test design where each individual is assigned to a fixed number of tests, we derive explicit algorithmic bounds for two commonly considered inference algorithms and thereby naturally extend the results of Scarlett & Cevher (2016) and Scarlett & Johnson (2020). We provide improved performance guarantees for the efficient algorithms in these noisy group testing models -- indeed, for a large set of parameter choices the bounds provided in the paper are the strongest currently proved.

Submitted 21 December, 2021; v1 submitted 2 July, 2020; originally announced July 2020.

Journal ref: IEEE Transactions on Information Theory, vol 68/4, 2022, pages 2604-2621

arXiv:2004.11860 [pdf, other]

Near optimal sparsity-constrained group testing: improved bounds and algorithms

Authors: Oliver Gebhard, Max Hahn-Klimroth, Olaf Parczyk, Manuel Penschuck, Maurice Rolvien, Jonathan Scarlett, Nelvin Tan

Abstract: Recent advances in noiseless non-adaptive group testing have led to a precise asymptotic characterization of the number of tests required for high-probability recovery in the sublinear regime $k = n^θ$ (with $θ\in (0,1)$), with $n$ individuals among which $k$ are infected. However, the required number of tests may increase substantially under real-world practical constraints, notably including bounds on the maximum number $Δ$ of tests an individual can be placed in, or the maximum number $Γ$ of individuals in a given test. While previous works have given recovery guarantees for these settings, significant gaps remain between the achievability and converse bounds. In this paper, we substantially or completely close several of the most prominent gaps. In the case of $Δ$-divisible items, we show that the definite defectives (DD) algorithm coupled with a random regular design is asymptotically optimal in dense scaling regimes, and optimal to within a factor of $\mathrm{e}$ more generally; we establish this by strengthening both the best known achievability and converse bounds. In the case of $Γ$-sized tests, we provide a comprehensive analysis of the regime $Γ= Θ(1)$, and again establish a precise threshold proving the asymptotic optimality of SCOMP (a slight refinement of DD) equipped with a tailored pooling scheme. Finally, for each of these two settings, we provide near-optimal adaptive algorithms based on sequential splitting, and provably demonstrate gaps between the performance of optimal adaptive and non-adaptive algorithms.
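
For readers unfamiliar with the definite defectives (DD) algorithm mentioned in the abstract above, here is a minimal sketch of its standard two-step form for noiseless tests. The function name and the input representation (a list of pools plus a list of boolean outcomes) are assumptions made for this example, not taken from the paper.

```python
def definite_defectives(tests, outcomes, n):
    """Standard DD decoding for noiseless group testing.

    tests:    list of pools, each a list of individual indices in range(n)
    outcomes: list of booleans, True for a positive test
    Returns the set of individuals declared infected.
    """
    # Step 1: everyone who appears in a negative test is certainly healthy.
    healthy = set()
    for pool, positive in zip(tests, outcomes):
        if not positive:
            healthy.update(pool)
    possible = set(range(n)) - healthy

    # Step 2: a positive test with exactly one remaining candidate
    # proves that this candidate is infected.
    defective = set()
    for pool, positive in zip(tests, outcomes):
        if positive:
            candidates = set(pool) & possible
            if len(candidates) == 1:
                defective.update(candidates)
    return defective
```

With noiseless outcomes, DD never declares a healthy individual infected; it can only miss infected individuals that never appear alone among the candidates of a positive test.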

Submitted 22 December, 2021; v1 submitted 24 April, 2020; originally announced April 2020.

Comments: Accepted for publication at IEEE Transactions on Information Theory

MSC Class: 05C80; 60B20; 68P30

arXiv:1911.02287 [pdf, ps, other]

doi 10.1017/S096354832100002X

Optimal group testing

Authors: Amin Coja-Oghlan, Oliver Gebhard, Max Hahn-Klimroth, Philipp Loick

Abstract: In the group testing problem the aim is to identify a small set of $k\sim n^θ$ infected individuals out of a population of size $n$, $0<θ<1$. We avail ourselves of a test procedure capable of testing groups of individuals, with the test returning a positive result iff at least one individual in the group is infected. The aim is to devise a test design with as few tests as possible so that the set of infected individuals can be identified correctly with high probability. We establish an explicit sharp information-theoretic/algorithmic phase transition $m_{\mathrm{inf}}$ for non-adaptive group testing, where all tests are conducted in parallel. Thus, with more than $m_{\mathrm{inf}}$ tests the infected individuals can be identified in polynomial time with high probability, while learning the set of infected individuals is information-theoretically impossible with fewer tests. In addition, we develop an optimal adaptive scheme where the tests are conducted in two stages.

Submitted 18 April, 2020; v1 submitted 6 November, 2019; originally announced November 2019.

MSC Class: 05C80; 60B20; 68P30

arXiv:1905.01458 [pdf, ps, other]

On the Parallel Reconstruction from Pooled Data

Authors: Oliver Gebhard, Max Hahn-Klimroth, Dominik Kaaser, Philipp Loick

Abstract: In the pooled data problem the goal is to efficiently reconstruct a binary signal from additive measurements. Given a signal $σ\in \{ 0,1 \}^n$, we can query multiple entries at once and get the total number of non-zero entries in the query as a result. We assume that queries are time-consuming and therefore focus on the setting where all queries are executed in parallel. For the regime where the signal is sparse such that $\|σ\|_1 = o(n)$ our results are twofold: First, we propose and analyze a simple and efficient greedy reconstruction algorithm. Second, we derive a sharp information-theoretic threshold for the minimum number of queries required to reconstruct $σ$ with high probability. Our first result matches the performance guarantees of much more involved constructions (Karimi et al. 2019). Our second result extends a result of Alaoui et al. (2014) and Scarlett & Cevher (2017) who studied the pooled data problem for dense signals. Finally, our theoretical findings are complemented with empirical simulations. Our data not only confirm the information-theoretic thresholds but also hint at the practical applicability of our pooling scheme and the simple greedy reconstruction algorithm.
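
The additive measurement model in the abstract above differs from group testing in that a query reports a count rather than a positive or negative outcome. The sketch below illustrates only this measurement model with all queries fixed up front; it is not the paper's greedy reconstruction algorithm, and the function names are hypothetical.

```python
def additive_query(signal, pool):
    """Return the number of non-zero entries of the binary signal inside the pool."""
    return sum(signal[i] for i in pool)


def parallel_queries(signal, pools):
    """Non-adaptive setting: every pool is fixed before any result is observed."""
    return [additive_query(signal, pool) for pool in pools]
```

For instance, with `signal = [1, 0, 1, 1, 0]` and `pools = [[0, 1], [2, 3], [1, 4]]`, the query results are `[1, 2, 0]`, whereas a group test would only report positive, positive, negative.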

Submitted 13 April, 2022; v1 submitted 4 May, 2019; originally announced May 2019.

Comments: Accepted at 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS)

arXiv:1902.02202 [pdf, ps, other]

doi 10.1109/TIT.2020.3023377

Information-theoretic and algorithmic thresholds for group testing

Authors: Amin Coja-Oghlan, Oliver Gebhard, Max Hahn-Klimroth, Philipp Loick

Abstract: In the group testing problem we aim to identify a small number of infected individuals within a large population. We avail ourselves of a procedure that can test a group of multiple individuals, with the test result coming out positive iff at least one individual in the group is infected. With all tests conducted in parallel, what is the least number of tests required to identify the status of all individuals? In a recent test design [Aldridge et al. 2016] the individuals are assigned to test groups randomly, with every individual joining an equal number of groups. We pinpoint the sharp threshold for the number of tests required in this randomised design so that it is information-theoretically possible to infer the infection status of every individual. Moreover, we analyse two efficient inference algorithms. These results settle conjectures from [Aldridge et al. 2014, Johnson et al. 2019].

Submitted 30 September, 2020; v1 submitted 6 February, 2019; originally announced February 2019.

MSC Class: 05C80

Showing 1–8 of 8 results for author: Gebhard, O