-
Hitting a prime in 2.43 dice rolls (on average)
Authors:
Noga Alon,
Yaakov Malinovsky
Abstract:
What is the number of rolls of fair 6-sided dice until the first time the total sum of all rolls is a prime? We compute the expectation and the variance of this random variable up to an additive error of less than 10^{-4}. This is a solution to a puzzle suggested by DasGupta (2017) in the Bulletin of the Institute of Mathematical Statistics, where the published solution is incomplete. The proof is simple, combining a basic dynamic programming algorithm with a quick Matlab computation and basic facts about the distribution of primes.
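The dynamic-programming idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Matlab computation and carries no rigorous error bound: the function names, the truncation at `max_rolls`, and the trial-division primality test are assumptions made for illustration. The sketch tracks the probability of sitting at each non-prime sum after k rolls and uses E[T] = sum over k >= 0 of P(T > k).

```python
def is_prime(n):
    """Trial-division primality test; adequate for the small sums used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def expected_rolls_until_prime(max_rolls=200, sides=6):
    """Truncated estimate of E[T], where T is the first roll at which the
    running sum is prime. Returns (estimate, P(T > max_rolls))."""
    # alive[s] = P(sum is s after k rolls and no prime sum has occurred yet)
    alive = {0: 1.0}
    expectation = 0.0
    for _ in range(max_rolls):
        expectation += sum(alive.values())  # adds P(T > k)
        nxt = {}
        for s, prob in alive.items():
            for face in range(1, sides + 1):
                t = s + face
                if not is_prime(t):  # prime sums stop the process
                    nxt[t] = nxt.get(t, 0.0) + prob / sides
        alive = nxt
    return expectation, sum(alive.values())


if __name__ == "__main__":
    estimate, tail = expected_rolls_until_prime()
    print(estimate, tail)
```

The second return value is the probability mass that has still not hit a prime after `max_rolls` rolls. It indicates how much the truncation might matter, but it is not by itself a bound on the truncation error.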
Submitted 23 January, 2023; v1 submitted 15 September, 2022;
originally announced September 2022.
-
Two-Stage and Sequential Unbiased Estimation of N in Binomial Trials, when the Probability of Success p is Unknown
Authors:
Yaakov Malinovsky,
Shelemyahu Zacks
Abstract:
We propose two-stage and sequential procedures to estimate the unknown parameter N of a binomial distribution with unknown parameter p, when we reinforce data with an independent sample of a negative-binomial experiment having the same p.
Submitted 13 April, 2022; v1 submitted 30 December, 2021;
originally announced December 2021.
-
Nested Group Testing Procedures for Screening
Authors:
Yaakov Malinovsky,
Paul S. Albert
Abstract:
This article reviews a class of adaptive group testing procedures that operate under a probabilistic model assumption as follows. Consider a set of $N$ items, where item $i$ has the probability $p$ ($p_i$ in the generalized group testing) to be defective, and the probability $1-p$ to be non-defective independent from the other items. A group test applied to any subset of size $n$ is a binary test with two possible outcomes, positive or negative. The outcome is negative if all $n$ items are non-defective, whereas the outcome is positive if at least one item among the $n$ items is defective. The goal is complete identification of all $N$ items with the minimum expected number of tests.
Submitted 17 February, 2021; v1 submitted 6 February, 2021;
originally announced February 2021.
-
A note on the closed-form solution for the longest head run problem of Abraham de Moivre
Authors:
Yaakov Malinovsky
Abstract:
The problem of the longest head run was introduced and solved by Abraham de Moivre in the second edition of his book Doctrine of Chances (de Moivre, 1738). The closed-form solution as a finite sum involving binomial coefficients was provided in Uspensky (1937). Since then, the problem and its variations and extensions have found broad interest and diverse applications. Surprisingly, a very simple closed form can be obtained, which we present in this note.
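For readers who want to check small cases numerically, the probability that n tosses contain a run of at least k heads can be computed with a short dynamic program. This sketch is neither the finite-sum formula of Uspensky (1937) nor the closed form presented in the note; the function name and the parameter `p` (probability of heads) are assumptions made for illustration.

```python
def prob_longest_run_at_least(n, k, p=0.5):
    """P(longest head run in n independent tosses is >= k), for k >= 1.

    state[j] = P(no run of k heads so far and the current trailing
    head run has length j), for j = 0, ..., k-1.
    """
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        nxt = [0.0] * k
        nxt[0] = (1.0 - p) * sum(state)  # a tail resets the trailing run
        for j in range(k - 1):
            nxt[j + 1] = p * state[j]    # a head extends the trailing run
        # mass at state[k-1] followed by a head completes a run of k heads
        # and is dropped from the "no run yet" states
        state = nxt
    return 1.0 - sum(state)


if __name__ == "__main__":
    # Three fair tosses: HHT, THH and HHH contain a run of two heads.
    print(prob_longest_run_at_least(3, 2))
```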
Submitted 21 February, 2021; v1 submitted 16 September, 2020;
originally announced September 2020.
-
Is Group Testing Ready for Prime-time in Disease Identification?
Authors:
Gregory Haber,
Yaakov Malinovsky,
Paul S. Albert
Abstract:
Large-scale disease screening is a complicated process in which high costs must be balanced against pressing public health needs. When the goal is screening for infectious disease, one approach is group testing, in which samples are initially tested in pools and individual samples are retested only if the initial pooled test was positive. Intuitively, if the prevalence of infection is small, this could result in a large reduction of the total number of tests required. Despite this, the use of group testing in medical studies has been limited, largely due to skepticism about the impact of pooling on the accuracy of a given assay. While there is a large body of research addressing the issue of testing errors in group testing studies, it is customary to assume that the misclassification parameters are known from an external population and/or that the values do not change with the group size. Both of these assumptions are highly questionable for many medical practitioners considering group testing in their study design. In this article, we explore how the failure of these assumptions might impact the efficacy of a group testing design and, consequently, whether group testing is currently feasible for medical screening. Specifically, we look at how incorrect assumptions about the sensitivity function at the design stage can lead to poor estimation of a procedure's overall sensitivity and expected number of tests. Furthermore, if a validation study is used to estimate the pooled misclassification parameters of a given assay, we show that the sample sizes required are so large as to be prohibitive in all but the largest screening programs.
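The intuition about test savings at low prevalence can be made concrete with a small Python sketch of the two-stage pooling scheme described above. It assumes an error-free assay and independent samples with a common prevalence `p`, which is precisely the idealization the abstract questions; the function names are hypothetical. Under these assumptions a pool of size n > 1 needs one pooled test plus n individual retests when the pool is positive, so the expected number of tests per sample is 1/n + 1 - (1 - p)^n.

```python
def expected_tests_per_sample(p, n):
    """Expected tests per sample for two-stage pooling with pool size n,
    assuming a perfect assay and independent samples with prevalence p."""
    if n == 1:
        return 1.0  # individual testing: one test per sample
    # one pooled test shared by n samples, plus a retest of every sample
    # whenever the pool is positive, which has probability 1 - (1 - p)^n
    return 1.0 / n + 1.0 - (1.0 - p) ** n


def best_pool_size(p, max_n=100):
    """Pool size in 1..max_n minimizing the expected tests per sample."""
    return min(range(1, max_n + 1), key=lambda n: expected_tests_per_sample(p, n))


if __name__ == "__main__":
    p = 0.01
    n = best_pool_size(p)
    print(n, expected_tests_per_sample(p, n))
```

With imperfect sensitivity or specificity, or with error rates that depend on the pool size, this formula no longer applies, and that is the situation the article examines.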
Submitted 27 February, 2021; v1 submitted 9 April, 2020;
originally announced April 2020.
-
An optimal design for hierarchical generalized group testing
Authors:
Yaakov Malinovsky,
Gregory Haber,
Paul S. Albert
Abstract:
Choosing an optimal strategy for hierarchical group testing is an important problem for practitioners who are interested in disease screening with limited resources. For example, when screening for infectious diseases in large populations, it is important to use algorithms that minimize the cost of potentially expensive assays. Black et al. (2015) described this as an intractable problem unless the number of individuals to screen is small. They proposed an approximation to an optimal strategy that is difficult to implement for large population sizes. In this article, we develop an optimal design with respect to the expected total number of tests that can be obtained using a novel dynamic programming algorithm. We show that this algorithm is substantially more efficient than the approach proposed by Black et al. (2015). In addition, we compare the two designs for imperfect tests. R code is provided for the practitioner.
Submitted 26 February, 2020; v1 submitted 9 August, 2018;
originally announced August 2018.
-
Efficient methods for the estimation of the multinomial parameter for the two-trait group testing model
Authors:
Gregory Haber,
Yaakov Malinovsky
Abstract:
Estimation of a single Bernoulli parameter using pooled sampling is among the oldest problems in the group testing literature. To carry out such estimation, an array of efficient estimators has been introduced covering a wide range of situations routinely encountered in applications. More recently, there has been growing interest in using group testing to simultaneously estimate the joint probabilities of two correlated traits using a multinomial model. Unfortunately, basic estimation results, such as the maximum likelihood estimator (MLE), have not been adequately addressed in the literature for such cases. In this paper, we show that finding the MLE for this problem is equivalent to maximizing a multinomial likelihood with a restricted parameter space. A solution using the EM algorithm is presented which is guaranteed to converge to the global maximizer, even on the boundary of the parameter space. Two additional closed-form estimators are presented with the goal of minimizing the bias and/or mean square error. The methods are illustrated by considering an application to the joint estimation of transmission prevalence for two strains of the Potato virus Y by the aphid Myzus persicae.
Submitted 20 May, 2019; v1 submitted 3 May, 2018;
originally announced May 2018.
-
Conjectures on Optimal Nested Generalized Group Testing Algorithm
Authors:
Yaakov Malinovsky
Abstract:
Consider a finite population of $N$ items, where item $i$ has a probability $p_i$ to be defective. The goal is to identify all items by means of group testing. This is the generalized group testing problem (hereafter GGTP). In the case of $\displaystyle p_1=\cdots=p_{N}=p$, \cite{YH1990} proved that the pairwise testing algorithm is the optimal nested algorithm, with respect to the expected number of tests, for all $N$ if and only if $\displaystyle p \in [1-1/\sqrt{2},\,(3-\sqrt{5})/2]$ (R-range hereafter); at the boundary values it is an optimal algorithm. In this note, we present a result that helps to define the generalized pairwise testing algorithm (hereafter GPTA) for the GGTP. We present two conjectures: (1) when all $p_i, i=1,\ldots,N$ belong to the R-range, GPTA is the optimal procedure among nested procedures applied to $p_i$ of nondecreasing order; (2) if all $p_i, i=1,\ldots,N$ belong to the R-range, GPTA is the optimal nested procedure, i.e., it minimises the expected total number of tests with respect to all possible testing orders in the class of nested procedures. Although these conjectures are logically reasonable, we were only able to empirically verify the first one up to a particular level of $N$. We also provide a short survey of GGTP.
Submitted 27 February, 2020; v1 submitted 3 May, 2018;
originally announced May 2018.
-
Follow Up on Detecting Deficiencies: An Optimal Group Testing Algorithm
Authors:
Yaakov Malinovsky
Abstract:
In a recent volume of Mathematics Magazine (Vol. 90, No. 3, June 2017) there is an interesting article by Seth Zimmerman, titled Detecting Deficiencies: An Optimal Group Testing Algorithm. The claim in the summary contradicts well-known and easily verified facts reported in the group-testing literature, beginning with the work by Sobel and Groll (1959), which was cited by S. Zimmerman himself. Therefore, I feel compelled to offer a number of comments and clarifications. In addition, I have made some corrections to mistaken claims made by Zimmerman (2017).
Submitted 27 February, 2018;
originally announced February 2018.
-
On the construction of unbiased estimators for the group testing problem
Authors:
Gregory Haber,
Yaakov Malinovsky
Abstract:
Debiased estimation has long been an area of research in the group testing literature. This has led to the development of several estimators with the goal of bias minimization and, recently, an unbiased estimator based on sequential binomial sampling. Previous research, however, has focused heavily on the simple case where no misclassification is assumed and only one trait is to be tested. In this paper, we consider the problem of unbiased estimation in these broader areas, giving constructions of such estimators for several cases. We show that, outside of the standard case addressed previously in the literature, it is impossible to find any proper unbiased estimator, that is, an estimator giving only values in the parameter space. This is shown to hold generally under any binomial or multinomial sampling plan.
Submitted 7 June, 2018; v1 submitted 31 January, 2018;
originally announced January 2018.
-
On optimal policy in the group testing with incomplete identification
Authors:
Yaakov Malinovsky
Abstract:
Consider a very large (infinite) population of items, where each item, independently of the others, is defective with probability p or good with probability q=1-p. The goal is to identify N good items as quickly as possible. The following group testing policy (policy A) is considered: test items together in groups; if the test outcome of group i of size n_i is negative, then accept all items in this group as good, otherwise discard the group. Then, move to the next group and continue until exactly N good items are found. The goal is to find an optimal testing configuration, i.e., group sizes, under policy A, such that the expected waiting time to obtain N good items is minimal. Recently, Gusev (2012) found an optimal group testing configuration under the assumptions of constant group size and N=\infty. In this note, an optimal solution under policy A for finite N is provided. Keywords: Dynamic programming; Optimal design; Partition problem; Schur-convexity
Submitted 14 April, 2018; v1 submitted 8 December, 2017;
originally announced December 2017.
-
Sterrett Procedure for the Generalized Group Testing Problem
Authors:
Yaakov Malinovsky
Abstract:
Group testing is a useful method that has broad applications in medicine, engineering, and even in airport security control. Consider a finite population of $N$ items, where item $i$ has a probability $p_i$ to be defective. The goal is to identify all items by means of group testing. This is the generalized group testing problem. The optimum procedure, with respect to the expected total number of tests, is unknown even in the case when all $p_i$ are equal. \cite{H1975} proved that an ordered partition (with respect to $p_i$) is optimal for the Dorfman procedure (procedure $D$), and obtained an optimum solution (i.e., found an optimal partition) by dynamic programming. In this paper, we investigate the Sterrett procedure (procedure $S$). We provide a closed-form expression for the expected total number of tests, which allows us to find the optimum arrangement of the items in the particular group. We also show that an ordered partition is not optimal for the procedure $S$ or even for a slightly modified Dorfman procedure (procedure $D^{\prime}$). This discovery implies that finding an optimal procedure $S$ appears to be a hard computational problem. However, by using an optimal ordered partition for all procedures, we show that procedure $D^{\prime}$ is uniformly better than procedure $D$, and based on numerical comparisons, procedure $S$ is uniformly and significantly better than procedures $D$ and $D^{\prime}$.
Submitted 13 April, 2017; v1 submitted 14 September, 2016;
originally announced September 2016.
-
Revisiting nested group testing procedures: new results, comparisons, and robustness
Authors:
Yaakov Malinovsky,
Paul S. Albert
Abstract:
Group testing has its origin in the identification of syphilis in the US army during World War II. Much of the theoretical framework of group testing was developed starting in the late 1950s, with continued work into the 1990s. Recently, with the advent of new laboratory and genetic technologies, there has been an increasing interest in group testing designs for cost saving purposes. In this paper, we compare different nested designs, including Dorfman, Sterrett and an optimal nested procedure obtained through dynamic programming. To elucidate these comparisons, we develop closed-form expressions for the optimal Sterrett procedure and provide a concise review of the prior literature for other commonly used procedures. We consider designs where the prevalence of disease is known as well as investigate the robustness of these procedures when it is incorrectly assumed. This article provides a technical presentation that will be of interest to researchers as well as from a pedagogical perspective. Supplementary material for this article is available online.
Submitted 25 July, 2017; v1 submitted 22 August, 2016;
originally announced August 2016.
-
Prediction of Ordered Random Effects in a Simple Small Area Model
Authors:
Yaakov Malinovsky,
Yosef Rinott
Abstract:
Prediction of a vector of ordered parameters or part of it arises naturally in the context of Small Area Estimation (SAE). For example, one may want to estimate the parameters associated with the top ten areas, the best or worst area, or a certain percentile. We use a simple SAE model to show that estimation of ordered parameters by the corresponding ordered estimates of each area separately does not yield good results with respect to MSE. Shrinkage-type predictors, with an appropriate amount of shrinkage for the particular problem of ordered parameters, are considerably better, and their performance is close to that of the optimal predictors, which cannot in general be computed explicitly.
Submitted 24 September, 2009;
originally announced September 2009.