-
Optimal Dorfman Group Testing for Symmetric Distributions
Authors:
Nicholas C. Landolfi,
Sanjay Lall
Abstract:
We study Dorfman's classical group testing protocol in a novel setting where individual specimen statuses are modeled as exchangeable random variables. We are motivated by infectious disease screening. In that case, specimens which arrive together for testing often originate from the same community and so their statuses may exhibit positive correlation. Dorfman's protocol screens a population of n…
▽ More
We study Dorfman's classical group testing protocol in a novel setting where individual specimen statuses are modeled as exchangeable random variables. We are motivated by infectious disease screening. In that case, specimens which arrive together for testing often originate from the same community and so their statuses may exhibit positive correlation. Dorfman's protocol screens a population of n specimens for a binary trait by partitioning it into non-overlap** groups, testing these, and only individually retesting the specimens of each positive group. The partition is chosen to minimize the expected number of tests under a probabilistic model of specimen statuses. We relax the typical assumption that these are independent and identically distributed and instead model them as exchangeable random variables. In this case, their joint distribution is symmetric in the sense that it is invariant under permutations. We give a characterization of such distributions in terms of a function q where q(h) is the marginal probability that any group of size h tests negative. We use this interpretable representation to show that the set partitioning problem arising in Dorfman's protocol can be reduced to an integer partitioning problem and efficiently solved. We apply these tools to an empirical dataset from the COVID-19 pandemic. The methodology helps explain the unexpectedly high empirical efficiency reported by the original investigators.
△ Less
Submitted 27 February, 2024; v1 submitted 21 August, 2023;
originally announced August 2023.
-
Probabilistic Modeling Using Tree Linear Cascades
Authors:
Nicholas C. Landolfi,
Sanjay Lall
Abstract:
We introduce tree linear cascades, a class of linear structural equation models for which the error variables are uncorrelated but need not be Gaussian nor independent. We show that, in spite of this weak assumption, the tree structure of this class of models is identifiable. In a similar vein, we introduce a constrained regression problem for fitting a tree-structured linear structural equation m…
▽ More
We introduce tree linear cascades, a class of linear structural equation models for which the error variables are uncorrelated but need not be Gaussian nor independent. We show that, in spite of this weak assumption, the tree structure of this class of models is identifiable. In a similar vein, we introduce a constrained regression problem for fitting a tree-structured linear structural equation model and solve the problem analytically. We connect these results to the classical Chow-Liu approach for Gaussian graphical models. We conclude by giving an empirical-risk form of the regression and illustrating the computationally attractive implications of our theoretical results on a basic example involving stock prices.
△ Less
Submitted 15 February, 2022;
originally announced February 2022.
-
Shape-Based Approach to Household Load Curve Clustering and Prediction
Authors:
Thanchanok Teeraratkul,
Daniel O'Neill,
Sanjay Lall
Abstract:
Consumer Demand Response (DR) is an important research and industry problem, which seeks to categorize, predict and modify consumer's energy consumption. Unfortunately, traditional clustering methods have resulted in many hundreds of clusters, with a given consumer often associated with several clusters, making it difficult to classify consumers into stable representative groups and to predict ind…
▽ More
Consumer Demand Response (DR) is an important research and industry problem, which seeks to categorize, predict and modify consumer's energy consumption. Unfortunately, traditional clustering methods have resulted in many hundreds of clusters, with a given consumer often associated with several clusters, making it difficult to classify consumers into stable representative groups and to predict individual energy consumption patterns. In this paper, we present a shape-based approach that better classifies and predicts consumer energy consumption behavior at the household level. The method is based on Dynamic Time War**. DTW seeks an optimal alignment between energy consumption patterns reflecting the effect of hidden patterns of regular consumer behavior. Using real consumer 24-hour load curves from Opower Corporation, our method results in a 50% reduction in the number of representative groups and an improvement in prediction accuracy measured under DTW distance. We extend the approach to estimate which electrical devices will be used and in which hours.
△ Less
Submitted 5 February, 2017;
originally announced February 2017.