-
Individual-level Modeling of COVID-19 Epidemic Risk
Authors:
Andres Colubri,
Kailash Yadav,
Abhishek Jha,
Pardis C. Sabeti
Abstract:
The ongoing COVID-19 pandemic calls for a multi-faceted public health response comprising complementary interventions to control the spread of the disease while vaccines and therapies are developed. Many of these interventions need to be informed by epidemic risk predictions given available data, including symptoms, contact patterns, and environmental factors. Here we propose a novel probabilistic formalism based on Individual-Level Models (ILMs) that offers rigorous formulas for the probability of infection of individuals, which can be parameterised via Maximum Likelihood Estimation (MLE) applied to compartmental models defined at the population level. We describe an approach where individual data collected in real time is integrated with overall case counts to update a predictor of the susceptibility to infection as a function of individual risk factors.
Submitted 20 August, 2020; v1 submitted 28 June, 2020;
originally announced June 2020.
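As a rough illustration of what an individual-level infection probability can look like, the sketch below uses the generic ILM form P = 1 - exp(-(S * sum_j T_j * k_j + e)), where S is the susceptibility of the individual, T_j the transmissibility of infectious contact j, k_j a contact kernel, and e a background term. This form is an assumption taken from the general ILM literature, not a formula quoted from the abstract, and the function and parameter names are hypothetical.

```python
import math


def infection_probability(susceptibility, contacts, background=0.0):
    """Generic ILM-style infection probability (illustrative assumption).

    susceptibility: non-negative risk score for the susceptible individual.
    contacts: iterable of (transmissibility, kernel) pairs, one per
        infectious contact; kernel encodes closeness or duration of contact.
    background: non-negative infection pressure from unobserved sources.
    """
    pressure = susceptibility * sum(t * k for t, k in contacts)
    # Probability that at least one infection event occurs under this pressure.
    return 1.0 - math.exp(-(pressure + background))


if __name__ == "__main__":
    one_contact = infection_probability(0.5, [(1.0, 2.0)])
    two_contacts = infection_probability(0.5, [(1.0, 2.0), (1.0, 1.0)])
    print(one_contact, two_contacts)
```

In a setting like the one described above, the susceptibility term would be the quantity updated from individual risk factors; here it is simply passed in as a number.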
-
Blind Channel Estimation for Massive MIMO: A Deep Learning Assisted Approach
Authors:
Parna Sabeti,
Arman Farhang,
Irene Macaluso,
Nicola Marchetti,
Linda Doyle
Abstract:
Large-scale multiple-input multiple-output (MIMO), or massive MIMO, is one of the pivotal technologies for future wireless networks. However, the performance of massive MIMO systems relies heavily on accurate channel estimation, while the acquisition of channel state information (CSI) in such systems requires an increasingly large amount of training overhead as the number of users grows. To tackle this issue, in this paper, we propose a deep learning assisted blind channel estimation technique for orthogonal frequency division multiplexing (OFDM) based massive MIMO systems. We prove that by exploiting the asymptotic orthogonality of the massive MIMO channels, the channel distortion can be averaged out without prior knowledge of the channel impulse responses, and after some mathematical manipulation, different users' transmitted data symbols can be extracted. Thus, by deploying a denoising convolutional neural network algorithm (DnCNN), we mitigate the remaining channel and noise effects to accurately detect the transmitted data symbols at the channel sounding stage. Using the detected data symbols as virtual pilots, we estimate the CSI of all the users at each BS antenna. Our simulation results testify to the efficacy of our proposed technique and demonstrate that it can provide a mean square error (MSE) performance which coincides with that of the data-aided channel estimation technique.
Submitted 24 February, 2020;
originally announced February 2020.
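The asymptotic orthogonality that the abstract above relies on can be checked numerically. The sketch below assumes i.i.d. Rayleigh fading (unit-variance complex Gaussian channel entries), which is a common modelling assumption and not necessarily the channel model used in the paper; all function names are hypothetical. For M antennas, the normalized inner product of a user's channel with itself stays near 1, while the normalized inner product between two different users' channels shrinks toward 0 as M grows.

```python
import random


def rayleigh_channel(num_antennas, rng):
    """Draw one user's channel vector with unit-variance complex Gaussian entries."""
    scale = 0.5 ** 0.5  # each of the real and imaginary parts has variance 1/2
    return [complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale))
            for _ in range(num_antennas)]


def normalized_inner(a, b):
    """Return a^H b / M for two equal-length complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b)) / len(a)


def orthogonality_demo(num_antennas, seed=0):
    """Return (|h1^H h1| / M, |h1^H h2| / M) for two independent users."""
    rng = random.Random(seed)
    h1 = rayleigh_channel(num_antennas, rng)
    h2 = rayleigh_channel(num_antennas, rng)
    return abs(normalized_inner(h1, h1)), abs(normalized_inner(h1, h2))


if __name__ == "__main__":
    for m in (16, 256, 4096):
        self_gain, cross_talk = orthogonality_demo(m)
        print(m, round(self_gain, 3), round(cross_talk, 3))
```

The cross term decays roughly like 1/sqrt(M), which is why averaging across many antennas can suppress inter-user interference without knowing the channel impulse responses.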
-
An Empirical Study of Leading Measures of Dependence
Authors:
David N. Reshef,
Yakir A. Reshef,
Pardis C. Sabeti,
Michael M. Mitzenmacher
Abstract:
In exploratory data analysis, we are often interested in identifying promising pairwise associations for further analysis while filtering out weaker, less interesting ones. This can be accomplished by computing a measure of dependence on all variable pairs and examining the highest-scoring pairs, provided the measure of dependence used assigns similar scores to equally noisy relationships of different types. This property, called equitability, is formalized in Reshef et al. [2015b]. In addition to equitability, measures of dependence can also be assessed by the power of their corresponding independence tests as well as their runtime.
Here we present extensive empirical evaluation of the equitability, power against independence, and runtime of several leading measures of dependence. These include two statistics introduced in Reshef et al. [2015a]: MICe, which has equitability as its primary goal, and TICe, which has power against independence as its goal. Regarding equitability, our analysis finds that MICe is the most equitable method on functional relationships in most of the settings we considered, although mutual information estimation proves the most equitable at large sample sizes in some specific settings. Regarding power against independence, we find that TICe, along with Heller and Gorfine's S^DDP, is the state of the art on the relationships we tested. Our analyses also show a trade-off between power against independence and equitability consistent with the theory in Reshef et al. [2015b]. In terms of runtime, MICe and TICe are significantly faster than many other measures of dependence tested, and computing either one makes computing the other trivial. This suggests that a fast and useful strategy for achieving a combination of power against independence and equitability may be to filter relationships by TICe and then to examine the MICe of only the significant ones.
Submitted 12 May, 2015; v1 submitted 8 May, 2015;
originally announced May 2015.
-
Measuring dependence powerfully and equitably
Authors:
Yakir A. Reshef,
David N. Reshef,
Hilary K. Finucane,
Pardis C. Sabeti,
Michael M. Mitzenmacher
Abstract:
Given a high-dimensional data set we often wish to find the strongest relationships within it. A common strategy is to evaluate a measure of dependence on every variable pair and retain the highest-scoring pairs for follow-up. This strategy works well if the statistic used is equitable [Reshef et al. 2015a], i.e., if, for some measure of noise, it assigns similar scores to equally noisy relationships regardless of relationship type (e.g., linear, exponential, periodic).
In this paper, we introduce and characterize a population measure of dependence called MIC*. We show three ways that MIC* can be viewed: as the population value of MIC, a highly equitable statistic from [Reshef et al. 2011], as a canonical "smoothing" of mutual information, and as the supremum of an infinite sequence defined in terms of optimal one-dimensional partitions of the marginals of the joint distribution. Based on this theory, we introduce an efficient approach for computing MIC* from the density of a pair of random variables, and we define a new consistent estimator MICe for MIC* that is efficiently computable. In contrast, there is no known polynomial-time algorithm for computing the original equitable statistic MIC. We show through simulations that MICe has better bias-variance properties than MIC. We then introduce and prove the consistency of a second statistic, TICe, that is a trivial side-product of the computation of MICe and whose goal is powerful independence testing rather than equitability.
We show in simulations that MICe and TICe have good equitability and power against independence respectively. The analyses here complement a more in-depth empirical evaluation of several leading measures of dependence [Reshef et al. 2015b] that shows state-of-the-art performance for MICe and TICe.
Submitted 30 August, 2021; v1 submitted 8 May, 2015;
originally announced May 2015.
-
Equitability, interval estimation, and statistical power
Authors:
Yakir A. Reshef,
David N. Reshef,
Pardis C. Sabeti,
Michael M. Mitzenmacher
Abstract:
For analysis of a high-dimensional dataset, a common approach is to test a null hypothesis of statistical independence on all variable pairs using a non-parametric measure of dependence. However, because this approach attempts to identify any non-trivial relationship no matter how weak, it often identifies too many relationships to be useful. What is needed is a way of identifying a smaller set of relationships that merit detailed further analysis.
Here we formally present and characterize equitability, a property of measures of dependence that aims to overcome this challenge. Notionally, an equitable statistic is a statistic that, given some measure of noise, assigns similar scores to equally noisy relationships of different types [Reshef et al. 2011]. We begin by formalizing this idea via a new object called the interpretable interval, which functions as an interval estimate of the amount of noise in a relationship of unknown type. We define an equitable statistic as one with small interpretable intervals.
We then draw on the equivalence of interval estimation and hypothesis testing to show that under moderate assumptions an equitable statistic is one that yields well powered tests for distinguishing not only between trivial and non-trivial relationships of all kinds but also between non-trivial relationships of different strengths. This means that equitability allows us to specify a threshold relationship strength $x_0$ and to search for relationships of all kinds with strength greater than $x_0$. Thus, equitability can be thought of as a strengthening of power against independence that enables fruitful analysis of data sets with a small number of strong, interesting relationships and a large number of weaker ones. We conclude with a demonstration of how our two equivalent characterizations of equitability can be used to evaluate the equitability of a statistic in practice.
Submitted 12 May, 2015; v1 submitted 8 May, 2015;
originally announced May 2015.
-
Theoretical Foundations of Equitability and the Maximal Information Coefficient
Authors:
Yakir A. Reshef,
David N. Reshef,
Pardis C. Sabeti,
Michael Mitzenmacher
Abstract:
The maximal information coefficient (MIC) is a tool for finding the strongest pairwise relationships in a data set with many variables (Reshef et al., 2011). MIC is useful because it gives similar scores to equally noisy relationships of different types. This property, called equitability, is important for analyzing high-dimensional data sets.
Here we formalize the theory behind both equitability and MIC in the language of estimation theory. This formalization has a number of advantages. First, it allows us to show that equitability is a generalization of power against statistical independence. Second, it allows us to compute and discuss the population value of MIC, which we call MIC_*. In doing so we generalize and strengthen the mathematical results proven in Reshef et al. (2011) and clarify the relationship between MIC and mutual information. Introducing MIC_* also enables us to reason about the properties of MIC more abstractly: for instance, we show that MIC_* is continuous and that there is a sense in which it is a canonical "smoothing" of mutual information. We also prove an alternate, equivalent characterization of MIC_* that we use to state new estimators of it as well as an algorithm for explicitly computing it when the joint probability density function of a pair of random variables is known. Our hope is that this paper provides a richer theoretical foundation for MIC and equitability going forward.
This paper will be accompanied by a forthcoming companion paper that performs extensive empirical analysis and comparison to other methods and discusses the practical aspects of both equitability and the use of MIC and its related statistics.
Submitted 12 May, 2015; v1 submitted 21 August, 2014;
originally announced August 2014.
-
Equitability Analysis of the Maximal Information Coefficient, with Comparisons
Authors:
David Reshef,
Yakir Reshef,
Michael Mitzenmacher,
Pardis Sabeti
Abstract:
A measure of dependence is said to be equitable if it gives similar scores to equally noisy relationships of different types. Equitability is important in data exploration when the goal is to identify a relatively small set of strongest associations within a dataset as opposed to finding as many non-zero associations as possible, which often are too many to sift through. Thus an equitable statistic, such as the maximal information coefficient (MIC), can be useful for analyzing high-dimensional data sets. Here, we explore both equitability and the properties of MIC, and discuss several aspects of the theory and practice of MIC. We begin by presenting an intuition behind the equitability of MIC through the exploration of the maximization and normalization steps in its definition. We then examine the speed and optimality of the approximation algorithm used to compute MIC, and suggest some directions for improving both. Finally, we demonstrate in a range of noise models and sample sizes that MIC is more equitable than natural alternatives, such as mutual information estimation and distance correlation.
Submitted 14 August, 2013; v1 submitted 26 January, 2013;
originally announced January 2013.
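The maximization and normalization steps mentioned in the abstract above can be illustrated with a deliberately crude stand-in for MIC. The sketch below is not the approximation algorithm discussed in the paper: it only tries equal-frequency grids instead of optimizing the partitions, so it can underestimate the true statistic. The grid budget of n^0.6 cells is assumed here as the default from the original MIC paper, and all function names are hypothetical. For each grid size, the mutual information of the binned data is divided by log2 of the smaller grid dimension (normalization), and the largest such value over all grids within the budget is returned (maximization).

```python
import math
import random


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins of (nearly) equal size, by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def grid_mutual_information(xb, yb):
    """Mutual information, in bits, of two discrete label sequences."""
    n = len(xb)
    joint, px, py = {}, {}, {}
    for a, b in zip(xb, yb):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    return sum(c / n * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def crude_mic(x, y, exponent=0.6):
    """Max over equal-frequency grids of normalized mutual information."""
    budget = len(x) ** exponent  # assumed cap on the number of grid cells
    best = 0.0
    kx = 2
    while kx * 2 <= budget:
        xb = equal_frequency_bins(x, kx)
        ky = 2
        while kx * ky <= budget:
            yb = equal_frequency_bins(y, ky)
            score = grid_mutual_information(xb, yb) / math.log2(min(kx, ky))
            best = max(best, score)
            ky += 1
        kx += 1
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    x = [i / 199 for i in range(200)]
    parabola = [(v - 0.5) ** 2 for v in x]
    noise = [rng.random() for _ in x]
    print(crude_mic(x, x), crude_mic(x, parabola), crude_mic(x, noise))
```

On noiseless data the linear and parabolic relationships both score close to 1, which is the behaviour equitability asks for, while independent noise scores much lower.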