-
A Bayesian joint longitudinal-survival model with a latent stochastic process for intensive longitudinal data
Authors:
Madeline R. Abbott,
Walter H. Dempsey,
Inbal Nahum-Shani,
Lindsey N. Potter,
David W. Wetter,
Cho Y. Lam,
Jeremy M. G. Taylor
Abstract:
The availability of mobile health (mHealth) technology has enabled increased collection of intensive longitudinal data (ILD). ILD have potential to capture rapid fluctuations in outcomes that may be associated with changes in the risk of an event. However, existing methods for jointly modeling longitudinal and event-time outcomes are not well-equipped to handle ILD due to the high computational cost. We propose a joint longitudinal and time-to-event model suitable for analyzing ILD. In this model, we summarize a multivariate longitudinal outcome as a smaller number of time-varying latent factors. These latent factors, which are modeled using an Ornstein-Uhlenbeck stochastic process, capture the risk of a time-to-event outcome in a parametric hazard model. We take a Bayesian approach to fit our joint model and conduct simulations to assess its performance. We use it to analyze data from an mHealth study of smoking cessation. We summarize the longitudinal self-reported intensity of nine emotions as the psychological states of positive and negative affect. These time-varying latent states capture the risk of the first smoking lapse after attempted quit. Understanding factors associated with smoking lapse is of keen interest to smoking cessation researchers.
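A practical appeal of the OU process for ILD is that its transition distribution is available in closed form at arbitrary, irregularly spaced measurement times. The following is a minimal one-dimensional sketch, assuming a mean-zero process dX = -theta * X dt + sigma dW; the parameter names and the functions `ou_transition` and `simulate_ou` are illustrative, and the paper's multivariate latent-factor model and hazard submodel are not reproduced.

```python
import math
import random


def ou_transition(x_prev, dt, theta, sigma):
    """Mean and variance of X(t + dt) given X(t) = x_prev for a mean-zero
    OU process dX = -theta * X dt + sigma dW (theta > 0)."""
    decay = math.exp(-theta * dt)
    mean = x_prev * decay
    var = sigma ** 2 / (2 * theta) * (1 - decay ** 2)
    return mean, var


def simulate_ou(times, theta, sigma, rng):
    """Exact simulation at increasing, possibly irregular times,
    starting from the stationary distribution N(0, sigma^2 / (2 theta))."""
    stat_var = sigma ** 2 / (2 * theta)
    path = [rng.gauss(0.0, math.sqrt(stat_var))]
    for t_prev, t_next in zip(times, times[1:]):
        mean, var = ou_transition(path[-1], t_next - t_prev, theta, sigma)
        path.append(rng.gauss(mean, math.sqrt(var)))
    return path


# Hypothetical, irregularly spaced prompt times (hours).
times = [0.0, 0.5, 2.0, 2.1, 6.0]
path = simulate_ou(times, theta=0.8, sigma=1.0, rng=random.Random(0))
```

Long gaps pull the conditional mean toward zero and the conditional variance toward the stationary value sigma**2 / (2 * theta); short gaps keep consecutive latent values highly correlated.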
Submitted 30 April, 2024;
originally announced May 2024.
-
Data integration methods for micro-randomized trials
Authors:
Easton Huch,
Inbal Nahum-Shani,
Lindsey Potter,
Cho Lam,
David W. Wetter,
Walter Dempsey
Abstract:
Existing statistical methods for the analysis of micro-randomized trials (MRTs) are designed to estimate causal excursion effects using data from a single MRT. In practice, however, researchers can often find previous MRTs that employ similar interventions. In this paper, we develop data integration methods that capitalize on this additional information, leading to statistical efficiency gains. To further increase efficiency, we demonstrate how to combine these approaches according to a generalization of multivariate precision weighting that allows for correlation between estimates, and we show that the resulting meta-estimator possesses an asymptotic optimality property. We illustrate our methods in simulation and in a case study involving two MRTs in the area of smoking cessation.
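For the simplest case of two unbiased estimates of the same scalar effect, precision weighting that allows for correlation between the estimates reduces to the minimum-variance linear combination below. This is a scalar sketch only; the paper's meta-estimator is multivariate, and the function name `combine_two` and the numbers used are hypothetical.

```python
def combine_two(est1, var1, est2, var2, cov12):
    """Minimum-variance unbiased linear combination of two correlated
    estimates of the same scalar parameter."""
    denom = var1 + var2 - 2 * cov12  # variance of (est1 - est2)
    if denom <= 0:
        raise ValueError("the difference of the two estimates has zero variance")
    w1 = (var2 - cov12) / denom
    w2 = 1.0 - w1
    combined = w1 * est1 + w2 * est2
    combined_var = (var1 * var2 - cov12 ** 2) / denom
    return combined, combined_var, (w1, w2)


# Hypothetical numbers: an estimate from the current trial and a correlated
# estimate that also uses data from a previous trial.
estimate, variance, weights = combine_two(0.30, 0.010, 0.20, 0.020, 0.004)
```

With cov12 = 0 the weights reduce to ordinary inverse-variance weights; with strong positive correlation one weight can become negative.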
Submitted 20 March, 2024;
originally announced March 2024.
-
Selective Inference for Sparse Graphs via Neighborhood Selection
Authors:
Yiling Huang,
Snigdha Panigrahi,
Walter Dempsey
Abstract:
Neighborhood selection is a widely used method for estimating the support set of sparse precision matrices, which helps determine the conditional dependence structure in undirected graphical models. However, reporting only point estimates for the estimated graph, without accompanying uncertainty estimates, can result in poor replicability. In fields such as psychology, where the lack of replicability is a major concern, there is a growing need for methods that can address this issue. In this paper, we focus on the Gaussian graphical model. We introduce a selective inference method to attach uncertainty estimates to the selected (nonzero) entries of the precision matrix and decide which of the estimated edges must be included in the graph. Our method provides an exact adjustment for the selection of edges, which, when multiplied with the Wishart density of the random matrix, results in valid selective inferences. Through the use of externally added randomization variables, our adjustment is easy to compute: it requires calculating the probability of a selection event that is equivalent to a few sign constraints and that decouples across the nodewise regressions. Through simulations and an application to a mobile health trial designed to study mental health, we demonstrate that our selective inference method results in higher power and improved estimation accuracy.
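Neighborhood selection rests on a standard Gaussian identity: if a vector has precision matrix Theta, the coefficient of X_k in the population regression of X_j on all other variables equals -Theta[j][k] / Theta[j][j], so a zero coefficient corresponds to a missing edge. The sketch below checks that identity numerically on an arbitrary example matrix; the lasso fitting and the paper's selective-inference adjustment are not shown, and all names are illustrative.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def inverse(a):
    """Matrix inverse, built column by column from unit right-hand sides."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def nodewise_coefficients(sigma, j):
    """Population regression coefficients of X_j on the remaining variables."""
    rest = [k for k in range(len(sigma)) if k != j]
    a = [[sigma[r][c] for c in rest] for r in rest]
    b = [sigma[r][j] for r in rest]
    return dict(zip(rest, solve(a, b)))


# Example precision matrix with no edge between nodes 0 and 2.
theta = [[2.0, -0.8, 0.0], [-0.8, 2.0, -0.6], [0.0, -0.6, 1.5]]
sigma = inverse(theta)
coefs = nodewise_coefficients(sigma, 0)
# coefs[k] matches -theta[0][k] / theta[0][0]: 0.4 for k = 1, 0.0 for k = 2
```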
Submitted 27 December, 2023;
originally announced December 2023.
-
A Robust Mixed-Effects Bandit Algorithm for Assessing Mobile Health Interventions
Authors:
Easton K. Huch,
Jieru Shi,
Madeline R. Abbott,
Jessica R. Golbus,
Alexander Moreno,
Walter H. Dempsey
Abstract:
Mobile health leverages personalized, contextually-tailored interventions optimized through bandit and reinforcement learning algorithms. Despite its promise, challenges like participant heterogeneity, nonstationarity, and nonlinearity in rewards hinder algorithm performance. We propose a robust contextual bandit algorithm, termed "DML-TS-NNR", that simultaneously addresses these challenges via (1) modeling the differential reward with user- and time-specific incidental parameters, (2) network cohesion penalties, and (3) debiased machine learning for flexible estimation of baseline rewards. We establish a high-probability regret bound that depends solely on the dimension of the differential reward model. This feature enables us to achieve robust regret bounds even when the baseline reward is highly complex. We demonstrate the superior performance of the DML-TS-NNR algorithm in a simulation and two off-policy evaluation studies.
Submitted 6 June, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
A Continuous-Time Dynamic Factor Model for Intensive Longitudinal Data Arising from Mobile Health Studies
Authors:
Madeline R. Abbott,
Walter H. Dempsey,
Inbal Nahum-Shani,
Cho Y. Lam,
David W. Wetter,
Jeremy M. G. Taylor
Abstract:
Intensive longitudinal data (ILD) collected in mobile health (mHealth) studies contain rich information on multiple outcomes measured frequently over time that have the potential to capture short-term and long-term dynamics. Motivated by an mHealth study of smoking cessation in which participants self-report the intensity of many emotions multiple times per day, we describe a dynamic factor model that summarizes the ILD as a low-dimensional, interpretable latent process. This model consists of two submodels: (i) a measurement submodel--a factor model--that summarizes the multivariate longitudinal outcome as lower-dimensional latent variables and (ii) a structural submodel--an Ornstein-Uhlenbeck (OU) stochastic process--that captures the temporal dynamics of the multivariate latent process in continuous time. We derive a closed-form likelihood for the marginal distribution of the outcome and the computationally simpler sparse precision matrix for the OU process. We propose a block coordinate descent algorithm for estimation. Finally, we apply our method to the mHealth data to summarize the dynamics of 18 different emotions as two latent processes. These latent processes are interpreted by behavioral scientists as the psychological constructs of positive and negative affect and are key in understanding vulnerability to lapsing back to tobacco use among smokers attempting to quit.
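Because the OU process is Markov, its precision matrix at any set of observation times is tridiagonal even though its covariance matrix is dense. A minimal sketch for a stationary one-dimensional OU process with stationary variance v = sigma**2 / (2 * theta) and covariance v * exp(-theta * |t - s|); the function names are illustrative, dense lists are used only for readability, and the paper's multivariate construction is not reproduced.

```python
import math


def ou_precision(times, theta, sigma):
    """Tridiagonal precision matrix of a stationary 1-D OU process observed
    at strictly increasing times, from its Markov (AR(1)-type) factorization."""
    n = len(times)
    v = sigma ** 2 / (2 * theta)
    rho = [math.exp(-theta * (times[i + 1] - times[i])) for i in range(n - 1)]
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Term from the density of X_i: stationary start, or transition into X_i.
        diag = 1.0 if i == 0 else 1.0 / (1 - rho[i - 1] ** 2)
        if i < n - 1:  # term from the transition out of X_i
            diag += rho[i] ** 2 / (1 - rho[i] ** 2)
            q[i][i + 1] = q[i + 1][i] = -rho[i] / (v * (1 - rho[i] ** 2))
        q[i][i] = diag / v
    return q


def ou_covariance(times, theta, sigma):
    v = sigma ** 2 / (2 * theta)
    return [[v * math.exp(-theta * abs(t - s)) for s in times] for t in times]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


times = [0.0, 0.3, 1.5, 1.6, 4.0]
prod = matmul(ou_precision(times, 0.7, 1.2), ou_covariance(times, 0.7, 1.2))
# prod is the identity matrix up to floating-point rounding
```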
Submitted 20 February, 2024; v1 submitted 28 July, 2023;
originally announced July 2023.
-
Incorporating Auxiliary Variables to Improve the Efficiency of Time-Varying Treatment Effect Estimation
Authors:
Jieru Shi,
Zhenke Wu,
Walter Dempsey
Abstract:
The use of smart devices (e.g., smartphones, smartwatches) and other wearables for context sensing and delivery of digital interventions to improve health outcomes has grown significantly in behavioral and psychiatric studies. Micro-randomized trials (MRTs) are a common experimental design for obtaining data-driven evidence on mobile health (mHealth) intervention effectiveness, in which each individual is repeatedly randomized to receive treatments over numerous time points. Individual characteristics and the contexts around randomizations are also collected throughout the study, some of which may be pre-specified as moderators when assessing time-varying causal effect moderation. Moreover, we have access to abundant measurements beyond just the moderators. Our study aims to leverage this auxiliary information to improve causal estimation and better understand the intervention effect. Similar problems have been raised in randomized controlled trials (RCTs), where extensive literature demonstrates that baseline covariate information can be incorporated to alleviate chance imbalances and increase asymptotic efficiency. However, covariate adjustment in the context of time-varying treatments and repeated measurements, as seen in MRTs, has not been studied. Recognizing the connection to Neyman orthogonality, we address this gap by introducing an intuitive approach to incorporate auxiliary variables to improve the efficiency of moderated causal excursion effect estimation. The efficiency gain of our approach is proved theoretically and demonstrated through simulation studies and an analysis of data from the Intern Health Study (NeCamp et al., 2020).
Submitted 29 March, 2024; v1 submitted 29 June, 2023;
originally announced June 2023.
-
A Meta-Learning Method for Estimation of Causal Excursion Effects to Assess Time-Varying Moderation
Authors:
Jieru Shi,
Walter Dempsey
Abstract:
Twin revolutions in wearable technologies and health interventions delivered by smartphones have greatly increased the accessibility of mobile health (mHealth) interventions. Micro-randomized trials (MRTs) are designed to assess the effectiveness of the mHealth intervention and introduce a novel class of causal estimands called "causal excursion effects." These estimands enable the evaluation of how intervention effects change over time and are influenced by individual characteristics or context. However, existing analysis methods for causal excursion effects require prespecified features of the observed high-dimensional history to build a working model for a critical nuisance parameter. Machine learning appears ideal for automatic feature construction, but its naive application can lead to bias under model misspecification. To address this issue, this paper revisits the estimation of causal excursion effects from a meta-learner perspective, where the analyst remains agnostic to the supervised learning algorithms used to estimate nuisance parameters. We present the bidirectional asymptotic properties of the proposed estimators and compare them both theoretically and through extensive simulations. The results show relative efficiency gains and support the suggestion of a doubly robust alternative to existing methods. Finally, the practical utility of the proposed methods is demonstrated by analyzing data from a multi-institution cohort of first-year medical residents in the United States (NeCamp et al., 2020).
Submitted 26 June, 2024; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Exploring the big data paradox for various estimands using vaccination data from the global COVID-19 Trends and Impact Survey (CTIS)
Authors:
Youqi Yang,
Walter Dempsey,
Peisong Han,
Yashwant Deshmukh,
Sylvia Richardson,
Brian Tom,
Bhramar Mukherjee
Abstract:
Selection bias poses a challenge to statistical inference validity in non-probability surveys. This study compared estimates of the first-dose COVID-19 vaccination rates among Indian adults in 2021 from a large non-probability survey, COVID-19 Trends and Impact Survey (CTIS), and a small probability survey, the Center for Voting Options and Trends in Election Research (CVoter), against benchmark data from the COVID Vaccine Intelligence Network (CoWIN). Notably, CTIS exhibits a larger estimation error (0.39) compared to CVoter (0.16). Additionally, we investigated the estimation accuracy of the CTIS when using a relative scale and found a significant increase in the effective sample size by altering the estimand from the overall vaccination rate. These results suggest that the big data paradox can manifest in countries beyond the US and it may not apply to every estimand of interest.
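The big data paradox is usually framed through Meng's (2018) error decomposition: for a finite population of size N in which n units respond (R = 1), the error of the respondent mean equals the data defect correlation between R and Y, times sqrt((1 - f) / f) with f = n / N, times the population standard deviation of Y, where all moments use divisor N. The abstract does not state which formulas the paper uses, so treat this as background; the function names and the toy population are illustrative. The sketch checks the identity exactly and computes the approximate effective sample size f / ((1 - f) * rho**2), which ignores finite-population corrections.

```python
import math


def pop_mean(xs):
    return sum(xs) / len(xs)


def pop_sd(xs):
    m = pop_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def data_defect_correlation(r, y):
    """Population (divisor-N) correlation between response indicator R and Y."""
    mr, my = pop_mean(r), pop_mean(y)
    cov = sum((ri - mr) * (yi - my) for ri, yi in zip(r, y)) / len(y)
    return cov / (pop_sd(r) * pop_sd(y))


def meng_error(r, y):
    """Right-hand side of the identity: rho * sqrt((1 - f) / f) * sd(Y)."""
    f = sum(r) / len(r)
    return data_defect_correlation(r, y) * math.sqrt((1 - f) / f) * pop_sd(y)


def effective_sample_size(rho, f):
    """Approximate size of a simple random sample with the same MSE (large N)."""
    return f / ((1 - f) * rho ** 2)


# Toy population: units with Y = 1 (e.g. vaccinated) respond more often.
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
r = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
actual_error = pop_mean([yi for yi, ri in zip(y, r) if ri]) - pop_mean(y)
# actual_error equals meng_error(r, y) up to rounding (both are 0.35 here)
```

Because rho enters squared, even a small data defect correlation can shrink a very large non-probability sample to a modest effective sample size.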
Submitted 26 June, 2023;
originally announced June 2023.
-
Design of Experiments with Sequential Randomizations on Multiple Timescales: The Hybrid Experimental Design
Authors:
Inbal Nahum-Shani,
John J. Dziak,
Hanna Venera,
Angela F. Pfammatter,
Bonnie Spring,
Walter Dempsey
Abstract:
Psychological interventions, especially those leveraging mobile and wireless technologies, often include multiple components that are delivered and adapted on multiple timescales (e.g., coaching sessions adapted monthly based on clinical progress, combined with motivational messages from a mobile device adapted daily based on the person's daily emotional state). The hybrid experimental design (HED) is a new experimental approach that enables researchers to answer scientific questions about the construction of psychological interventions in which components are delivered and adapted on different timescales. These designs involve sequential randomizations of study participants to intervention components, each at an appropriate timescale (e.g., monthly randomization to different intensities of coaching sessions and daily randomization to different forms of motivational messages). The goal of the current manuscript is twofold. The first is to highlight the flexibility of the HED by conceptualizing this experimental approach as a special form of a factorial design in which different factors are introduced at multiple timescales. We also discuss how the structure of the HED can vary depending on the scientific question(s) motivating the study. The second goal is to explain how data from various types of HEDs can be analyzed to answer a variety of scientific questions about the development of multi-component psychological interventions. For illustration we use a completed HED to inform the development of a technology-based weight loss intervention that integrates components that are delivered and adapted on multiple timescales.
Submitted 17 February, 2023;
originally announced February 2023.
-
Estimating Time-Varying Direct and Indirect Causal Excursion Effects with Longitudinal Binary Outcomes
Authors:
Jieru Shi,
Zhenke Wu,
Walter Dempsey
Abstract:
Construction of just-in-time adaptive interventions, such as prompts delivered by mobile apps to promote and maintain behavioral change, requires knowledge about time-varying moderated effects to inform when and how we deliver intervention options. Micro-randomized trials (MRTs) have emerged as a sequentially randomized design to gather requisite data for effect estimation. The existing literature (Qian et al., 2020; Boruvka et al., 2018; Dempsey et al., 2020) has defined a general class of causal estimands, referred to as "causal excursion effects", to assess the time-varying moderated effect. However, there is limited statistical literature on how to address potential between-cluster treatment effect heterogeneity and within-cluster interference in a sequential treatment setting for longitudinal binary outcomes. In this paper, based on a cluster conceptualization of the potential outcomes, we define a larger class of direct and indirect causal excursion effects for proximal and lagged binary outcomes, and propose a new inferential procedure that addresses effect heterogeneity and interference. We provide theoretical guarantees of consistency and asymptotic normality of the estimator. Extensive simulation studies confirm our theory empirically and show that the proposed procedure provides consistent point estimates and interval estimates with valid coverage. Finally, we analyze a data set from a multi-institution MRT study to assess the time-varying moderated effects of mobile prompts upon binary study engagement outcomes.
Submitted 2 December, 2022;
originally announced December 2022.
-
Node-level community detection within edge exchangeable models for interaction processes
Authors:
Yuhua Zhang,
Walter Dempsey
Abstract:
Scientists are increasingly interested in discovering community structure from modern relational data arising on large-scale social networks. While many methods have been proposed for learning community structure, few account for the fact that these modern networks arise from processes of interactions in the population. We introduce block edge exchangeable models (BEEM) for the study of interaction networks with latent node-level community structure. The block vertex components model (B-VCM) is derived as a canonical example. Several theoretical and practical advantages over traditional vertex-centric approaches are highlighted. In particular, BEEMs allow for sparse degree structure and power-law degree distributions within communities. Our theoretical analysis bounds the misspecification rate of block assignments, while supporting simulations show that the properties of the network can be recovered. A computationally tractable Gibbs algorithm is derived. We demonstrate the proposed model using post-comment interaction data from Talklife, a large-scale online peer-to-peer support network, and contrast the learned communities with those found using standard algorithms, including spectral clustering and degree-corrected stochastic block models.
Submitted 17 August, 2022;
originally announced August 2022.
-
Recurrent event analysis in the presence of real-time high frequency data via random subsampling
Authors:
Walter Dempsey
Abstract:
Digital monitoring studies collect real-time, high-frequency data via mobile sensors in the subjects' natural environment. This data can be used to model the impact of changes in physiology on recurrent event outcomes such as smoking, drug use, alcohol use, or self-identified moments of suicidal ideation. Likelihood calculations for the recurrent event analysis, however, become computationally prohibitive in this setting. Motivated by this, a random subsampling framework is proposed for computationally efficient, approximate likelihood-based estimation. A subsampling-unbiased estimator for the derivative of the cumulative hazard enters into an approximation of the log-likelihood. The estimator has two sources of variation: the first due to the recurrent event model and the second due to subsampling. The latter can be reduced by increasing the sampling rate; however, this leads to increased computational costs. The approximate score equations are equivalent to logistic regression score equations, allowing standard "off-the-shelf" software to be used in fitting these models. Simulations demonstrate the method and the efficiency-computation trade-off. We end by illustrating our approach using data from a digital monitoring study of suicidal ideation.
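The subsampling principle can be illustrated on the cumulative hazard itself: draw times from a homogeneous Poisson process with rate r on [0, T] and weight each sampled hazard value by 1 / r. By Campbell's theorem the result is unbiased for the integral of the hazard, and its variance (the integral of hazard**2 / r) shrinks as r grows, at the cost of more hazard evaluations. This sketch assumes a toy hazard exp(a + b * x(t)) driven by a made-up sensor signal x(t) = sin(t); the paper applies the idea to the derivative of the cumulative hazard inside an approximate log-likelihood, which is not reproduced here, and all names are hypothetical.

```python
import math
import random


def hazard(t, a=0.0, b=0.5):
    """Toy log-linear hazard driven by a hypothetical sensor signal x(t) = sin(t)."""
    return math.exp(a + b * math.sin(t))


def subsampled_cum_hazard(T, rate, rng):
    """Unbiased estimate of the integral of hazard over [0, T] using times
    from a homogeneous Poisson process with the given rate."""
    total, t = 0.0, rng.expovariate(rate)
    while t < T:
        total += hazard(t) / rate
        t += rng.expovariate(rate)
    return total


def trapezoid_cum_hazard(T, steps=100_000):
    """Deterministic reference value via the trapezoidal rule."""
    h = T / steps
    inner = sum(hazard(i * h) for i in range(1, steps))
    return h * (0.5 * hazard(0.0) + inner + 0.5 * hazard(T))


rng = random.Random(42)
T, rate = 10.0, 5.0
draws = [subsampled_cum_hazard(T, rate, rng) for _ in range(2000)]
mean_estimate = sum(draws) / len(draws)  # close to trapezoid_cum_hazard(T)
```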
Submitted 13 April, 2022;
originally announced April 2022.
-
Kernel Deformed Exponential Families for Sparse Continuous Attention
Authors:
Alexander Moreno,
Supriya Nagesh,
Zhenke Wu,
Walter Dempsey,
James M. Rehg
Abstract:
Attention mechanisms take an expectation of a data representation with respect to probability weights. This creates summary statistics that focus on important features. Recently, Martins et al. (2020, 2021) proposed continuous attention mechanisms, focusing on unimodal attention densities from the exponential and deformed exponential families; the latter have sparse support. Farinhas et al. (2021) extended this to Gaussian mixture attention densities, which are a flexible class with dense support. In this paper, we extend this to two general flexible classes: kernel exponential families and our new sparse counterpart, kernel deformed exponential families. Theoretically, we show new existence results for both kernel exponential and deformed exponential families, and that the deformed case has approximation capabilities similar to those of kernel exponential families. Experiments show that kernel deformed exponential families can attend to multiple compact regions of the data domain.
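The contrast between dense (exponential family) and sparse (deformed exponential family) attention weights is easiest to see in the discrete case. The sketch below compares softmax with sparsemax, a well-known discrete sparse mapping that corresponds to a Tsallis-type deformed exponential case; it is an illustration under that discrete simplification, not the continuous kernel deformed exponential densities of the paper, and the function names and inputs are illustrative.

```python
import math


def softmax(z):
    """Dense weights: every entry is strictly positive."""
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def sparsemax(z):
    """Euclidean projection of scores z onto the probability simplex;
    low-scoring entries receive exactly zero weight."""
    zs = sorted(z, reverse=True)
    cumsum, k, k_sum = 0.0, 0, 0.0
    for j, zj in enumerate(zs, start=1):
        cumsum += zj
        if 1 + j * zj > cumsum:  # entry j stays in the support
            k, k_sum = j, cumsum
    tau = (k_sum - 1) / k
    return [max(zi - tau, 0.0) for zi in z]


def attend(weights, values):
    """Attention output: expectation of the value vectors under the weights."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


scores = [2.0, 1.8, 0.1, -1.0]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [-3.0, 2.0]]
weights = sparsemax(scores)        # exact zeros for the two low-scoring items
output = attend(weights, values)
```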
Submitted 12 November, 2021; v1 submitted 1 November, 2021;
originally announced November 2021.
-
SMART Binary: Sample Size Calculation for Comparing Adaptive Interventions in SMART studies with Longitudinal Binary Outcomes
Authors:
John J. Dziak,
Daniel Almirall,
Walter Dempsey,
Catherine Stanger,
Inbal Nahum-Shani
Abstract:
Sequential Multiple-Assignment Randomized Trials (SMARTs) play an increasingly important role in psychological and behavioral health research. This experimental approach enables researchers to answer scientific questions about how to sequence and match interventions to the unique, changing needs of individuals. A variety of sample size planning resources for SMART studies have been developed in recent years; these enable researchers to plan SMARTs for addressing different types of scientific questions. However, relatively limited attention has been given to planning SMARTs with binary (dichotomous) outcomes, which often require higher sample sizes relative to continuous outcomes. Existing resources for estimating sample size requirements for SMARTs with binary outcomes do not consider the potential to improve power by including a baseline measurement and/or multiple repeated outcome measurements. The current paper addresses this issue by providing sample size simulation code and approximate formulas for two-wave repeated measures binary outcomes (i.e., two measurement times for the outcome variable, before and after receiving the intervention). The simulation results agree well with the formulas. We also discuss how to use simulations to calculate power for studies with more than two outcome measurement occasions. The results show that having at least one repeated measurement of the outcome can substantially improve power under certain conditions.
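As a reference point for why binary outcomes are costly to power, the classical per-group sample size for comparing two independent proportions with a single end-of-study measurement follows the normal-approximation formula n = (z_{1-alpha/2} + z_{1-beta})^2 [p1(1 - p1) + p2(1 - p2)] / (p1 - p2)^2. This is a textbook baseline, not the paper's SMART formulas (which account for the SMART structure and a baseline or repeated measurement), and the function name is illustrative.

```python
import math
from statistics import NormalDist


def two_proportion_n(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided test of p1 vs p2 (unpooled normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


n_per_group = two_proportion_n(0.5, 0.3)  # 91 per group
```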
Submitted 19 June, 2023; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Assessing Time-Varying Causal Effect Moderation in the Presence of Cluster-Level Treatment Effect Heterogeneity
Authors:
Jieru Shi,
Zhenke Wu,
Walter Dempsey
Abstract:
The micro-randomized trial (MRT) is a sequential randomized experimental design to empirically evaluate the effectiveness of mobile health (mHealth) intervention components that may be delivered at hundreds or thousands of decision points. MRTs have motivated a new class of causal estimands, termed "causal excursion effects", for which semiparametric inference can be conducted via a weighted, centered least squares criterion (Boruvka et al., 2018). Existing methods assume between-subject independence and non-interference. Deviations from these assumptions often occur. In this paper, causal excursion effects are revisited under potential cluster-level treatment effect heterogeneity and interference, where the treatment effect of interest may depend on cluster-level moderators. Utility of the proposed methods is shown by analyzing data from a multi-institution cohort of first-year medical residents in the United States.
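The weighted, centered least squares (WCLS) criterion of Boruvka et al. (2018) can be sketched for the simplest case: one control covariate, no moderators, and a chosen reference probability p_tilde. Each decision point gets weight (p_tilde / p)^A ((1 - p_tilde) / (1 - p))^(1 - A), where p is the actual randomization probability, and the treatment enters the working model centered as A - p_tilde. This sketch pools decision points, omits the cluster-robust standard errors that MRT analyses require, and does not implement the cluster-level extensions proposed in this paper; the simulated data and function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def wcls_effect(y, a, x, p, p_tilde):
    """WCLS estimate of a marginal excursion effect with one control covariate.
    Working model: y ~ alpha0 + alpha1 * x + beta * (a - p_tilde)."""
    xtwx = [[0.0] * 3 for _ in range(3)]
    xtwy = [0.0] * 3
    for yi, ai, xi, pi in zip(y, a, x, p):
        # Weight = p_tilde^A (1 - p_tilde)^(1 - A) / [p^A (1 - p)^(1 - A)]
        w = p_tilde / pi if ai == 1 else (1 - p_tilde) / (1 - pi)
        row = [1.0, xi, ai - p_tilde]
        for j in range(3):
            xtwy[j] += w * row[j] * yi
            for k in range(3):
                xtwx[j][k] += w * row[j] * row[k]
    return solve(xtwx, xtwy)[2]  # coefficient on the centered treatment


# Hypothetical simulated decision points (pooled across people and time).
rng = random.Random(7)
y, a, x, p = [], [], [], []
for _ in range(20_000):
    xi = rng.gauss(0.0, 1.0)
    pi = 0.7 if xi > 0 else 0.3  # randomization probability depends on history
    ai = 1 if rng.random() < pi else 0
    y.append(1.0 + 0.8 * xi + 0.5 * ai + rng.gauss(0.0, 1.0))  # true effect 0.5
    a.append(ai)
    x.append(xi)
    p.append(pi)

beta_hat = wcls_effect(y, a, x, p, p_tilde=0.5)
```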
Submitted 10 December, 2021; v1 submitted 2 February, 2021;
originally announced February 2021.
-
Addressing selection bias and measurement error in COVID-19 case count data using auxiliary information
Authors:
Walter Dempsey
Abstract:
Coronavirus case-count data has influenced government policies and drives most epidemiological forecasts. Limited testing is cited as the key driver behind minimal information on the COVID-19 pandemic. While expanded testing is laudable, measurement error and selection bias are the two greatest problems limiting our understanding of the COVID-19 pandemic; neither can be fully addressed by increased testing capacity. In this paper, we demonstrate their impact on estimation of point prevalence and the effective reproduction number. We show that estimates based on the millions of molecular tests in the US have the same mean square error as a small simple random sample. To address this, a procedure is presented that combines case-count data and random samples over time to estimate selection propensities based on key covariate information. We then combine these selection propensities with epidemiological forecast models to construct a doubly robust estimation method that accounts for both measurement error and selection bias. This method is then applied to estimate Indiana's active infection prevalence using case-count, hospitalization, and death data with demographic information, a statewide random molecular sample collected from April 25–29, and Delphi's COVID-19 Trends and Impact Survey. We end with a series of recommendations based on the proposed methodology.
Submitted 21 September, 2022; v1 submitted 20 May, 2020;
originally announced May 2020.
-
A Robust Functional EM Algorithm for Incomplete Panel Count Data
Authors:
Alexander Moreno,
Zhenke Wu,
Jamie Yap,
David Wetter,
Cho Lam,
Inbal Nahum-Shani,
Walter Dempsey,
James M. Rehg
Abstract:
Panel count data describes aggregated counts of recurrent events observed at discrete time points. To understand dynamics of health behaviors, the field of quantitative behavioral research has evolved to increasingly rely upon panel count data collected via multiple self-reports, for example, reports of smoking frequency collected through in-the-moment surveys on mobile devices. However, missing reports are common and present a major barrier to downstream statistical learning. As a first step, under a missing completely at random (MCAR) assumption, we propose a simple yet widely applicable functional EM algorithm to estimate the counting process mean function, which is of central interest to behavioral scientists. The proposed approach wraps several popular panel count inference methods, seamlessly deals with incomplete counts, and is robust to misspecification of the Poisson process assumption. Theoretical analysis of the proposed algorithm provides finite-sample guarantees by expanding parametric EM theory to our general non-parametric setting. We illustrate the utility of the proposed algorithm through numerical experiments and an analysis of smoking cessation data. We also discuss useful extensions to address deviations from the MCAR assumption and covariate effects.
Submitted 19 June, 2020; v1 submitted 2 March, 2020;
originally announced March 2020.
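The idea of running EM over incomplete counts to recover a mean function can be sketched in its simplest parametric form. This is not the paper's functional EM: it assumes a Poisson process with piecewise-constant rates, assumes a missed report shows up as the next report covering two or more intervals, and assumes every subject is followed over all intervals. The function name em_piecewise_rates and the data are hypothetical.

```python
def em_piecewise_rates(reports, n_intervals, n_subjects, n_iter=200):
    """EM for piecewise-constant Poisson rates when some reports are aggregated.

    reports: list of (first, last, count) tuples; a report covering intervals
        first..last (inclusive) records the total count over those intervals.
    Returns per-interval rates and the cumulative mean function.
    """
    rates = [1.0] * n_intervals
    for _ in range(n_iter):
        expected = [0.0] * n_intervals
        # E-step: under a Poisson model, a total over several intervals splits
        # multinomially with probabilities proportional to the current rates.
        for first, last, count in reports:
            span = range(first, last + 1)
            total_rate = sum(rates[k] for k in span)
            for k in span:
                expected[k] += count * rates[k] / total_rate
        # M-step: rate = expected events per subject in each interval.
        rates = [e / n_subjects for e in expected]
    # Mean function at the end of each interval = cumulative sum of rates.
    mean_fn, running = [], 0.0
    for r in rates:
        running += r
        mean_fn.append(running)
    return rates, mean_fn

# Three hypothetical subjects; the third missed its first report, so its
# second report covers both intervals.
reports = [(0, 0, 2), (1, 1, 4), (0, 0, 3), (1, 1, 6), (0, 1, 9)]
rates, mean_fn = em_piecewise_rates(reports, n_intervals=2, n_subjects=3)
print(rates, mean_fn)  # rates approach [8/3, 16/3]
```

For these numbers the fixed point can be checked by hand: the rates sum to 8 after every iteration, and the aggregated count of 9 is split in the same 1:2 ratio as the fully observed counts.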
-
Hierarchical network models for structured exchangeable interaction processes
Authors:
Walter Dempsey,
Brandon Oselio,
Alfred Hero
Abstract:
Network data often arises via a series of structured interactions among a population of constituent elements. E-mail exchanges, for example, have a single sender followed by potentially multiple receivers. Scientific articles, on the other hand, may have multiple subject areas and multiple authors. We introduce hierarchical edge exchangeable models for the study of these structured interaction networks. In particular, we introduce the hierarchical vertex components model as a canonical example, which partially pools information via a latent, shared population-level distribution. Theoretical analysis and supporting simulations provide clear model interpretation and establish global sparsity and a power-law degree distribution. A computationally tractable Gibbs algorithm is derived. We demonstrate the model on both the Enron e-mail dataset and an ArXiv dataset, showing goodness of fit of the model via posterior predictive validation.
Submitted 28 January, 2019;
originally announced January 2019.
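Partial pooling through a shared population-level distribution can be illustrated with a generative sketch for sender/receiver interactions. This assumes a Chinese-restaurant-franchise-style construction (sender-specific Chinese restaurant processes that fall back to a shared one) chosen for illustration; it is not claimed to be the paper's exact hierarchical vertex components model, and simulate_emails, alpha, and theta are hypothetical names.

```python
import random
from collections import Counter, defaultdict

def simulate_emails(n_emails, alpha=1.0, theta=2.0, receivers_per_email=2, seed=0):
    """Simulate (sender, receivers) interactions with partial pooling.

    Senders are drawn from a population-level Chinese restaurant process
    (CRP) with concentration theta. Each sender's receivers come from a
    sender-specific CRP with concentration alpha whose "new" draws fall back
    to the shared population level, so popular vertices are shared.
    """
    rng = random.Random(seed)
    global_counts = Counter()              # population-level popularity
    sender_counts = defaultdict(Counter)   # receiver counts per sender
    next_vertex = 0

    def draw_global():
        nonlocal next_vertex
        total = sum(global_counts.values())
        if rng.random() < theta / (total + theta):
            v = next_vertex                # previously unseen vertex
            next_vertex += 1
        else:
            verts = list(global_counts)
            v = rng.choices(verts, weights=[global_counts[u] for u in verts])[0]
        global_counts[v] += 1
        return v

    emails = []
    for _ in range(n_emails):
        sender = draw_global()
        own = sender_counts[sender]
        receivers = []
        for _ in range(receivers_per_email):
            n_s = sum(own.values())
            if rng.random() < alpha / (n_s + alpha):
                v = draw_global()          # fall back to the shared level
            else:
                verts = list(own)
                v = rng.choices(verts, weights=[own[u] for u in verts])[0]
            own[v] += 1
            receivers.append(v)
        emails.append((sender, receivers))
    return emails

emails = simulate_emails(500)
n_vertices = len({v for s, rs in emails for v in [s, *rs]})
print(len(emails), "emails among", n_vertices, "distinct vertices")
```

A sender's repeated contacts are reused through its own counts, while first-time contacts are drawn from the shared level, which is what lets information pool across senders.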
-
Exchangeable, Markov multi-state survival process
Authors:
Walter Dempsey
Abstract:
We consider exchangeable Markov multi-state survival processes -- temporal processes taking values over a state space $\mathcal{S}$ with at least one absorbing failure state $\flat \in \mathcal{S}$ that satisfy natural invariance properties of exchangeability and consistency under subsampling. The set of processes contains many well-known examples from health and epidemiology -- survival, illness-death, competing risk, and comorbidity processes; an extension leads to recurrent event processes. We characterize exchangeable Markov multi-state survival processes in both discrete and continuous time. Statistical considerations impose natural constraints on the space of models appropriate for applied work. In particular, we describe constraints arising from the notion of composable systems. We end with an application of the developed models to irregularly sampled and potentially censored multi-state survival data, developing a Markov chain Monte Carlo algorithm for posterior computation.
Submitted 24 October, 2018;
originally announced October 2018.
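The illness-death process named in the abstract above is the simplest multi-state survival process with an absorbing failure state. The sketch below simulates independent units from a continuous-time Markov illness-death model; the transition intensities and the names RATES and simulate_path are hypothetical, and it does not attempt the paper's characterization of exchangeable processes or its MCMC algorithm.

```python
import random

# Illness-death model: 0 = healthy, 1 = ill, 2 = dead (absorbing).
# Transition intensities per year are hypothetical.
RATES = {
    0: {1: 0.10, 2: 0.02},
    1: {2: 0.20},
    2: {},
}

def simulate_path(rates, start=0, seed=None):
    """Simulate a continuous-time Markov jump process until absorption.

    Returns a list of (time, state) pairs beginning with (0.0, start).
    """
    rng = random.Random(seed)
    t, state = 0.0, start
    path = [(t, state)]
    while rates[state]:                          # stop in an absorbing state
        out = rates[state]
        t += rng.expovariate(sum(out.values()))  # exponential holding time
        targets = list(out)
        state = rng.choices(targets, weights=[out[j] for j in targets])[0]
        path.append((t, state))
    return path

paths = [simulate_path(RATES, seed=i) for i in range(1000)]
share_ill_first = sum(p[1][1] == 1 for p in paths) / len(paths)
print(f"{share_ill_first:.2f} of subjects pass through illness before death")
```

With these intensities a healthy unit becomes ill before dying with probability 0.10 / 0.12, roughly 0.83, so the printed share should be close to that value.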
-
The stratified micro-randomized trial design: sample size considerations for testing nested causal effects of time-varying treatments
Authors:
Walter Dempsey,
Peng Liao,
Santosh Kumar,
Susan A. Murphy
Abstract:
Technological advancements in the field of mobile devices and wearable sensors have helped overcome obstacles in the delivery of care, making it possible to deliver behavioral treatments anytime and anywhere. Increasingly, the delivery of these treatments is triggered by predictions of risk or engagement which may have been impacted by prior treatments. Furthermore, the treatments are often designed to have an impact on individuals over a span of time during which subsequent treatments may be provided.
Here we discuss our work on the design of a mobile health smoking cessation experimental study in which two challenges arose. First, randomization to treatment should occur at times of stress; second, the outcome of interest accrues over a period that may include subsequent treatment. To address these challenges, we develop the "stratified micro-randomized trial," in which each individual is randomized among treatments at times determined by predictions constructed from outcomes to prior treatment, with randomization probabilities depending on these outcomes. We define both conditional and marginal proximal treatment effects. Depending on the scientific goal, these effects may be defined over a period of time during which subsequent treatments may be provided. We develop a primary analysis method and associated sample size formulae for testing these effects.
Submitted 9 November, 2017;
originally announced November 2017.
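Stratum-dependent randomization probabilities are what make inverse-probability weighting natural in a stratified micro-randomized trial. The sketch below shows one randomization step and a simple Horvitz-Thompson contrast within a stratum; it is not the paper's primary analysis method or sample size formula, it ignores outcomes that span later treatments, and RAND_PROB, randomize, and ipw_contrast are hypothetical names with made-up probabilities.

```python
import random

# Hypothetical stratum-specific randomization probabilities: treatment is
# offered more often at decision times predicted to be stressful.
RAND_PROB = {"stressed": 0.6, "not_stressed": 0.3}

def randomize(stratum, rng):
    """Randomize at one decision time; return (treatment, probability used)."""
    p = RAND_PROB[stratum]
    return int(rng.random() < p), p

def ipw_contrast(ys, treats, probs):
    """Horvitz-Thompson estimate of E[Y(1)] - E[Y(0)] within one stratum."""
    n = len(ys)
    treated = sum(y * a / p for y, a, p in zip(ys, treats, probs)) / n
    control = sum(y * (1 - a) / (1 - p) for y, a, p in zip(ys, treats, probs)) / n
    return treated - control

rng = random.Random(0)
decisions = [randomize("stressed", rng) for _ in range(5)]
print(decisions)
print(ipw_contrast([1, 0, 3, 1], [1, 0, 1, 0], [0.5] * 4))  # 1.5
```

Recording the probability actually used at each decision time matters because, in this design, those probabilities vary with the predicted stratum.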
-
Vital variables and survival processes
Authors:
Walter Dempsey,
Peter McCullagh
Abstract:
The focus of a survival study is partly on the distribution of survival times, and partly on the health or quality of life of patients while they live. Health varies over time, and survival is the most basic aspect of health, so the two aspects are closely intertwined. Depending on the nature of the study, a range of variables may be measured; some constant in time, others not; some regarded as responses, others as explanatory risk factors; some directly and personally health-related, others less directly so. This paper begins by classifying variables that may arise in such a setting, emphasizing, in particular, the mathematical distinction between vital and non-vital variables. We also examine various types of probabilistic relationships that may exist among variables. Independent evolution is an asymmetric relation intended to encapsulate the notion of one process driving the other: $X$~is a driver of~$Y$ if $X$ evolves independently of the history of~$Y$. This concept arises in several places in the study of survival processes.
Submitted 19 January, 2016;
originally announced January 2016.
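The asymmetry of independent evolution can be checked exactly in a toy case. Assuming both processes are jointly first-order Markov on binary states (a simplification of the history-based definition above), X drives Y when the one-step law of X given (X, Y) does not depend on Y. The kernels PX and PY and all function names below are hypothetical; exact fractions avoid floating-point noise in the comparison.

```python
from fractions import Fraction as F
from itertools import product

STATES = (0, 1)

# Hypothetical one-step kernels on binary states.
PX = {0: {0: F(9, 10), 1: F(1, 10)},       # X moves on its own
      1: {0: F(1, 5), 1: F(4, 5)}}
PY = {(0, 0): {0: F(9, 10), 1: F(1, 10)},  # keys: (new x, current y)
      (0, 1): {0: F(1, 2), 1: F(1, 2)},
      (1, 0): {0: F(1, 2), 1: F(1, 2)},
      (1, 1): {0: F(1, 10), 1: F(9, 10)}}

def driven_kernel(x, y):
    """Joint law of (X', Y') given (x, y) when X drives Y."""
    return {(xn, yn): PX[x][xn] * PY[(xn, y)][yn]
            for xn, yn in product(STATES, STATES)}

def feedback_kernel(x, y):
    """Contrasting kernel in which X's next state also depends on Y."""
    p1 = PX[x][1] if y == 0 else F(1, 2)
    px = {0: 1 - p1, 1: p1}
    return {(xn, yn): px[xn] * PY[(xn, y)][yn]
            for xn, yn in product(STATES, STATES)}

def x_marginal(kernel, x, y):
    """P(X' = x' | X = x, Y = y), obtained by summing the kernel over Y'."""
    out = {xn: F(0) for xn in STATES}
    for (xn, _), p in kernel(x, y).items():
        out[xn] += p
    return out

def x_evolves_independently(kernel):
    """True if the one-step law of X never depends on the current Y."""
    return all(x_marginal(kernel, x, 0) == x_marginal(kernel, x, 1)
               for x in STATES)

print(x_evolves_independently(driven_kernel))    # True
print(x_evolves_independently(feedback_kernel))  # False
```

The relation is asymmetric: in the driven kernel Y's transitions still depend on X, but summing out Y' leaves X's transitions untouched.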
-
Community detection for interaction networks
Authors:
Harry Crane,
Walter Dempsey
Abstract:
In many applications, it is common practice to obtain a network from interaction counts by thresholding each pairwise count at a prescribed value. Our analysis calls attention to the dependence of certain methods, notably Newman--Girvan modularity, on the choice of threshold. Essentially, the threshold either separates the network into clusters automatically, making the algorithm's job trivial, or erases all structure in the data, rendering clustering impossible. By fitting the original interaction counts as given, we show that minor modifications to classical statistical methods outperform the prevailing approaches for community detection from interaction datasets. We also introduce a new hidden Markov model for inferring community structures that vary over time. We demonstrate each of these features on three real datasets: the karate club dataset, voting data from the U.S.\ Senate (2001--2003), and temporal voting data for the U.S. Supreme Court (1990--2004).
Submitted 30 September, 2015;
originally announced September 2015.
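The threshold sensitivity described in the abstract above is easy to reproduce: depending on the cutoff, thresholding a count matrix can hand the clustering to the algorithm ready-made or erase all structure. The sketch below uses union-find to list connected components after thresholding; the count matrix and the function name components_after_threshold are hypothetical, and the paper's alternative of modeling the counts directly is not shown.

```python
def components_after_threshold(counts, threshold):
    """Connected components of the graph keeping pairs with count >= threshold.

    counts: symmetric list-of-lists of pairwise interaction counts.
    Returns a sorted list of sorted components.
    """
    n = len(counts)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if counts[i][j] >= threshold:
                parent[find(i)] = find(j)   # merge the two components
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical counts: two tight groups {0, 1, 2} and {3, 4, 5}
# with a few weak interactions between them.
counts = [
    [0, 8, 6, 1, 0, 2],
    [8, 0, 7, 1, 1, 0],
    [6, 7, 0, 2, 1, 1],
    [1, 1, 2, 0, 5, 6],
    [0, 1, 1, 5, 0, 7],
    [2, 0, 1, 6, 7, 0],
]
for t in (1, 3, 9):
    print(t, components_after_threshold(counts, t))
```

At threshold 1 everything is one component, at 3 the two groups fall out as components with no clustering needed, and at 9 every vertex is isolated.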
-
Atypical scaling behavior persists in real world interaction networks
Authors:
Harry Crane,
Walter Dempsey
Abstract:
Scale-free power law structure describes complex networks derived from a wide range of real-world processes. The extensive literature focuses almost exclusively on networks with power law exponent strictly larger than 2, which can be explained by constant vertex growth and preferential attachment. Scale-free behavior with exponent between 1 and 2 has been mostly neglected as atypical because there is no known generating mechanism to explain how networks with this property form. However, empirical observations reveal that scaling in this range is an inherent feature of real-world networks obtained from repeated interactions within a population, as in social, communication, and collaboration networks. A generative model explains the observed phenomenon through the realistic dynamics of constant edge growth and a positive feedback mechanism. Our investigation therefore yields a novel empirical observation grounded in a strong theoretical basis for its occurrence.
Submitted 27 September, 2015;
originally announced September 2015.
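Constant edge growth with positive feedback can be sketched with a two-parameter (Pitman-Yor-style) urn, assumed here for illustration rather than taken from the paper. One edge is added per step, and each endpoint reuses an existing vertex with weight degree minus discount or creates a new vertex otherwise; for this urn the proportion of vertices with degree d decays roughly like d^-(1 + discount), which lies between 1 and 2 for 0 < discount < 1. The function name grow_edges and its parameters are hypothetical; self-loops and repeated edges are allowed.

```python
import random
from collections import Counter

def grow_edges(n_edges, discount=0.5, strength=1.0, seed=0):
    """Grow a multigraph by one edge per step (constant edge growth).

    Each endpoint is a new vertex with probability
    (strength + discount * k) / (m + strength), where k is the current number
    of vertices and m the total degree; otherwise it is an existing vertex v
    chosen with probability (degree[v] - discount) / (m + strength).
    Requires 0 <= discount < 1 and strength > -discount.
    """
    rng = random.Random(seed)
    degree = Counter()
    m = 0                                   # total degree so far
    for _ in range(n_edges):
        for _ in range(2):                  # two endpoints per edge
            k = len(degree)
            if rng.random() < (strength + discount * k) / (m + strength):
                v = k                       # new vertex id
            else:
                verts = list(degree)
                v = rng.choices(verts, weights=[degree[u] - discount for u in verts])[0]
            degree[v] += 1                  # positive feedback: reuse raises weight
            m += 1
    return degree

deg = grow_edges(2000)
print(len(deg), "vertices; max degree", max(deg.values()))
```

Unlike preferential attachment with constant vertex growth, the number of vertices here grows sublinearly in the number of edges, which is what pushes the exponent below 2.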
-
Survival models and health sequences
Authors:
Peter McCullagh,
Walter Dempsey
Abstract:
Medical investigations focusing on patient survival often generate not only a failure time for each patient but also a sequence of measurements on patient health at annual or semi-annual check-ups while the patient remains alive. Such a sequence of random length accompanied by a survival time is called a survival process. Ordinarily, robust health is associated with longer survival, so the two parts of a survival process cannot be assumed independent. This paper is concerned with a general technique---time reversal---for constructing statistical models for survival processes. A revival model is a regression model in the sense that it incorporates covariate and treatment effects into both the distribution of survival times and the joint distribution of health outcomes. It also allows individual health outcomes to be used clinically for predicting the subsequent survival time.
Submitted 15 January, 2016; v1 submitted 12 January, 2013;
originally announced January 2013.
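The time-reversal idea starts by re-indexing each health measurement by the time remaining until failure rather than the time since recruitment. The sketch below shows only that re-indexing step, assuming an uncensored survival time; the revival model itself (its regression structure and joint distribution) is not shown, and to_revival_scale and the patient data are hypothetical.

```python
def to_revival_scale(visit_times, values, survival_time):
    """Re-index health measurements by time remaining until failure.

    Returns (revival_time, value) pairs sorted by revival time, where
    revival_time = survival_time - visit_time. Assumes the survival time
    is observed (no censoring) and follows every visit.
    """
    if any(t > survival_time for t in visit_times):
        raise ValueError("measurements must precede the survival time")
    pairs = [(survival_time - t, v) for t, v in zip(visit_times, values)]
    return sorted(pairs)

# Hypothetical patient: check-ups every 6 months, failure at month 22.
print(to_revival_scale([0, 6, 12, 18], [80, 75, 60, 40], 22))
# [(4, 40), (10, 60), (16, 75), (22, 80)]
```

On the revival scale, measurements taken shortly before failure line up across patients regardless of when each patient entered the study.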