-
The Fallacy of Minimizing Local Regret in the Sequential Task Setting
Authors:
Zi** Xu,
Kelly W. Zhang,
Susan A. Murphy
Abstract:
In the realm of Reinforcement Learning (RL), online RL is often conceptualized as an optimization problem, where an algorithm interacts with an unknown environment to minimize cumulative regret. In a stationary setting, strong theoretical guarantees, like a sublinear ($\sqrt{T}$) regret bound, can be obtained, which typically implies the convergence to an optimal policy and the cessation of explor…
▽ More
In the realm of Reinforcement Learning (RL), online RL is often conceptualized as an optimization problem, where an algorithm interacts with an unknown environment to minimize cumulative regret. In a stationary setting, strong theoretical guarantees, like a sublinear ($\sqrt{T}$) regret bound, can be obtained, which typically implies the convergence to an optimal policy and the cessation of exploration. However, these theoretical setups often oversimplify the complexities encountered in real-world RL implementations, where tasks arrive sequentially with substantial changes between tasks and the algorithm may not be allowed to adaptively learn within certain tasks. We study the changes beyond the outcome distributions, encompassing changes in the reward designs (map**s from outcomes to rewards) and the permissible policy spaces. Our results reveal the fallacy of myopically minimizing regret within each task: obtaining optimal regret rates in the early tasks may lead to worse rates in the subsequent ones, even when the outcome distributions stay the same. To realize the optimal cumulative regret bound across all the tasks, the algorithm has to overly explore in the earlier tasks. This theoretical insight is practically significant, suggesting that due to unanticipated changes (e.g., rapid technological development or human-in-the-loop involvement) between tasks, the algorithm needs to explore more than it would in the usual stationary setting within each task. Such implication resonates with the common practice of using clipped policies in mobile health clinical trials and maintaining a fixed rate of $ε$-greedy exploration in robotic learning.
△ Less
Submitted 16 March, 2024;
originally announced March 2024.
-
Dyadic Reinforcement Learning
Authors:
Shuangning Li,
Lluis Salvat Niell,
Sung Won Choi,
Inbal Nahum-Shani,
Guy Shani,
Susan Murphy
Abstract:
Mobile health aims to enhance health outcomes by delivering interventions to individuals as they go about their daily life. The involvement of care partners and social support networks often proves crucial in hel** individuals managing burdensome medical conditions. This presents opportunities in mobile health to design interventions that target the dyadic relationship -- the relationship betwee…
▽ More
Mobile health aims to enhance health outcomes by delivering interventions to individuals as they go about their daily life. The involvement of care partners and social support networks often proves crucial in hel** individuals managing burdensome medical conditions. This presents opportunities in mobile health to design interventions that target the dyadic relationship -- the relationship between a target person and their care partner -- with the aim of enhancing social support. In this paper, we develop dyadic RL, an online reinforcement learning algorithm designed to personalize intervention delivery based on contextual factors and past responses of a target person and their care partner. Here, multiple sets of interventions impact the dyad across multiple time intervals. The developed dyadic RL is Bayesian and hierarchical. We formally introduce the problem setup, develop dyadic RL and establish a regret bound. We demonstrate dyadic RL's empirical performance through simulation studies on both toy scenarios and on a realistic test bed constructed from data collected in a mobile health study.
△ Less
Submitted 1 November, 2023; v1 submitted 15 August, 2023;
originally announced August 2023.
-
Online learning in bandits with predicted context
Authors:
Yongyi Guo,
Zi** Xu,
Susan Murphy
Abstract:
We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context and the error variance (or an estimator of this variance). This setting is motivated by a wide range of applications where the true context for decision-making is unobserved, and only a prediction of the context by a potentially complex machine learning algorithm is available.…
▽ More
We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context and the error variance (or an estimator of this variance). This setting is motivated by a wide range of applications where the true context for decision-making is unobserved, and only a prediction of the context by a potentially complex machine learning algorithm is available. When the context error is non-vanishing, classical bandit algorithms fail to achieve sublinear regret. We propose the first online algorithm in this setting with sublinear regret guarantees under mild conditions. The key idea is to extend the measurement error model in classical statistics to the online decision-making setting, which is nontrivial due to the policy being dependent on the noisy context observations. We further demonstrate the benefits of the proposed approach in simulation environments based on synthetic and real digital intervention datasets.
△ Less
Submitted 17 March, 2024; v1 submitted 25 July, 2023;
originally announced July 2023.
-
The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning
Authors:
Sarah Rathnam,
Sonali Parbhoo,
Weiwei Pan,
Susan A. Murphy,
Finale Doshi-Velez
Abstract:
Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to restrict planning to a less complex set of policies when estimating an MDP from sparse or noisy data (Jiang et al., 2015). It is commonly understood that discount regularization functions by de-emphasizing or ignoring delayed effects. In this paper, we reveal an alternate view of d…
▽ More
Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to restrict planning to a less complex set of policies when estimating an MDP from sparse or noisy data (Jiang et al., 2015). It is commonly understood that discount regularization functions by de-emphasizing or ignoring delayed effects. In this paper, we reveal an alternate view of discount regularization that exposes unintended consequences. We demonstrate that planning under a lower discount factor produces an identical optimal policy to planning using any prior on the transition matrix that has the same distribution for all states and actions. In fact, it functions like a prior with stronger regularization on state-action pairs with more transition data. This leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. Our equivalence theorem leads to an explicit formula to set regularization parameters locally for individual state-action pairs rather than globally. We demonstrate the failures of discount regularization and how we remedy them using our state-action-specific method across simple empirical examples as well as a medical cancer simulator.
△ Less
Submitted 19 June, 2023;
originally announced June 2023.
-
Effect-Invariant Mechanisms for Policy Generalization
Authors:
Sorawit Saengkyongam,
Niklas Pfister,
Predrag Klasnja,
Susan Murphy,
Jonas Peters
Abstract:
Policy learning is an important component of many real-world learning systems. A major challenge in policy learning is how to adapt efficiently to unseen environments or tasks. Recently, it has been suggested to exploit invariant conditional distributions to learn models that generalize better to unseen environments. However, assuming invariance of entire conditional distributions (which we call f…
▽ More
Policy learning is an important component of many real-world learning systems. A major challenge in policy learning is how to adapt efficiently to unseen environments or tasks. Recently, it has been suggested to exploit invariant conditional distributions to learn models that generalize better to unseen environments. However, assuming invariance of entire conditional distributions (which we call full invariance) may be too strong of an assumption in practice. In this paper, we introduce a relaxation of full invariance called effect-invariance (e-invariance for short) and prove that it is sufficient, under suitable assumptions, for zero-shot policy generalization. We also discuss an extension that exploits e-invariance when we have a small sample from the test environment, enabling few-shot policy generalization. Our work does not assume an underlying causal graph or that the data are generated by a structural causal model; instead, we develop testing procedures to test e-invariance directly from data. We present empirical results using simulated data and a mobile health intervention dataset to demonstrate the effectiveness of our approach.
△ Less
Submitted 27 June, 2023; v1 submitted 19 June, 2023;
originally announced June 2023.
-
Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling
Authors:
Susobhan Ghosh,
Raphael Kim,
Prasidh Chhabria,
Raaz Dwivedi,
Predrag Klasnja,
Peng Liao,
Kelly Zhang,
Susan Murphy
Abstract:
There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this pro…
▽ More
There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem as it learns based on each user's historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an ``optimized'' intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing the treatments to its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm stochasticity. We illustrate our methodology with a case study by analyzing the data from a physical activity clinical trial called HeartSteps, which included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization both across all users as well as within specific users in the study.
△ Less
Submitted 7 August, 2023; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Doubly robust nearest neighbors in factor models
Authors:
Raaz Dwivedi,
Katherine Tian,
Sabina Tomkins,
Predrag Klasnja,
Susan Murphy,
Devavrat Shah
Abstract:
We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the $(i, t)$-th entry, when observed, is given by its mean $f(u_i, v_t)$ plus mean-zero noise for an unknown function $f$ and latent factors $u_i$ and $v_t$. Prior NN strategies, like unit-unit NN, for esti…
▽ More
We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the $(i, t)$-th entry, when observed, is given by its mean $f(u_i, v_t)$ plus mean-zero noise for an unknown function $f$ and latent factors $u_i$ and $v_t$. Prior NN strategies, like unit-unit NN, for estimating the mean $f(u_i, v_t)$ relies on existence of other rows $j$ with $u_j \approx u_i$. Similarly, time-time NN strategy relies on existence of columns $t'$ with $v_{t'} \approx v_t$. These strategies provide poor performance respectively when similar rows or similar columns are not available. Our estimate is doubly robust to this deficit in two ways: (1) As long as there exist either good row or good column neighbors, our estimate provides a consistent estimate. (2) Furthermore, if both good row and good column neighbors exist, it provides a (near-)quadratic improvement in the non-asymptotic error and admits a significantly narrower asymptotic confidence interval when compared to both unit-unit or time-time NN.
△ Less
Submitted 29 January, 2024; v1 submitted 25 November, 2022;
originally announced November 2022.
-
Estimating causal effects with optimization-based methods: A review and empirical comparison
Authors:
Martin Cousineau,
Vedat Verter,
Susan A. Murphy,
Joelle Pineau
Abstract:
In the absence of randomized controlled and natural experiments, it is necessary to balance the distributions of (observable) covariates of the treated and control groups in order to obtain an unbiased estimate of a causal effect of interest; otherwise, a different effect size may be estimated, and incorrect recommendations may be given. To achieve this balance, there exist a wide variety of metho…
▽ More
In the absence of randomized controlled and natural experiments, it is necessary to balance the distributions of (observable) covariates of the treated and control groups in order to obtain an unbiased estimate of a causal effect of interest; otherwise, a different effect size may be estimated, and incorrect recommendations may be given. To achieve this balance, there exist a wide variety of methods. In particular, several methods based on optimization models have been recently proposed in the causal inference literature. While these optimization-based methods empirically showed an improvement over a limited number of other causal inference methods in their relative ability to balance the distributions of covariates and to estimate causal effects, they have not been thoroughly compared to each other and to other noteworthy causal inference methods. In addition, we believe that there exist several unaddressed opportunities that operational researchers could contribute with their advanced knowledge of optimization, for the benefits of the applied researchers that use causal inference tools. In this review paper, we present an overview of the causal inference literature and describe in more detail the optimization-based causal inference methods, provide a comparative analysis of the prevailing optimization-based methods, and discuss opportunities for new methods.
△ Less
Submitted 28 February, 2022;
originally announced March 2022.
-
Statistical Inference After Adaptive Sampling for Longitudinal Data
Authors:
Kelly W. Zhang,
Lucas Janson,
Susan A. Murphy
Abstract:
Online reinforcement learning and other adaptive sampling algorithms are increasingly used in digital intervention experiments to optimize treatment delivery for users over time. In this work, we focus on longitudinal user data collected by a large class of adaptive sampling algorithms that are designed to optimize treatment decisions online using accruing data from multiple users. Combining or "p…
▽ More
Online reinforcement learning and other adaptive sampling algorithms are increasingly used in digital intervention experiments to optimize treatment delivery for users over time. In this work, we focus on longitudinal user data collected by a large class of adaptive sampling algorithms that are designed to optimize treatment decisions online using accruing data from multiple users. Combining or "pooling" data across users allows adaptive sampling algorithms to potentially learn faster. However, by pooling, these algorithms induce dependence between the sampled user data trajectories; we show that this can cause standard variance estimators for i.i.d. data to underestimate the true variance of common estimators on this data type. We develop novel methods to perform a variety of statistical analyses on such adaptively sampled data via Z-estimation. Specifically, we introduce the \textit{adaptive} sandwich variance estimator, a corrected sandwich estimator that leads to consistent variance estimates under adaptive sampling. Additionally, to prove our results we develop novel theoretical tools for empirical processes on non-i.i.d., adaptively sampled longitudinal data which may be of independent interest. This work is motivated by our efforts in designing experiments in which online reinforcement learning algorithms optimize treatment decisions, yet statistical inference is essential for conducting analyses after experiments conclude.
△ Less
Submitted 19 April, 2023; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Counterfactual inference for sequential experiments
Authors:
Raaz Dwivedi,
Katherine Tian,
Sabina Tomkins,
Predrag Klasnja,
Susan Murphy,
Devavrat Shah
Abstract:
We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale -- mean outcome under different treatments for each unit and each time -- with minimal assumpt…
▽ More
We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale -- mean outcome under different treatments for each unit and each time -- with minimal assumptions on the adaptive treatment policy. Without any structural assumptions on the counterfactual means, this challenging task is infeasible due to more unknowns than observed data points. To make progress, we introduce a latent factor model over the counterfactual means that serves as a non-parametric generalization of the non-linear mixed effects model and the bilinear latent factor model considered in prior works. For estimation, we use a non-parametric method, namely a variant of nearest neighbors, and establish a non-asymptotic high probability error bound for the counterfactual mean for each unit and each time. Under regularity conditions, this bound leads to asymptotically valid confidence intervals for the counterfactual mean as the number of units and time points grows to $\infty$ together at suitable rates. We illustrate our theory via several simulations and a case study involving data from a mobile health clinical trial HeartSteps.
△ Less
Submitted 16 April, 2023; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning
Authors:
Sarah Rathnam,
Susan A. Murphy,
Finale Doshi-Velez
Abstract:
In batch reinforcement learning, there can be poorly explored state-action pairs resulting in poorly learned, inaccurate models and poorly performing associated policies. Various regularization methods can mitigate the problem of learning overly-complex models in Markov decision processes (MDPs), however they operate in technically and intuitively distinct ways and lack a common form in which to c…
▽ More
In batch reinforcement learning, there can be poorly explored state-action pairs resulting in poorly learned, inaccurate models and poorly performing associated policies. Various regularization methods can mitigate the problem of learning overly-complex models in Markov decision processes (MDPs), however they operate in technically and intuitively distinct ways and lack a common form in which to compare them. This paper unifies three regularization methods in a common framework -- a weighted average transition matrix. Considering regularization methods in this common form illuminates how the MDP structure and the state-action pair distribution of the batch data set influence the relative performance of regularization methods. We confirm intuitions generated from the common framework by empirical evaluation across a range of MDPs and data collection policies.
△ Less
Submitted 16 September, 2021;
originally announced September 2021.
-
Online structural kernel selection for mobile health
Authors:
Eura Shin,
Pedja Klasnja,
Susan Murphy,
Finale Doshi-Velez
Abstract:
Motivated by the need for efficient and personalized learning in mobile health, we investigate the problem of online kernel selection for Gaussian Process regression in the multi-task setting. We propose a novel generative process on the kernel composition for this purpose. Our method demonstrates that trajectories of kernel evolutions can be transferred between users to improve learning and that…
▽ More
Motivated by the need for efficient and personalized learning in mobile health, we investigate the problem of online kernel selection for Gaussian Process regression in the multi-task setting. We propose a novel generative process on the kernel composition for this purpose. Our method demonstrates that trajectories of kernel evolutions can be transferred between users to improve learning and that the kernels themselves are meaningful for an mHealth prediction goal.
△ Less
Submitted 21 July, 2021;
originally announced July 2021.
-
The Micro-Randomized Trial for Develo** Digital Interventions: Experimental Design and Data Analysis Considerations
Authors:
Tianchen Qian,
Ashley E. Walton,
Linda M. Collins,
Predrag Klasnja,
Stephanie T. Lanza,
Inbal Nahum-Shani,
Mashifiqui Rabbi,
Michael A. Russell,
Maureen A. Walton,
Hyesun Yoo,
Susan A. Murphy
Abstract:
Just-in-time adaptive interventions (JITAIs) are time-varying adaptive interventions that use frequent opportunities for the intervention to be adapted--weekly, daily, or even many times a day. The micro-randomized trial (MRT) has emerged for use in informing the construction of JITAIs. MRTs can be used to address research questions about whether and under what circumstances JITAI components are e…
▽ More
Just-in-time adaptive interventions (JITAIs) are time-varying adaptive interventions that use frequent opportunities for the intervention to be adapted--weekly, daily, or even many times a day. The micro-randomized trial (MRT) has emerged for use in informing the construction of JITAIs. MRTs can be used to address research questions about whether and under what circumstances JITAI components are effective, with the ultimate objective of develo** effective and efficient JITAI. The purpose of this article is to clarify why, when, and how to use MRTs; to highlight elements that must be considered when designing and implementing an MRT; and to review primary and secondary analyses methods for MRTs. We briefly review key elements of JITAIs and discuss a variety of considerations that go into planning and designing an MRT. We provide a definition of causal excursion effects suitable for use in primary and secondary analyses of MRT data to inform JITAI development. We review the weighted and centered least-squares (WCLS) estimator which provides consistent causal excursion effect estimators from MRT data. We describe how the WCLS estimator along with associated test statistics can be obtained using standard statistical software such as R (R Core Team, 2019). Throughout we illustrate the MRT design and analyses using the HeartSteps MRT, for develo** a JITAI to increase physical activity among sedentary individuals. We supplement the HeartSteps MRT with two other MRTs, SARA and BariFit, each of which highlights different research questions that can be addressed using the MRT and experimental design considerations that might arise.
△ Less
Submitted 25 November, 2021; v1 submitted 7 July, 2021;
originally announced July 2021.
-
Fast Physical Activity Suggestions: Efficient Hyperparameter Learning in Mobile Health
Authors:
Marianne Menictas,
Sabina Tomkins,
Susan Murphy
Abstract:
Users can be supported to adopt healthy behaviors, such as regular physical activity, via relevant and timely suggestions on their mobile devices. Recently, reinforcement learning algorithms have been found to be effective for learning the optimal context under which to provide suggestions. However, these algorithms are not necessarily designed for the constraints posed by mobile health (mHealth)…
▽ More
Users can be supported to adopt healthy behaviors, such as regular physical activity, via relevant and timely suggestions on their mobile devices. Recently, reinforcement learning algorithms have been found to be effective for learning the optimal context under which to provide suggestions. However, these algorithms are not necessarily designed for the constraints posed by mobile health (mHealth) settings, that they be efficient, domain-informed and computationally affordable. We propose an algorithm for providing physical activity suggestions in mHealth settings. Using domain-science, we formulate a contextual bandit algorithm which makes use of a linear mixed effects model. We then introduce a procedure to efficiently perform hyper-parameter updating, using far less computational resources than competing approaches. Not only is our approach computationally efficient, it is also easily implemented with closed form matrix algebraic updates and we show improvements over state of the art approaches both in speed and accuracy of up to 99% and 56% respectively.
△ Less
Submitted 21 December, 2020;
originally announced December 2020.
-
Individualized Prediction of COVID-19 Adverse outcomes with MLHO
Authors:
Hossein Estiri,
Zachary H. Strasser,
Shawn N. Murphy
Abstract:
We developed MLHO (pronounced as melo), an end-to-end Machine Learning framework that leverages iterative feature and algorithm selection to predict Health Outcomes. MLHO implements iterative sequential representation mining, and feature and model selection, for predicting the patient-level risk of hospitalization, ICU admission, need for mechanical ventilation, and death. It bases this prediction…
▽ More
We developed MLHO (pronounced as melo), an end-to-end Machine Learning framework that leverages iterative feature and algorithm selection to predict Health Outcomes. MLHO implements iterative sequential representation mining, and feature and model selection, for predicting the patient-level risk of hospitalization, ICU admission, need for mechanical ventilation, and death. It bases this prediction on data from patients' past medical records (before their COVID-19 infection). MLHO's architecture enables a parallel and outcome-oriented model calibration, in which different statistical learning algorithms and vectors of features are simultaneously tested to improve the prediction of health outcomes. Using clinical and demographic data from a large cohort of over 13,000 COVID-19-positive patients, we modeled the four adverse outcomes utilizing about 600 features representing patients' pre-COVID health records and demographics. The mean AUC ROC for mortality prediction was 0.91, while the prediction performance ranged between 0.80 and 0.81 for the ICU, hospitalization, and ventilation. We broadly describe the clusters of features that were utilized in modeling and their relative influence for predicting each outcome. Our results demonstrated that while demographic variables (namely age) are important predictors of adverse outcomes after a COVID-19 infection, the incorporation of the past clinical records are vital for a reliable prediction model. As the COVID-19 pandemic unfolds around the world, adaptable and interpretable machine learning frameworks (like MLHO) are crucial to improve our readiness for confronting the potential future waves of COVID-19, as well as other novel infectious diseases that may emerge.
△ Less
Submitted 29 December, 2020; v1 submitted 9 August, 2020;
originally announced August 2020.
-
IntelligentPooling: Practical Thompson Sampling for mHealth
Authors:
Sabina Tomkins,
Peng Liao,
Predrag Klasnja,
Susan Murphy
Abstract:
In mobile health (mHealth) smart devices deliver behavioral treatments repeatedly over time to a user with the goal of hel** the user adopt and maintain healthy behaviors. Reinforcement learning appears ideal for learning how to optimally make these sequential treatment decisions. However, significant challenges must be overcome before reinforcement learning can be effectively deployed in a mobi…
▽ More
In mobile health (mHealth) smart devices deliver behavioral treatments repeatedly over time to a user with the goal of hel** the user adopt and maintain healthy behaviors. Reinforcement learning appears ideal for learning how to optimally make these sequential treatment decisions. However, significant challenges must be overcome before reinforcement learning can be effectively deployed in a mobile healthcare setting. In this work we are concerned with the following challenges: 1) individuals who are in the same context can exhibit differential response to treatments 2) only a limited amount of data is available for learning on any one individual, and 3) non-stationary responses to treatment. To address these challenges we generalize Thompson-Sampling bandit algorithms to develop IntelligentPooling. IntelligentPooling learns personalized treatment policies thus addressing challenge one. To address the second challenge, IntelligentPooling updates each user's degree of personalization while making use of available data on other users to speed up learning. Lastly, IntelligentPooling allows responsivity to vary as a function of a user's time since beginning treatment, thus addressing challenge three. We show that IntelligentPooling achieves an average of 26% lower regret than state-of-the-art. We demonstrate the promise of this approach and its ability to learn from even a small group of users in a live clinical trial.
△ Less
Submitted 12 December, 2020; v1 submitted 31 July, 2020;
originally announced August 2020.
-
Batch Policy Learning in Average Reward Markov Decision Processes
Authors:
Peng Liao,
Zhengling Qi,
Runzhe Wan,
Predrag Klasnja,
Susan Murphy
Abstract:
We consider the batch (off-line) policy learning problem in the infinite horizon Markov Decision Process. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a doubly robust estimator for the average reward and show that it achieves semiparametric efficiency. Further we develop an optimization algorithm to compute the optim…
▽ More
We consider the batch (off-line) policy learning problem in the infinite horizon Markov Decision Process. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a doubly robust estimator for the average reward and show that it achieves semiparametric efficiency. Further we develop an optimization algorithm to compute the optimal policy in a parameterized stochastic policy class. The performance of the estimated policy is measured by the difference between the optimal average reward in the policy class and the average reward of the estimated policy and we establish a finite-sample regret guarantee. The performance of the method is illustrated by simulation studies and an analysis of a mobile health study promoting physical activity.
△ Less
Submitted 17 September, 2022; v1 submitted 22 July, 2020;
originally announced July 2020.
-
The Micro-Randomized Trial for Develo** Digital Interventions: Experimental Design Considerations
Authors:
Ashley E. Walton,
Linda M. Collins,
Predrag Klasnja,
Inbal Nahum-Shani,
Mashfiqui Rabbi,
Maureen A. Walton,
Susan A. Murphy
Abstract:
Just-in-time adaptive interventions (JITAIs) are time-varying adaptive interventions that use frequent opportunities for the intervention to be adapted such as weekly, daily, or even many times a day. This high intensity of adaptation is facilitated by the ability of digital technology to continuously collect information about an individual's current context and deliver treatments adapted to this…
▽ More
Just-in-time adaptive interventions (JITAIs) are time-varying adaptive interventions that use frequent opportunities for the intervention to be adapted such as weekly, daily, or even many times a day. This high intensity of adaptation is facilitated by the ability of digital technology to continuously collect information about an individual's current context and deliver treatments adapted to this information. The micro-randomized trial (MRT) has emerged for use in informing the construction of JITAIs. MRTs operate in, and take advantage of, the rapidly time-varying digital intervention environment. MRTs can be used to address research questions about whether and under what circumstances particular components of a JITAI are effective, with the ultimate objective of develo** effective and efficient components. The purpose of this article is to clarify why, when, and how to use MRTs; to highlight elements that must be considered when designing and implementing an MRT; and to discuss the possibilities this emerging optimization trial design offers for future research in the behavioral sciences, education, and other fields. We briefly review key elements of JITAIs, and then describe three case studies of MRTs, each of which highlights research questions that can be addressed using the MRT and experimental design considerations that might arise. We also discuss a variety of considerations that go into planning and designing an MRT, using the case studies as examples.
△ Less
Submitted 23 April, 2020;
originally announced May 2020.
-
The Micro-Randomized Trial for Develo** Digital Interventions: Data Analysis Methods
Authors:
Tianchen Qian,
Michael A. Russell,
Linda M. Collins,
Predrag Klasnja,
Stephanie T. Lanza,
Hyesun Yoo,
Susan A. Murphy
Abstract:
Although there is much excitement surrounding the use of mobile and wearable technology for the purposes of delivering interventions as people go through their day-to-day lives, data analysis methods for constructing and optimizing digital interventions lag behind. Here, we elucidate data analysis methods for primary and secondary analyses of micro-randomized trials (MRTs), an experimental design…
▽ More
Although there is much excitement surrounding the use of mobile and wearable technology for the purposes of delivering interventions as people go through their day-to-day lives, data analysis methods for constructing and optimizing digital interventions lag behind. Here, we elucidate data analysis methods for primary and secondary analyses of micro-randomized trials (MRTs), an experimental design to optimize digital just-in-time adaptive interventions. We provide a definition of causal "excursion" effects suitable for use in digital intervention development. We introduce the weighted and centered least-squares (WCLS) estimator which provides consistent causal excursion effect estimators for digital interventions from MRT data. We describe how the WCLS estimator along with associated test statistics can be obtained using standard statistical software such as SAS or R. Throughout we use HeartSteps, an MRT designed to increase physical activity among sedentary individuals, to illustrate potential primary and secondary analyses.
△ Less
Submitted 21 April, 2020;
originally announced April 2020.
-
Power Constrained Bandits
Authors:
Jiayu Yao,
Emma Brunskill,
Weiwei Pan,
Susan Murphy,
Finale Doshi-Velez
Abstract:
Contextual bandits often provide simple and effective personalization in decision making problems, making them popular tools to deliver personalized interventions in mobile health as well as other health applications. However, when bandits are deployed in the context of a scientific study -- e.g. a clinical trial to test if a mobile health intervention is effective -- the aim is not only to person…
▽ More
Contextual bandits often provide simple and effective personalization in decision making problems, making them popular tools to deliver personalized interventions in mobile health as well as other health applications. However, when bandits are deployed in the context of a scientific study -- e.g. a clinical trial to test if a mobile health intervention is effective -- the aim is not only to personalize for an individual, but also to determine, with sufficient statistical power, whether or not the system's intervention is effective. It is essential to assess the effectiveness of the intervention before broader deployment for better resource allocation. The two objectives are often deployed under different model assumptions, making it hard to determine how achieving the personalization and statistical power affect each other. In this work, we develop general meta-algorithms to modify existing algorithms such that sufficient power is guaranteed while still improving each user's well-being. We also demonstrate that our meta-algorithms are robust to various model mis-specifications possibly appearing in statistical studies, thus providing a valuable tool to study designers.
△ Less
Submitted 27 July, 2021; v1 submitted 13 April, 2020;
originally announced April 2020.
-
Streamlined Empirical Bayes Fitting of Linear Mixed Models in Mobile Health
Authors:
Marianne Menictas,
Sabina Tomkins,
Susan A Murphy
Abstract:
To effect behavior change a successful algorithm must make high-quality decisions in real-time. For example, a mobile health (mHealth) application designed to increase physical activity must make contextually relevant suggestions to motivate users. While machine learning offers solutions for certain stylized settings, such as when batch data can be processed offline, there is a dearth of approache…
▽ More
To effect behavior change a successful algorithm must make high-quality decisions in real-time. For example, a mobile health (mHealth) application designed to increase physical activity must make contextually relevant suggestions to motivate users. While machine learning offers solutions for certain stylized settings, such as when batch data can be processed offline, there is a dearth of approaches which can deliver high-quality solutions under the specific constraints of mHealth. We propose an algorithm which provides users with contextualized and personalized physical activity suggestions. This algorithm is able to overcome a challenge critical to mHealth that complex models be trained efficiently. We propose a tractable streamlined empirical Bayes procedure which fits linear mixed effects models in large-data settings. Our procedure takes advantage of sparsity introduced by hierarchical random effects to efficiently learn the posterior distribution of a linear mixed effects model. A key contribution of this work is that we provide explicit updates in order to learn both fixed effects, random effects and hyper-parameter values. We demonstrate the success of this approach in a mobile health (mHealth) reinforcement learning application, a domain in which fast computations are crucial for real time interventions. Not only is our approach computationally efficient, it is also easily implemented with closed form matrix algebraic updates and we show improvements over state of the art approaches both in speed and accuracy of up to 99% and 56% respectively.
△ Less
Submitted 28 March, 2020;
originally announced March 2020.
-
Rapidly Personalizing Mobile Health Treatment Policies with Limited Data
Authors:
Sabina Tomkins,
Peng Liao,
Predrag Klasnja,
Serena Yeung,
Susan Murphy
Abstract:
In mobile health (mHealth), reinforcement learning algorithms that adapt to one's context without learning personalized policies might fail to distinguish between the needs of individuals. Yet the high amount of noise due to the in situ delivery of mHealth interventions can cripple the ability of an algorithm to learn when given access to only a single user's data, making personalization challengi…
▽ More
In mobile health (mHealth), reinforcement learning algorithms that adapt to one's context without learning personalized policies might fail to distinguish between the needs of individuals. Yet the high amount of noise due to the in situ delivery of mHealth interventions can cripple the ability of an algorithm to learn when given access to only a single user's data, making personalization challenging. We present IntelligentPooling, which learns personalized policies via an adaptive, principled use of other users' data. We show that IntelligentPooling achieves an average of 26% lower regret than state-of-the-art across all generative models. Additionally, we inspect the behavior of this approach in a live clinical trial, demonstrating its ability to learn from even a small group of users.
△ Less
Submitted 23 February, 2020;
originally announced February 2020.
-
Inference for Batched Bandits
Authors:
Kelly W. Zhang,
Lucas Janson,
Susan A. Murphy
Abstract:
As bandit algorithms are increasingly utilized in scientific studies and industrial applications, there is an associated increasing need for reliable inference methods based on the resulting adaptively-collected data. In this work, we develop methods for inference on data collected in batches using a bandit algorithm. We first prove that the ordinary least squares estimator (OLS), which is asympto…
▽ More
As bandit algorithms are increasingly utilized in scientific studies and industrial applications, there is an associated increasing need for reliable inference methods based on the resulting adaptively-collected data. In this work, we develop methods for inference on data collected in batches using a bandit algorithm. We first prove that the ordinary least squares estimator (OLS), which is asymptotically normal on independently sampled data, is not asymptotically normal on data collected using standard bandit algorithms when there is no unique optimal arm. This asymptotic non-normality result implies that the naive assumption that the OLS estimator is approximately normal can lead to Type-1 error inflation and confidence intervals with below-nominal coverage probabilities. Second, we introduce the Batched OLS estimator (BOLS) that we prove is (1) asymptotically normal on data collected from both multi-arm and contextual bandits and (2) robust to non-stationarity in the baseline reward.
△ Less
Submitted 8 January, 2021; v1 submitted 8 February, 2020;
originally announced February 2020.
-
Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health
Authors:
Peng Liao,
Predrag Klasnja,
Susan Murphy
Abstract:
Due to the recent advancements in wearables and sensing technology, health scientists are increasingly develo** mobile health (mHealth) interventions. In mHealth interventions, mobile devices are used to deliver treatment to individuals as they go about their daily lives. These treatments are generally designed to impact a near time, proximal outcome such as stress or physical activity. The mHea…
▽ More
Due to the recent advancements in wearables and sensing technology, health scientists are increasingly develo** mobile health (mHealth) interventions. In mHealth interventions, mobile devices are used to deliver treatment to individuals as they go about their daily lives. These treatments are generally designed to impact a near time, proximal outcome such as stress or physical activity. The mHealth intervention policies, often called just-in-time adaptive interventions, are decision rules that map an individual's current state (e.g., individual's past behaviors as well as current observations of time, location, social activity, stress and urges to smoke) to a particular treatment at each of many time points. The vast majority of current mHealth interventions deploy expert-derived policies. In this paper, we provide an approach for conducting inference about the performance of one or more such policies using historical data collected under a possibly different policy. Our measure of performance is the average of proximal outcomes over a long time period should the particular mHealth policy be followed. We provide an estimator as well as confidence intervals. This work is motivated by HeartSteps, an mHealth physical activity intervention.
△ Less
Submitted 22 July, 2020; v1 submitted 30 December, 2019;
originally announced December 2019.
-
Estimating Time-Varying Causal Excursion Effect in Mobile Health with Binary Outcomes
Authors:
Tianchen Qian,
Hyesun Yoo,
Predrag Klasnja,
Daniel Almirall,
Susan A. Murphy
Abstract:
Advances in wearables and digital technology now make it possible to deliver behavioral mobile health interventions to individuals in their everyday life. The micro-randomized trial (MRT) is increasingly used to provide data to inform the construction of these interventions. In an MRT, each individual is repeatedly randomized among multiple intervention options, often hundreds or even thousands of…
▽ More
Advances in wearables and digital technology now make it possible to deliver behavioral mobile health interventions to individuals in their everyday life. The micro-randomized trial (MRT) is increasingly used to provide data to inform the construction of these interventions. In an MRT, each individual is repeatedly randomized among multiple intervention options, often hundreds or even thousands of times, over the course of the trial. This work is motivated by multiple MRTs that have been conducted, or are currently in the field, in which the primary outcome is a longitudinal binary outcome. The primary aim of such MRTs is to examine whether a particular time-varying intervention has an effect on the longitudinal binary outcome, often marginally over all but a small subset of the individual's data. We propose the definition of causal excursion effect that can be used in such primary aim analysis for MRTs with binary outcomes. Under rather restrictive assumptions one can, based on existing literature, derive a semiparametric, locally efficient estimator of the causal effect. We, starting from this estimator, develop an estimator that can be used as the basis of a primary aim analysis under more plausible assumptions. Simulation studies are conducted to compare the estimators. We illustrate the developed methods using data from the MRT, BariFit. In BariFit, the goal is to support weight maintenance for individuals who received bariatric surgery.
△ Less
Submitted 29 July, 2020; v1 submitted 2 June, 2019;
originally announced June 2019.
-
Linear mixed models with endogenous covariates: modeling sequential treatment effects with application to a mobile health study
Authors:
Tianchen Qian,
Predrag Klasnja,
Susan A. Murphy
Abstract:
Mobile health is a rapidly develo** field in which behavioral treatments are delivered to individuals via wearables or smartphones to facilitate health-related behavior change. Micro-randomized trials (MRT) are an experimental design for develo** mobile health interventions. In an MRT the treatments are randomized numerous times for each individual over course of the trial. Along with assessin…
▽ More
Mobile health is a rapidly develo** field in which behavioral treatments are delivered to individuals via wearables or smartphones to facilitate health-related behavior change. Micro-randomized trials (MRT) are an experimental design for develo** mobile health interventions. In an MRT the treatments are randomized numerous times for each individual over course of the trial. Along with assessing treatment effects, behavioral scientists aim to understand between-person heterogeneity in the treatment effect. A natural approach is the familiar linear mixed model. However, directly applying linear mixed models is problematic because potential moderators of the treatment effect are frequently endogenous---that is, may depend on prior treatment. We discuss model interpretation and biases that arise in the absence of additional assumptions when endogenous covariates are included in a linear mixed model. In particular, when there are endogenous covariates, the coefficients no longer have the customary marginal interpretation. However, these coefficients still have a conditional-on-the-random-effect interpretation. We provide an additional assumption that, if true, allows scientists to use standard software to fit linear mixed model with endogenous covariates, and person-specific predictions of effects can be provided. As an illustration, we assess the effect of activity suggestion in the HeartSteps MRT and analyze the between-person treatment effect heterogeneity.
△ Less
Submitted 10 May, 2019; v1 submitted 27 February, 2019;
originally announced February 2019.
-
Practical Considerations for Data Collection and Management in Mobile Health Micro-randomized Trials
Authors:
Nicholas J. Seewald,
Shawna N. Smith,
Andy **seok Lee,
Predrag Klasnja,
Susan A. Murphy
Abstract:
There is a growing interest in leveraging the prevalence of mobile technology to improve health by delivering momentary, contextualized interventions to individuals' smartphones. A just-in-time adaptive intervention (JITAI) adjusts to an individual's changing state and/or context to provide the right treatment, at the right time, in the right place. Micro-randomized trials (MRTs) allow for the col…
▽ More
There is a growing interest in leveraging the prevalence of mobile technology to improve health by delivering momentary, contextualized interventions to individuals' smartphones. A just-in-time adaptive intervention (JITAI) adjusts to an individual's changing state and/or context to provide the right treatment, at the right time, in the right place. Micro-randomized trials (MRTs) allow for the collection of data which aid in the construction of an optimized JITAI by sequentially randomizing participants to different treatment options at each of many decision points throughout the study. Often, this data is collected passively using a mobile phone. To assess the causal effect of treatment on a near-term outcome, care must be taken when designing the data collection system to ensure it is of appropriately high quality. Here, we make several recommendations for collecting and managing data from an MRT. We provide advice on selecting which features to collect and when, choosing between "agents" to implement randomization, identifying sources of missing data, and overcoming other novel challenges. The recommendations are informed by our experience with HeartSteps, an MRT designed to test the effects of an intervention aimed at increasing physical activity in sedentary adults. We also provide a checklist which can be used in designing a data collection system so that scientists can focus more on their questions of interest, and less on cleaning data.
△ Less
Submitted 27 December, 2018;
originally announced December 2018.
-
Personalizing Intervention Probabilities By Pooling
Authors:
Sabina Tomkins,
Predrag Klasnja,
Susan Murphy
Abstract:
In many mobile health interventions, treatments should only be delivered in a particular context, for example when a user is currently stressed, walking or sedentary. Even in an optimal context, concerns about user burden can restrict which treatments are sent. To diffuse the treatment delivery over times when a user is in a desired context, it is critical to predict the future number of times the…
▽ More
In many mobile health interventions, treatments should only be delivered in a particular context, for example when a user is currently stressed, walking or sedentary. Even in an optimal context, concerns about user burden can restrict which treatments are sent. To diffuse the treatment delivery over times when a user is in a desired context, it is critical to predict the future number of times the context will occur. The focus of this paper is on whether personalization can improve predictions in these settings. Though the variance between individuals' behavioral patterns suggest that personalization should be useful, the amount of individual-level data limits its capabilities. Thus, we investigate several methods which pool data across users to overcome these deficiencies and find that pooling lowers the overall error rate relative to both personalized and batch approaches.
△ Less
Submitted 2 December, 2018;
originally announced December 2018.
-
Action Centered Contextual Bandits
Authors:
Kristjan Greenewald,
Ambuj Tewari,
Predrag Klasnja,
Susan Murphy
Abstract:
Contextual bandits have become popular as they offer a middle ground between very simple approaches based on multi-armed bandits and very complex approaches using the full power of reinforcement learning. They have demonstrated success in web applications and have a rich body of associated theoretical guarantees. Linear models are well understood theoretically and preferred by practitioners becaus…
▽ More
Contextual bandits have become popular as they offer a middle ground between very simple approaches based on multi-armed bandits and very complex approaches using the full power of reinforcement learning. They have demonstrated success in web applications and have a rich body of associated theoretical guarantees. Linear models are well understood theoretically and preferred by practitioners because they are not only easily interpretable but also simple to implement and debug. Furthermore, if the linear model is true, we get very strong performance guarantees. Unfortunately, in emerging applications in mobile health, the time-invariant linear model assumption is untenable. We provide an extension of the linear model for contextual bandits that has two parts: baseline reward and treatment effect. We allow the former to be complex but keep the latter simple. We argue that this model is plausible for mobile health applications. At the same time, it leads to algorithms with strong performance guarantees as in the linear model setting, while still allowing for complex nonlinear baseline modeling. Our theory is supported by experiments on data gathered in a recently concluded mobile health study.
△ Less
Submitted 9 November, 2017;
originally announced November 2017.
-
The stratified micro-randomized trial design: sample size considerations for testing nested causal effects of time-varying treatments
Authors:
Walter Dempsey,
Peng Liao,
Santosh Kumar,
Susan A. Murphy
Abstract:
Technological advancements in the field of mobile devices and wearable sensors have helped overcome obstacles in the delivery of care, making it possible to deliver behavioral treatments anytime and anywhere. Increasingly the delivery of these treatments is triggered by predictions of risk or engagement which may have been impacted by prior treatments. Furthermore the treatments are often designed…
▽ More
Technological advancements in the field of mobile devices and wearable sensors have helped overcome obstacles in the delivery of care, making it possible to deliver behavioral treatments anytime and anywhere. Increasingly the delivery of these treatments is triggered by predictions of risk or engagement which may have been impacted by prior treatments. Furthermore the treatments are often designed to have an impact on individuals over a span of time during which subsequent treatments may be provided.
Here we discuss our work on the design of a mobile health smoking cessation experimental study in which two challenges arose. First the randomizations to treatment should occur at times of stress and second the outcome of interest accrues over a period that may include subsequent treatment. To address these challenges we develop the "stratified micro-randomized trial," in which each individual is randomized among treatments at times determined by predictions constructed from outcomes to prior treatment and with randomization probabilities depending on these outcomes. We define both conditional and marginal proximal treatment effects. Depending on the scientific goal these effects may be defined over a period of time during which subsequent treatments may be provided. We develop a primary analysis method and associated sample size formulae for testing these effects.
△ Less
Submitted 9 November, 2017;
originally announced November 2017.
-
An Actor-Critic Contextual Bandit Algorithm for Personalized Mobile Health Interventions
Authors:
Huitian Lei,
Yangyi Lu,
Ambuj Tewari,
Susan A. Murphy
Abstract:
Increasing technological sophistication and widespread use of smartphones and wearable devices provide opportunities for innovative and highly personalized health interventions. A Just-In-Time Adaptive Intervention (JITAI) uses real-time data collection and communication capabilities of modern mobile devices to deliver interventions in real-time that are adapted to the in-the-moment needs of the u…
▽ More
Increasing technological sophistication and widespread use of smartphones and wearable devices provide opportunities for innovative and highly personalized health interventions. A Just-In-Time Adaptive Intervention (JITAI) uses real-time data collection and communication capabilities of modern mobile devices to deliver interventions in real-time that are adapted to the in-the-moment needs of the user. The lack of methodological guidance in constructing data-based JITAIs remains a hurdle in advancing JITAI research despite the increasing popularity of JITAIs among clinical scientists. In this article, we make a first attempt to bridge this methodological gap by formulating the task of tailoring interventions in real-time as a contextual bandit problem. Interpretability requirements in the domain of mobile health lead us to formulate the problem differently from existing formulations intended for web applications such as ad or news article placement. Under the assumption of linear reward function, we choose the reward function (the "critic") parameterization separately from a lower dimensional parameterization of stochastic policies (the "actor"). We provide an online actor-critic algorithm that guides the construction and refinement of a JITAI. Asymptotic properties of the actor-critic algorithm are developed and backed up by numerical experiments. Additional numerical experiments are conducted to test the robustness of the algorithm when idealized assumptions used in the analysis of contextual bandit algorithm are breached.
△ Less
Submitted 22 April, 2022; v1 submitted 27 June, 2017;
originally announced June 2017.
-
A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward
Authors:
S. A. Murphy,
Y. Deng,
E. B. Laber,
H. R. Maei,
R. S. Sutton,
K. Witkiewitz
Abstract:
We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health.
We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health.
△ Less
Submitted 18 July, 2016;
originally announced July 2016.
-
Assessing Time-Varying Causal Effect Moderation in Mobile Health
Authors:
Audrey Boruvka,
Daniel Almirall,
Katie Witkiewitz,
Susan A. Murphy
Abstract:
In mobile health interventions aimed at behavior change and maintenance, treatments are provided in real time to manage current or impending high risk situations or promote healthy behaviors in near real time. Currently there is great scientific interest in develo** data analysis approaches to guide the development of mobile interventions. In particular data from mobile health studies might be u…
▽ More
In mobile health interventions aimed at behavior change and maintenance, treatments are provided in real time to manage current or impending high risk situations or promote healthy behaviors in near real time. Currently there is great scientific interest in develo** data analysis approaches to guide the development of mobile interventions. In particular data from mobile health studies might be used to examine effect moderators-i.e., individual characteristics, time-varying context or past treatment response that moderate the effect of current treatment on a subsequent response. This paper introduces a formal definition for moderated effects in terms of potential outcomes, a definition that is particularly suited to mobile interventions, where treatment occasions are numerous, individuals are not always available for treatment, and potential moderators might be influenced by past treatment. Methods for estimating moderated effects are developed and compared. The proposed approach is illustrated using BASICS-Mobile, a smartphone-based intervention designed to curb heavy drinking and smoking among college students.
△ Less
Submitted 16 August, 2016; v1 submitted 2 January, 2016;
originally announced January 2016.
-
Sample Size Calculations for Micro-randomized Trials in mHealth
Authors:
Peng Liao,
Predrag Klasnja,
Ambuj Tewari,
Susan A. Murphy
Abstract:
The use and development of mobile interventions are experiencing rapid growth. In "just-in-time" mobile interventions, treatments are provided via a mobile device and they are intended to help an individual make healthy decisions "in the moment," and thus have a proximal, near future impact. Currently the development of mobile interventions is proceeding at a much faster pace than that of associat…
▽ More
The use and development of mobile interventions are experiencing rapid growth. In "just-in-time" mobile interventions, treatments are provided via a mobile device and they are intended to help an individual make healthy decisions "in the moment," and thus have a proximal, near future impact. Currently the development of mobile interventions is proceeding at a much faster pace than that of associated data science methods. A first step toward develo** data-based methods is to provide an experimental design for testing the proximal effects of these just-in-time treatments. In this paper, we propose a "micro-randomized" trial design for this purpose. In a micro-randomized trial, treatments are sequentially randomized throughout the conduct of the study, with the result that each participant may be randomized at the 100s or 1000s of occasions at which a treatment might be provided. Further, we develop a test statistic for assessing the proximal effect of a treatment as well as an associated sample size calculator. We conduct simulation evaluations of the sample size calculator in various settings. Rules of thumb that might be used in designing a micro-randomized trial are discussed. This work is motivated by our collaboration on the HeartSteps mobile application designed to increase physical activity.
△ Less
Submitted 22 July, 2020; v1 submitted 1 April, 2015;
originally announced April 2015.
-
Small Sample Inference for Generalization Error in Classification Using the CUD Bound
Authors:
Eric B. Laber,
Susan A. Murphy
Abstract:
Confidence measures for the generalization error are crucial when small training samples are used to construct classifiers. A common approach is to estimate the generalization error by resampling and then assume the resampled estimator follows a known distribution to form a confidence set [Kohavi 1995, Martin 1996,Yang 2006]. Alternatively, one might bootstrap the resampled estimator of the genera…
▽ More
Confidence measures for the generalization error are crucial when small training samples are used to construct classifiers. A common approach is to estimate the generalization error by resampling and then assume the resampled estimator follows a known distribution to form a confidence set [Kohavi 1995, Martin 1996,Yang 2006]. Alternatively, one might bootstrap the resampled estimator of the generalization error to form a confidence set. Unfortunately, these methods do not reliably provide sets of the desired confidence. The poor performance appears to be due to the lack of smoothness of the generalization error as a function of the learned classifier. This results in a non-normal distribution of the estimated generalization error. We construct a confidence set for the generalization error by use of a smooth upper bound on the deviation between the resampled estimate and generalization error. The confidence set is formed by bootstrap** this upper bound. In cases in which the approximation class for the classifier can be represented as a parametric additive model, we provide a computationally efficient algorithm. This method exhibits superior performance across a series of test and simulated data sets.
△ Less
Submitted 13 June, 2012;
originally announced June 2012.
-
Active Learning for Develo** Personalized Treatment
Authors:
Kun Deng,
Joelle Pineau,
Susan A. Murphy
Abstract:
The personalization of treatment via bio-markers and other risk categories has drawn increasing interest among clinical scientists. Personalized treatment strategies can be learned using data from clinical trials, but such trials are very costly to run. This paper explores the use of active learning techniques to design more efficient trials, addressing issues such as whom to recruit, at what poin…
▽ More
The personalization of treatment via bio-markers and other risk categories has drawn increasing interest among clinical scientists. Personalized treatment strategies can be learned using data from clinical trials, but such trials are very costly to run. This paper explores the use of active learning techniques to design more efficient trials, addressing issues such as whom to recruit, at what point in the trial, and which treatment to assign, throughout the duration of the trial. We propose a minimax bandit model with two different optimization criteria, and discuss the computational challenges and issues pertaining to this approach. We evaluate our active learning policies using both simulated data, and data modeled after a clinical trial for treating depressed individuals, and contrast our methods with other plausible active learning policies.
△ Less
Submitted 14 February, 2012;
originally announced February 2012.
-
Statistical Inference in Dynamic Treatment Regimes
Authors:
Eric B. Laber,
Min Qian,
Dan J. Lizotte,
William E. Pelham,
Susan A. Murphy
Abstract:
Dynamic treatment regimes are of growing interest across the clinical sciences as these regimes provide one way to operationalize and thus inform sequential personalized clinical decision making. A dynamic treatment regime is a sequence of decision rules, with a decision rule per stage of clinical intervention; each decision rule maps up-to-date patient information to a recommended treatment. We b…
▽ More
Dynamic treatment regimes are of growing interest across the clinical sciences as these regimes provide one way to operationalize and thus inform sequential personalized clinical decision making. A dynamic treatment regime is a sequence of decision rules, with a decision rule per stage of clinical intervention; each decision rule maps up-to-date patient information to a recommended treatment. We briefly review a variety of approaches for using data to construct the decision rules. We then review an interesting challenge, that of nonregularity that often arises in this area. By nonregularity, we mean the parameters indexing the optimal dynamic treatment regime are nonsmooth functionals of the underlying generative distribution.
A consequence is that no regular or asymptotically unbiased estimator of these parameters exists. Nonregularity arises in inference for parameters in the optimal dynamic treatment regime; we illustrate the effect of nonregularity on asymptotic bias and via sensitivity of asymptotic, limiting, distributions to local perturbations. We propose and evaluate a locally consistent Adaptive Confidence Interval (ACI) for the parameters of the optimal dynamic treatment regime. We use data from the Adaptive Interventions for Children with ADHD study as an illustrative example. We conclude by highlighting and discussing emerging theoretical problems in this area.
△ Less
Submitted 26 November, 2013; v1 submitted 30 June, 2010;
originally announced June 2010.