
Sequential Knockoffs for Variable Selection in Reinforcement Learning

Authors: Tao Ma, Hengrui Cai, Zhengling Qi, Chengchun Shi, Eric B. Laber

Abstract: In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge. Consequently, it is common practice to construct a state which is larger than necessary, e.g., by concatenating measurements over contiguous time points. However, needlessly increasing the dimension of the state can slow learning and obfuscate the learned policy. We introduce the notion of a minimal sufficient state in a Markov decision process (MDP) as the smallest subvector of the original state under which the process remains an MDP and shares the same optimal policy as the original process. We propose a novel sequential knockoffs (SEEK) algorithm that estimates the minimal sufficient state in a system with high-dimensional complex nonlinear dynamics. In large samples, the proposed method controls the false discovery rate, and selects all sufficient variables with probability approaching one. As the method is agnostic to the reinforcement learning algorithm being applied, it benefits downstream tasks such as policy optimization. Empirical experiments verify theoretical results and show the proposed approach outperforms several competing methods in terms of variable selection accuracy and regret.

Submitted 24 March, 2023; originally announced March 2023.

arXiv:2003.05084 [pdf, other]

A spatiotemporal recommendation engine for malaria control

Authors: Qian Guan, Brian J. Reich, Eric B. Laber

Abstract: Malaria is an infectious disease affecting a large population across the world, and interventions need to be efficiently applied to reduce the burden of malaria. We develop a framework to help policy-makers decide how to allocate limited resources in realtime for malaria control. We formalize a policy for the resource allocation as a sequence of decisions, one per intervention decision, that map up-to-date disease related information to a resource allocation. An optimal policy must control the spread of the disease while being interpretable and viewed as equitable to stakeholders. We construct an interpretable class of resource allocation policies that can accommodate allocation of resources residing in a continuous domain, and combine a hierarchical Bayesian spatiotemporal model for disease transmission with a policy-search algorithm to estimate an optimal policy for resource allocation within the pre-specified class. The estimated optimal policy under the proposed framework improves the cumulative long-term outcome compared with naive approaches in both simulation experiments and application to malaria interventions in the Democratic Republic of the Congo.

Submitted 10 March, 2020; originally announced March 2020.

arXiv:1912.06667 [pdf, other]

High dimensional precision medicine from patient-derived xenografts

Authors: Naim U. Rashid, Daniel J. Luckett, Jingxiang Chen, Michael T. Lawson, Longshaokan Wang, Yunshu Zhang, Eric B. Laber, Yufeng Liu, Jen Jen Yeh, Donglin Zeng, Michael R. Kosorok

Abstract: The complexity of human cancer often results in significant heterogeneity in response to treatment. Precision medicine offers potential to improve patient outcomes by leveraging this heterogeneity. Individualized treatment rules (ITRs) formalize precision medicine as maps from the patient covariate space into the space of allowable treatments. The optimal ITR is that which maximizes the mean of a clinical outcome in a population of interest. Patient-derived xenograft (PDX) studies permit the evaluation of multiple treatments within a single tumor and thus are ideally suited for estimating optimal ITRs. PDX data are characterized by correlated outcomes, a high-dimensional feature space, and a large number of treatments. Existing methods for estimating optimal ITRs do not take advantage of the unique structure of PDX data or handle the associated challenges well. In this paper, we explore machine learning methods for estimating optimal ITRs from PDX data. We analyze data from a large PDX study to identify biomarkers that are informative for developing personalized treatment recommendations in multiple cancers. We estimate optimal ITRs using regression-based approaches such as Q-learning and direct search methods such as outcome weighted learning. Finally, we implement a superlearner approach to combine a set of estimated ITRs and show that the resulting ITR performs better than any of the input ITRs, mitigating uncertainty regarding user choice of any particular ITR estimation methodology. Our results indicate that PDX data are a valuable resource for developing individualized treatment strategies in oncology.

Submitted 13 December, 2019; originally announced December 2019.

arXiv:1906.06646 [pdf, other]

Sample Size Calculations for SMARTs

Authors: Eric J. Rose, Eric B. Laber, Marie Davidian, Anastasios A. Tsiatis, Ying-Qi Zhao, Michael R. Kosorok

Abstract: Sequential Multiple Assignment Randomized Trials (SMARTs) are considered the gold standard for estimation and evaluation of treatment regimes. SMARTs are typically sized to ensure sufficient power for a simple comparison, e.g., the comparison of two fixed treatment sequences. Estimation of an optimal treatment regime is conducted as part of a secondary and hypothesis-generating analysis with formal evaluation of the estimated optimal regime deferred to a follow-up trial. However, running a follow-up trial to evaluate an estimated optimal treatment regime is costly and time-consuming; furthermore, the estimated optimal regime that is to be evaluated in such a follow-up trial may be far from optimal if the original trial was underpowered for estimation of an optimal regime. We derive sample size procedures for a SMART that ensure: (i) sufficient power for comparing the optimal treatment regime with standard of care; and (ii) the estimated optimal regime is within a given tolerance of the true optimal regime with high-probability. We establish asymptotic validity of the proposed procedures and demonstrate their finite sample performance in a series of simulation experiments.

Submitted 16 June, 2019; originally announced June 2019.

arXiv:1905.11765 [pdf, other]

Global forensic geolocation with deep neural networks

Authors: Neal S. Grantham, Brian J. Reich, Eric B. Laber, Krishna Pacifici, Robert R. Dunn, Noah Fierer, Matthew Gebert, Julia S. Allwood, Seth A. Faith

Abstract: An important problem in forensic analyses is identifying the provenance of materials at a crime scene, such as biological material on a piece of clothing. This procedure, known as geolocation, is conventionally guided by expert knowledge of the biological evidence and therefore tends to be application-specific, labor-intensive, and subjective. Purely data-driven methods have yet to be fully realized due in part to the lack of a sufficiently rich data source. However, high-throughput sequencing technologies are able to identify tens of thousands of microbial taxa using DNA recovered from a single swab collected from nearly any object or surface. We present a new algorithm for geolocation that aggregates over an ensemble of deep neural network classifiers trained on randomly-generated Voronoi partitions of a spatial domain. We apply the algorithm to fungi present in each of 1300 dust samples collected across the continental United States and then to a global dataset of dust samples from 28 countries. Our algorithm makes remarkably good point predictions with more than half of the geolocation errors under 100 kilometers for the continental analysis and nearly 90% classification accuracy of a sample's country of origin for the global analysis. We suggest that the effectiveness of this model sets the stage for a new, quantitative approach to forensic geolocation.

Submitted 28 May, 2019; originally announced May 2019.

arXiv:1905.04735 [pdf, ps, other]

Note on Thompson sampling for large decision problems

Authors: Tao Hu, Eric B. Laber, Zhen Li, Nick J. Meyer, Krishna Pacifici

Abstract: There is increasing interest in using streaming data to inform decision making across a wide range of application domains including mobile health, food safety, security, and resource management. A decision support system formalizes online decision making as a map from up-to-date information to a recommended decision. Online estimation of an optimal decision strategy from streaming data requires simultaneous estimation of components of the underlying system dynamics as well as the optimal decision strategy given these dynamics; thus, there is an inherent trade-off between choosing decisions that lead to improved estimates and choosing decisions that appear to be optimal based on current estimates. Thompson (1933) was among the first to formalize this trade-off in the context of choosing between two treatments for a stream of patients; he proposed a simple heuristic wherein a treatment is selected randomly at each time point with selection probability proportional to the posterior probability that it is optimal. We consider a variant of Thompson sampling that is simple to implement and can be applied to large and complex decision problems. We show that the proposed Thompson sampling estimator is consistent for the optimal decision support system and provide rates of convergence and finite sample error bounds. The proposed algorithm is illustrated using an agent-based model of the spread of influenza on a network and management of mallard populations in the United States.

Submitted 12 May, 2019; originally announced May 2019.
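
Illustrative sketch (not from the paper): the heuristic attributed to Thompson (1933) in the abstract above can be illustrated for the classical case of treatments with binary outcomes. The code below is a minimal sketch of that textbook version only, not the variant the authors propose. It assumes independent Beta(1, 1) priors on each treatment's success probability; the function name `thompson_sampling`, the success probabilities, and the number of rounds are all hypothetical. Drawing one sample from each posterior and choosing the largest selects each treatment with probability equal to the posterior probability that it is optimal.

```python
import random


def thompson_sampling(true_success_probs, n_rounds, seed=0):
    """Textbook Thompson sampling for Bernoulli outcomes (illustrative only).

    true_success_probs are hypothetical and unknown to the sampler; they are
    used only to simulate patient outcomes.
    """
    rng = random.Random(seed)
    n_arms = len(true_success_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # One draw from each Beta posterior (assumed Beta(1, 1) prior).
        draws = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Picking the largest draw selects arm a with probability equal to
        # the posterior probability that arm a is the best one.
        arm = max(range(n_arms), key=lambda a: draws[a])

        # Simulate the outcome and update the chosen arm's posterior counts.
        reward = 1 if rng.random() < true_success_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


# Hypothetical two-treatment example.
pulls = thompson_sampling([0.3, 0.7], n_rounds=2000)
print(pulls)
```

As evidence accumulates, the sampler assigns most patients to the treatment that appears better while still occasionally trying the other one, which is the trade-off between improving estimates and acting on current estimates that the abstract describes.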

arXiv:1901.00663 [pdf, other]

Efficient augmentation and relaxation learning for individualized treatment rules using observational data

Authors: Ying-Qi Zhao, Eric B. Laber, Yang Ning, Sumona Saha, Bruce Sands

Abstract: Individualized treatment rules aim to identify if, when, which, and to whom treatment should be applied. A globally aging population, rising healthcare costs, and increased access to patient-level data have created an urgent need for high-quality estimators of individualized treatment rules that can be applied to observational data. A recent and promising line of research for estimating individualized treatment rules recasts the problem of estimating an optimal treatment rule as a weighted classification problem. We consider a class of estimators for optimal treatment rules that are analogous to convex large-margin classifiers. The proposed class applies to observational data and is doubly-robust in the sense that correct specification of either a propensity or outcome model leads to consistent estimation of the optimal individualized treatment rule. Using techniques from semiparametric efficiency theory, we derive rates of convergence for the proposed estimators and use these rates to characterize the bias-variance trade-off for estimating individualized treatment rules with classification-based methods. Simulation experiments informed by these results demonstrate that it is possible to construct new estimators within the proposed framework that significantly outperform existing ones. We illustrate the proposed methods using data from a labor training program and a study of inflammatory bowel syndrome.

Submitted 3 January, 2019; originally announced January 2019.

arXiv:1812.08696 [pdf, other]

Generalization error for decision problems

Authors: Eric B. Laber, Min Qian

Abstract: In this entry we review the generalization error for classification and single-stage decision problems. We distinguish three alternative definitions of the generalization error which have, at times, been conflated in the statistics literature and show that these definitions need not be equivalent even asymptotically. Because the generalization error is a non-smooth functional of the underlying generative model, standard asymptotic approximations, e.g., the bootstrap or normal approximations, cannot guarantee correct frequentist operating characteristics without modification. We provide simple data-adaptive procedures that can be used to construct asymptotically valid confidence sets for the generalization error. We conclude the entry with a discussion of extensions and related problems.

Submitted 20 December, 2018; originally announced December 2018.

arXiv:1812.01162 [pdf, other]

doi 10.1007/978-3-030-33416-1_7

Hierarchical Continuous Time Hidden Markov Model, with Application in Zero-Inflated Accelerometer Data

Authors: Zekun Xu, Eric B. Laber, Ana-Maria Staicu

Abstract: Wearable devices including accelerometers are increasingly being used to collect high-frequency human activity data in situ. There is tremendous potential to use such data to inform medical decision making and public health policies. However, modeling such data is challenging as they are high-dimensional, heterogeneous, and subject to informative missingness, e.g., zero readings when the device is removed by the participant. We propose a flexible and extensible continuous-time hidden Markov model to extract meaningful activity patterns from human accelerometer data. To facilitate estimation with massive data we derive an efficient learning algorithm that exploits the hierarchical structure of the parameters indexing the proposed model. We also propose a bootstrap procedure for interval estimation. The proposed methods are illustrated using data from the 2003 - 2004 and 2005 - 2006 National Health and Nutrition Examination Survey.

Submitted 25 December, 2018; v1 submitted 3 December, 2018; originally announced December 2018.

Comments: 18 pages, 4 figures

arXiv:1811.04471 [pdf, other]

Thompson Sampling for Pursuit-Evasion Problems

Authors: Zhen Li, Nicholas J. Meyer, Eric B. Laber, Robert Brigantic

Abstract: Pursuit-evasion is a multi-agent sequential decision problem wherein a group of agents known as pursuers coordinate their traversal of a spatial domain to locate an agent trying to evade them. Pursuit-evasion problems arise in a number of important application domains including defense and route planning. Learning to optimally coordinate pursuer behaviors so as to minimize time to capture of the evader is challenging because of a large action space and sparse noisy state information; consequently, previous approaches have relied primarily on heuristics. We propose a variant of Thompson Sampling for pursuit-evasion that allows for the application of existing model-based planning algorithms. This approach is general in that it allows for an arbitrary number of pursuers, a general spatial domain, and the integration of auxiliary information provided by informants. In a suite of simulation experiments, Thompson Sampling for pursuit-evasion significantly reduces time-to-capture relative to competing algorithms.

Submitted 11 November, 2018; originally announced November 2018.

arXiv:1810.04338 [pdf, other]

Bayesian Nonparametric Policy Search with Application to Periodontal Recall Intervals

Authors: Qian Guan, Brian J. Reich, Eric B. Laber, Dipankar Bandyopadhyay

Abstract: Tooth loss from periodontal disease is a major public health burden in the United States. Standard clinical practice is to recommend a dental visit every six months; however, this practice is not evidence-based, and poor dental outcomes and increasing dental insurance premiums indicate room for improvement. We consider a tailored approach that recommends recall time based on patient characteristics and medical history to minimize disease progression without increasing resource expenditures. We formalize this method as a dynamic treatment regime which comprises a sequence of decisions, one per stage of intervention, that follow a decision rule which maps current patient information to a recommendation for their next visit time. The dynamics of periodontal health, visit frequency, and patient compliance are complex, yet the estimated optimal regime must be interpretable to domain experts if it is to be integrated into clinical practice. We combine non-parametric Bayesian dynamics modeling with policy-search algorithms to estimate the optimal dynamic treatment regime within an interpretable class of regimes. Both simulation experiments and application to a rich database of electronic dental records from the HealthPartners HMO show that our proposed method leads to better dental health without increasing the average recommended recall time relative to competing methods.

Submitted 9 October, 2018; originally announced October 2018.

arXiv:1807.06711 [pdf, ps, other]

Receiver Operating Characteristic Curves and Confidence Bands for Support Vector Machines

Authors: Daniel J. Luckett, Eric B. Laber, Samer S. El-Kamary, Cheng Fan, Ravi Jhaveri, Charles M. Perou, Fatma M. Shebl, Michael R. Kosorok

Abstract: Many problems that appear in biomedical decision making, such as diagnosing disease and predicting response to treatment, can be expressed as binary classification problems. The costs of false positives and false negatives vary across application domains and receiver operating characteristic (ROC) curves provide a visual representation of this trade-off. Nonparametric estimators for the ROC curve, such as a weighted support vector machine (SVM), are desirable because they are robust to model misspecification. While weighted SVMs have great potential for estimating ROC curves, their theoretical properties were heretofore underdeveloped. We propose a method for constructing confidence bands for the SVM ROC curve and provide the theoretical justification for the SVM ROC curve by showing that the risk function of the estimated decision rule is uniformly consistent across the weight parameter. We demonstrate the proposed confidence band method and the superior sensitivity and specificity of the weighted SVM compared to commonly used methods in diagnostic medicine using simulation studies. We present two illustrative examples: diagnosis of hepatitis C and a predictive model for treatment response in breast cancer.

Submitted 17 July, 2018; originally announced July 2018.

arXiv:1711.10581 [pdf, ps, other]

Estimation and Optimization of Composite Outcomes

Authors: Daniel J. Luckett, Eric B. Laber, Michael R. Kosorok

Abstract: There is tremendous interest in precision medicine as a means to improve patient outcomes by tailoring treatment to individual characteristics. An individualized treatment rule formalizes precision medicine as a map from patient information to a recommended treatment. A treatment rule is defined to be optimal if it maximizes the mean of a scalar outcome in a population of interest, e.g., symptom reduction. However, clinical and intervention scientists often must balance multiple and possibly competing outcomes, e.g., symptom reduction and the risk of an adverse event. One approach to precision medicine in this setting is to elicit a composite outcome which balances all competing outcomes; unfortunately, eliciting a composite outcome directly from patients is difficult without a high-quality instrument, and an expert-derived composite outcome may not account for heterogeneity in patient preferences. We propose a new paradigm for the study of precision medicine using observational data that relies solely on the assumption that clinicians are approximately (i.e., imperfectly) making decisions to maximize individual patient utility. Estimated composite outcomes are subsequently used to construct an estimator of an individualized treatment rule which maximizes the mean of patient-specific composite outcomes. The estimated composite outcomes and estimated optimal individualized treatment rule provide new insights into patient preference heterogeneity, clinician behavior, and the value of precision medicine in a given domain. We derive inference procedures for the proposed estimators under mild conditions and demonstrate their finite sample performance through a suite of simulation experiments and an illustrative application to data from a study of bipolar depression.

Submitted 26 May, 2020; v1 submitted 28 November, 2017; originally announced November 2017.

arXiv:1704.07531 [pdf, other]

Sufficient Markov Decision Processes with Alternating Deep Neural Networks

Authors: Longshaokan Wang, Eric B. Laber, Katie Witkiewitz

Abstract: Advances in mobile computing technologies have made it possible to monitor and apply data-driven interventions across complex systems in real time. Markov decision processes (MDPs) are the primary model for sequential decision problems with a large or indefinite time horizon. Choosing a representation of the underlying decision process that is both Markov and low-dimensional is non-trivial. We propose a method for constructing a low-dimensional representation of the original decision process for which: 1. the MDP model holds; 2. a decision strategy that maximizes mean utility when applied to the low-dimensional representation also maximizes mean utility when applied to the original process. We use a deep neural network to define a class of potential process representations and estimate the process of lowest dimension within this class. The method is illustrated using data from a mobile study on heavy drinking and smoking among college students.

Submitted 17 March, 2018; v1 submitted 25 April, 2017; originally announced April 2017.

Comments: 31 pages, 3 figures, extended abstract in the proceedings of RLDM2017. (v2 revisions: Fixed a minor bug in the code w.r.t. setting seed, as a result numbers in the simulation experiments had some slight changes, but conclusions stayed the same. Corrected typos. Improved notations.)

arXiv:1611.03531 [pdf, ps, other]

Estimating Dynamic Treatment Regimes in Mobile Health Using V-learning

Authors: Daniel J. Luckett, Eric B. Laber, Anna R. Kahkoska, David M. Maahs, Elizabeth Mayer-Davis, Michael R. Kosorok

Abstract: The vision for precision medicine is to use individual patient characteristics to inform a personalized treatment plan that leads to the best healthcare possible for each patient. Mobile technologies have an important role to play in this vision as they offer a means to monitor a patient's health status in real-time and subsequently to deliver interventions if, when, and in the dose that they are needed. Dynamic treatment regimes formalize individualized treatment plans as sequences of decision rules, one per stage of clinical intervention, that map current patient information to a recommended treatment. However, existing methods for estimating optimal dynamic treatment regimes are designed for a small number of fixed decision points occurring on a coarse time-scale. We propose a new reinforcement learning method for estimating an optimal treatment regime that is applicable to data collected using mobile technologies in an outpatient setting. The proposed method accommodates an indefinite time horizon and minute-by-minute decision making that are common in mobile health applications. We show the proposed estimators are consistent and asymptotically normal under mild conditions. The proposed methods are applied to estimate an optimal dynamic treatment regime for controlling blood glucose levels in patients with type 1 diabetes.

Submitted 14 October, 2017; v1 submitted 10 November, 2016; originally announced November 2016.

arXiv:1607.05047 [pdf, other]

A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

Authors: S. A. Murphy, Y. Deng, E. B. Laber, H. R. Maei, R. S. Sutton, K. Witkiewitz

Abstract: We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health.

Submitted 18 July, 2016; originally announced July 2016.

arXiv:1606.01472 [pdf, other]

Interpretable Dynamic Treatment Regimes

Authors: Yichi Zhang, Eric B. Laber, Anastasios Tsiatis, Marie Davidian

Abstract: Precision medicine is currently a topic of great interest in clinical and intervention science. One way to formalize precision medicine is through a treatment regime, which is a sequence of decision rules, one per stage of clinical intervention, that map up-to-date patient information to a recommended treatment. An optimal treatment regime is defined as maximizing the mean of some cumulative clinical outcome if applied to a population of interest. It is well-known that even under simple generative models an optimal treatment regime can be a highly nonlinear function of patient information. Consequently, a focal point of recent methodological research has been the development of flexible models for estimating optimal treatment regimes. However, in many settings, estimation of an optimal treatment regime is an exploratory analysis intended to generate new hypotheses for subsequent research and not to directly dictate treatment to new patients. In such settings, an estimated treatment regime that is interpretable in a domain context may be of greater value than an unintelligible treatment regime built using "black-box" estimation methods. We propose an estimator of an optimal treatment regime composed of a sequence of decision rules, each expressible as a list of "if-then" statements that can be presented as either a paragraph or as a simple flowchart that is immediately interpretable to domain experts. The discreteness of these lists precludes smooth, i.e., gradient-based, methods of estimation and leads to non-standard asymptotics. Nevertheless, we provide a computationally efficient estimation algorithm, prove consistency of the proposed estimator, and derive rates of convergence. We illustrate the proposed methods using a series of simulation examples and application to data from a sequential clinical trial on bipolar disorder.

Submitted 5 June, 2016; originally announced June 2016.

arXiv:1504.07715 [pdf, other]

Using Decision Lists to Construct Interpretable and Parsimonious Treatment Regimes

Authors: Yichi Zhang, Eric B. Laber, Anastasios Tsiatis, Marie Davidian

Abstract: A treatment regime formalizes personalized medicine as a function from individual patient characteristics to a recommended treatment. A high-quality treatment regime can improve patient outcomes while reducing cost, resource consumption, and treatment burden. Thus, there is tremendous interest in estimating treatment regimes from observational and randomized studies. However, the development of treatment regimes for application in clinical practice requires the long-term, joint effort of statisticians and clinical scientists. In this collaborative process, the statistician must integrate clinical science into the statistical models underlying a treatment regime and the clinician must scrutinize the estimated treatment regime for scientific validity. To facilitate meaningful information exchange, it is important that estimated treatment regimes be interpretable in a subject-matter context. We propose a simple, yet flexible class of treatment regimes whose members are representable as a short list of if-then statements. Regimes in this class are immediately interpretable and are therefore an appealing choice for broad application in practice. We derive a robust estimator of the optimal regime within this class and demonstrate its finite sample performance using simulation experiments. The proposed method is illustrated with data from two clinical trials.

Submitted 28 April, 2015; originally announced April 2015.

arXiv:1407.3414 [pdf, other]

Interactive Q-learning for Probabilities and Quantiles

Authors: Kristin A. Linn, Eric B. Laber, Leonard A. Stefanski

Abstract: A dynamic treatment regime is a sequence of decision rules in which each decision rule recommends treatment based on features of patient medical history such as past treatments and outcomes. Existing methods for estimating optimal dynamic treatment regimes from data optimize the mean of a response variable. However, the mean may not always be the most appropriate summary of performance. We derive estimators of decision rules for optimizing probabilities and quantiles computed with respect to the response distribution for two-stage, binary treatment settings. This enables estimation of dynamic treatment regimes that optimize the cumulative distribution function of the response at a prespecified point or a prespecified quantile of the response distribution such as the median. The proposed methods perform favorably in simulation experiments. We illustrate our approach with data from a sequentially randomized trial where the primary outcome is remission of depression symptoms.

Submitted 21 May, 2015; v1 submitted 12 July, 2014; originally announced July 2014.

arXiv:1207.3100 [pdf, other]

Set-valued dynamic treatment regimes for competing outcomes

Authors: Eric B. Laber, Daniel J. Lizotte, Bradley Ferguson

Abstract: Dynamic treatment regimes operationalize the clinical decision process as a sequence of functions, one for each clinical decision, where each function takes as input up-to-date patient information and gives as output a single recommended treatment. Current methods for estimating optimal dynamic treatment regimes, for example Q-learning, require the specification of a single outcome by which the 'goodness' of competing dynamic treatment regimes is measured. However, this is an over-simplification of the goal of clinical decision making, which aims to balance several potentially competing outcomes. For example, often a balance must be struck between treatment effectiveness and side-effect burden. We propose a method for constructing dynamic treatment regimes that accommodates competing outcomes by recommending sets of treatments at each decision point. Formally, we construct a sequence of set-valued functions that take as input up-to-date patient information and give as output a recommended subset of the possible treatments. For a given patient history, the recommended set of treatments contains all treatments that are not inferior according to any of the competing outcomes. When there is more than one decision point, constructing these set-valued functions requires solving a non-trivial enumeration problem. We offer an exact enumeration algorithm by recasting the problem as a linear mixed integer program. The proposed methods are illustrated using data from a depression study and the CATIE schizophrenia study.

Submitted 7 August, 2012; v1 submitted 12 July, 2012; originally announced July 2012.

arXiv:1206.3274 [pdf]

Small Sample Inference for Generalization Error in Classification Using the CUD Bound

Authors: Eric B. Laber, Susan A. Murphy

Abstract: Confidence measures for the generalization error are crucial when small training samples are used to construct classifiers. A common approach is to estimate the generalization error by resampling and then assume the resampled estimator follows a known distribution to form a confidence set [Kohavi 1995, Martin 1996, Yang 2006]. Alternatively, one might bootstrap the resampled estimator of the generalization error to form a confidence set. Unfortunately, these methods do not reliably provide sets of the desired confidence. The poor performance appears to be due to the lack of smoothness of the generalization error as a function of the learned classifier. This results in a non-normal distribution of the estimated generalization error. We construct a confidence set for the generalization error by use of a smooth upper bound on the deviation between the resampled estimate and generalization error. The confidence set is formed by bootstrapping this upper bound. In cases in which the approximation class for the classifier can be represented as a parametric additive model, we provide a computationally efficient algorithm. This method exhibits superior performance across a series of test and simulated data sets.

Submitted 13 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008)

Report number: UAI-P-2008-PG-357-365

arXiv:1203.2879 [pdf, other]

An imputation method for estimating the learning curve in classification problems

Authors: Eric B. Laber, Kerby Shedden, Yang Yang

Abstract: The learning curve expresses the error rate of a predictive modeling procedure as a function of the sample size of the training dataset. It typically is a decreasing, convex function with a positive limiting value. An estimate of the learning curve can be used to assess whether a modeling procedure should be expected to become substantially more accurate if additional training data become available. This article proposes a new procedure for estimating learning curves using imputation. We focus on classification, although the idea is applicable to other predictive modeling settings. Simulation studies indicate that the learning curve can be estimated with useful accuracy for a roughly four-fold increase in the size of the training set relative to the available data, and that the proposed imputation approach outperforms an alternative estimation approach based on parameterizing the learning curve. We illustrate the method with an application that predicts the risk of disease progression for people with chronic lymphocytic leukemia.

Submitted 13 March, 2012; originally announced March 2012.

arXiv:1202.4177 [pdf, ps, other]

doi 10.1214/13-STS450

$Q$- and $A$-Learning Methods for Estimating Optimal Dynamic Treatment Regimes

Authors: Phillip J. Schulte, Anastasios A. Tsiatis, Eric B. Laber, Marie Davidian

Abstract: In clinical practice, physicians make a series of treatment decisions over the course of a patient's disease based on his/her baseline and evolving characteristics. A dynamic treatment regime is a set of sequential decision rules that operationalizes this process. Each rule corresponds to a decision point and dictates the next treatment action based on the accrued information. Using existing data, a key goal is estimating the optimal regime, that, if followed by the patient population, would yield the most favorable outcome on average. Q- and A-learning are two main approaches for this purpose. We provide a detailed account of these methods, study their performance, and illustrate them using data from a depression study.

Submitted 3 February, 2015; v1 submitted 19 February, 2012; originally announced February 2012.

Comments: Published at http://dx.doi.org/10.1214/13-STS450 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS450

Journal ref: Statistical Science 2014, Vol. 29, No. 4, 640-661

arXiv:1006.5831 [pdf, other]

Statistical Inference in Dynamic Treatment Regimes

Authors: Eric B. Laber, Min Qian, Dan J. Lizotte, William E. Pelham, Susan A. Murphy

Abstract: Dynamic treatment regimes are of growing interest across the clinical sciences as these regimes provide one way to operationalize and thus inform sequential personalized clinical decision making. A dynamic treatment regime is a sequence of decision rules, with a decision rule per stage of clinical intervention; each decision rule maps up-to-date patient information to a recommended treatment. We briefly review a variety of approaches for using data to construct the decision rules. We then review an interesting challenge, that of nonregularity that often arises in this area. By nonregularity, we mean the parameters indexing the optimal dynamic treatment regime are nonsmooth functionals of the underlying generative distribution. A consequence is that no regular or asymptotically unbiased estimator of these parameters exists. Nonregularity arises in inference for parameters in the optimal dynamic treatment regime; we illustrate the effect of nonregularity on asymptotic bias and via sensitivity of asymptotic, limiting, distributions to local perturbations. We propose and evaluate a locally consistent Adaptive Confidence Interval (ACI) for the parameters of the optimal dynamic treatment regime. We use data from the Adaptive Interventions for Children with ADHD study as an illustrative example. We conclude by highlighting and discussing emerging theoretical problems in this area.

Submitted 26 November, 2013; v1 submitted 30 June, 2010; originally announced June 2010.

MSC Class: 47N30

Showing 1–24 of 24 results for author: Laber, E B