Search | arXiv e-print repository

Off-policy Evaluation in Doubly Inhomogeneous Environments

Authors: Zeyu Bian, Chengchun Shi, Zhengling Qi, Lan Wang

Abstract: This work aims to study off-policy evaluation (OPE) under scenarios where two key reinforcement learning (RL) assumptions -- temporal stationarity and individual homogeneity -- are both violated. To handle the "double inhomogeneities", we propose a class of latent factor models for the reward and observation transition functions, under which we develop a general OPE framework that consists of both model-based and model-free approaches. To our knowledge, this is the first paper that develops statistically sound OPE methods in offline RL with double inhomogeneities. It contributes to a deeper understanding of OPE in environments where standard RL assumptions are not met, and provides several practical approaches in these settings. We establish the theoretical properties of the proposed value estimators and empirically show that our approach outperforms competing methods that ignore either temporal nonstationarity or individual heterogeneity. Finally, we illustrate our method on a data set from the Medical Information Mart for Intensive Care.

Submitted 7 September, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

arXiv:2208.03291 [pdf]

Comparing Unit Trains versus Manifest Trains for the Risk of Rail Transport of Hazardous Materials -- Part II: Application and Case Study

Authors: Di Kang, Jiaxi Zhao, C. Tyler Dick, Xiang Liu, Zheyong Bian, Steven W. Kirkpatrick, Chen-Yu Lin

Abstract: Built upon the risk analysis methodology (presented in the part I paper), this part II paper focuses on applying this methodology. Five illustrative scenarios were used to analyze the best or worst cases and compare the transportation risk differences between service options using unit trains and manifest trains. The comparison results indicate that placing all tank cars at the positions with the lowest probability of derailing and switching tank cars alone in classification yards could provide the lowest risk estimate given the same transportation demand (i.e., number of tank cars to transport). This paper also shows that, based on the data and parameters in the case study, risks during arrival/departure events and yard switching events could be as significant as risks on mainlines. This paper provides a way to use the risk analysis methodology for rail safety decisions. The methodology and its application can be tailored to specific infrastructure and rolling stock characteristics.

Submitted 4 July, 2022; originally announced August 2022.

arXiv:2207.02113 [pdf]

Comparing Unit Trains versus Manifest Trains for the Risk of Rail Transport of Hazardous Materials -- Part I: Risk Analysis Methodology

Authors: Di Kang, Jiaxi Zhao, C. Tyler Dick, Xiang Liu, Zheyong Bian, Steven W. Kirkpatrick, Chen-Yu Lin

Abstract: Transporting hazardous materials (hazmats) using tank cars has more significant economic benefits than other transportation modes. Although railway transportation is roughly four times more fuel-efficient than roadway transportation, a train derailment has greater potential to cause more disastrous consequences than a truck incident. Train types, such as unit train or manifest train (also called mixed train), can influence transport risks in several ways. For example, unit trains only experience risks on mainlines and when arriving at or departing from terminals, while manifest trains experience additional switching risks in yards. Based on prior studies and various data sources covering the years 1996-2018, this paper constructs event chains for line-haul risks on mainlines (for both unit trains and manifest trains), arrival/departure risks in terminals (for unit trains) and yards (for manifest trains), and yard switching risks for manifest trains using various probabilistic models, and finally determines expected casualties as the consequences of a potential train derailment and release incident. This is the first analysis to quantify the total risks a train may encounter throughout the shipment process, either on mainlines or in yards/terminals, distinguishing train types. It provides a methodology applicable to any train to calculate the expected risks (quantified as expected casualties in this paper) from an origin to a destination.

Submitted 4 July, 2022; originally announced July 2022.

arXiv:2205.13609 [pdf, ps, other]

Variable Selection for Individualized Treatment Rules with Discrete Outcomes

Authors: Zeyu Bian, Erica EM Moodie, Susan M Shortreed, Sylvie D Lambert, Sahir Bhatnagar

Abstract: An individualized treatment rule (ITR) is a decision rule that aims to improve individual patients' health outcomes by recommending optimal treatments according to patients' specific information. In observational studies, collected data may contain many variables that are irrelevant for making treatment decisions. Including all available variables in the statistical model for the ITR could yield a loss of efficiency and an unnecessarily complicated treatment rule, which is difficult for physicians to interpret or implement. Thus, a data-driven approach to select important tailoring variables with the aim of improving the estimated decision rules is crucial. While there is a growing body of literature on selecting variables in ITRs with continuous outcomes, relatively few methods exist for discrete outcomes, which pose additional computational challenges even in the absence of variable selection. In this paper, we propose a variable selection method for ITRs with discrete outcomes. We show theoretically and empirically that our approach has the double robustness property, and that it compares favorably with other competing approaches. We illustrate the proposed method on data from a study of an adaptive web-based stress management tool to identify which variables are relevant for tailoring treatment.

Submitted 29 September, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

arXiv:2101.07359 [pdf, other]

Variable Selection in Regression-based Estimation of Dynamic Treatment Regimes

Authors: Zeyu Bian, Erica EM Moodie, Susan M Shortreed, Sahir Bhatnagar

Abstract: Dynamic treatment regimes (DTRs) consist of a sequence of decision rules, one per stage of intervention, that find effective treatments for individual patients according to patient information history. DTRs can be estimated from models which include the interaction between treatment and a small number of covariates which are often chosen a priori. However, with increasingly large and complex data being collected, it is difficult to know which prognostic factors might be relevant in the treatment rule. Therefore, a more data-driven approach of selecting these covariates might improve the estimated decision rules and simplify models to make them easier to interpret. We propose a variable selection method for DTR estimation using penalized dynamic weighted least squares. Our method has the strong heredity property, that is, an interaction term can be included in the model only if the corresponding main terms have also been selected. Through simulations, we show our method has both the double robustness property and the oracle property, and the newly proposed methods compare favorably with other variable selection approaches.

Submitted 3 December, 2021; v1 submitted 18 January, 2021; originally announced January 2021.

arXiv:2010.05749 [pdf, ps, other]

Detecting the skewness of data from the five-number summary and its application in meta-analysis

Authors: Jiandong Shi, Dehui Luo, Xiang Wan, Yue Liu, Jiming Liu, Zhaoxiang Bian, Tiejun Tong

Abstract: For clinical studies with continuous outcomes, when the data are potentially skewed, researchers may choose to report the whole or part of the five-number summary (the sample median, the first and third quartiles, and the minimum and maximum values) rather than the sample mean and standard deviation. In the recent literature, it is often suggested to transform the five-number summary back to the sample mean and standard deviation, which can be subsequently used in a meta-analysis. However, if a study contains skewed data, this transformation and hence the conclusions from the meta-analysis are unreliable. Therefore, we introduce a novel method for detecting the skewness of data using only the five-number summary and the sample size, and meanwhile propose a new flow chart to handle the skewed studies in a different manner. We further show by simulations that our skewness tests are able to control the type I error rates and provide good statistical power, followed by a simulated meta-analysis and a real data example that illustrate the usefulness of our new method in meta-analysis and evidence-based medicine.

Submitted 5 May, 2023; v1 submitted 12 October, 2020; originally announced October 2020.

Comments: 40 pages, 10 figures, 7 tables

arXiv:1802.04920 [pdf, other]

DVAE++: Discrete Variational Autoencoders with Overlapping Transformations

Authors: Arash Vahdat, William G. Macready, Zhengbing Bian, Amir Khoshaman, Evgeny Andriyash

Abstract: Training of discrete latent variable models remains challenging because passing gradient information through discrete units is difficult. We propose a new class of smoothing transformations based on a mixture of two overlapping distributions, and show that the proposed transformation can be used for training binary latent models with either directed or undirected priors. We derive a new variational bound to efficiently train with Boltzmann machine priors. Using this bound, we develop DVAE++, a generative model with a global discrete prior and a hierarchy of convolutional continuous variables. Experiments on several benchmarks show that overlapping transformations outperform other recent continuous relaxations of discrete latent variables including Gumbel-Softmax (Maddison et al., 2016; Jang et al., 2016), and discrete variational autoencoders (Rolfe 2016).

Submitted 25 May, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

Comments: Published as a conference paper at International Conference on Machine Learning (ICML), 2018

arXiv:1611.04528 [pdf, other]

Benchmarking Quantum Hardware for Training of Fully Visible Boltzmann Machines

Authors: Dmytro Korenkevych, Yanbo Xue, Zhengbing Bian, Fabian Chudak, William G. Macready, Jason Rolfe, Evgeny Andriyash

Abstract: Quantum annealing (QA) is a hardware-based heuristic optimization and sampling method applicable to discrete undirected graphical models. While similar to simulated annealing, QA relies on quantum, rather than thermal, effects to explore complex search spaces. For many classes of problems, QA is known to offer computational advantages over simulated annealing. Here we report on the ability of recent QA hardware to accelerate training of fully visible Boltzmann machines. We characterize the sampling distribution of QA hardware, and show that in many cases, the quantum distributions differ significantly from classical Boltzmann distributions. In spite of this difference, training (which seeks to match data and model statistics) using standard classical gradient updates is still effective. We investigate the use of QA for seeding Markov chains as an alternative to contrastive divergence (CD) and persistent contrastive divergence (PCD). Using $k=50$ Gibbs steps, we show that for problems with high-energy barriers between modes, QA-based seeds can improve upon chains with CD and PCD initializations. For these hard problems, QA gradient estimates are more accurate, and allow for faster learning. Furthermore, and interestingly, even the case of raw QA samples (that is, $k=0$) achieved similar improvements. We argue that this relates to the fact that we are training a quantum rather than classical Boltzmann distribution in this case. The learned parameters give rise to hardware QA distributions closely approximating classical Boltzmann distributions that are hard to train with CD/PCD.

Submitted 14 November, 2016; originally announced November 2016.

Comments: 22 pages, 13 figures, D-Wave quantum system for sampling Boltzmann machines

Showing 1–8 of 8 results for author: Bian, Z