Showing 1–50 of 83 results for author: Shi, C

Searching in archive stat.
  1. arXiv:2406.19531  [pdf, other]

    stat.ML cs.LG

    Forward and Backward State Abstractions for Off-policy Evaluation

    Authors: Meiling Hao, Pingfan Su, Liyuan Hu, Zoltan Szabo, Qingyuan Zhao, Chengchun Shi

    Abstract: Off-policy evaluation (OPE) is crucial for evaluating a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions, originally designed for policy learning, in the context of OPE. Our contributions are three-fold: (i) We define a set of irrelevance conditions central to learning state abstracti…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 42 pages, 5 figures

    ACM Class: G.3; I.2.6; G.1.2

  2. arXiv:2406.00317  [pdf, other]

    stat.ML cs.LG stat.ME

    Combining Experimental and Historical Data for Policy Evaluation

    Authors: Ting Li, Chengchun Shi, Qianglin Wen, Yang Sui, Yongli Qin, Chunbo Lai, Hongtu Zhu

    Abstract: This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data integration methods that linearly integrate base policy value estimators constructed based on the experimental and historical data, with weights optimized to min…

    Submitted 1 June, 2024; originally announced June 2024.

  3. arXiv:2403.17285  [pdf, other]

    stat.ML cs.LG

    An Analysis of Switchback Designs in Reinforcement Learning

    Authors: Qianglin Wen, Chengchun Shi, Ying Yang, Niansheng Tang, Hongtu Zhu

    Abstract: This paper offers a detailed investigation of switchback designs in A/B testing, which alternate between baseline and new policies over time. Our aim is to thoroughly evaluate the effects of these designs on the accuracy of their resulting average treatment effect (ATE) estimators. We propose a novel "weak signal analysis" framework, which substantially simplifies the calculations of the mean squa…

    Submitted 25 March, 2024; originally announced March 2024.

  4. arXiv:2403.11841  [pdf, other]

    stat.ML cs.AI cs.LG

    Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data

    Authors: Danyang Wang, Chengchun Shi, Shikai Luo, Will Wei Sun

    Abstract: In real-world scenarios, datasets collected from randomized experiments are often constrained by size, due to limitations in time and budget. As a result, leveraging large observational datasets becomes a more attractive option for achieving high-quality policy learning. However, most existing offline reinforcement learning (RL) methods depend on two key assumptions--unconfoundedness and positivit…

    Submitted 18 March, 2024; originally announced March 2024.

  5. arXiv:2402.09723  [pdf, other]

    stat.ML cs.AI cs.CL cs.LG

    Efficient Prompt Optimization Through the Lens of Best Arm Identification

    Authors: Chengshuai Shi, Kun Yang, Zihan Chen, Jundong Li, Jing Yang, Cong Shen

    Abstract: The remarkable instruction-following capability of large language models (LLMs) has sparked a growing interest in automatically finding good prompts, i.e., prompt optimization. Most existing works follow the scheme of selecting from a pre-generated pool of candidate prompts. However, these designs mainly focus on the generation strategy, while limited attention has been paid to the selection metho…

    Submitted 30 May, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  6. arXiv:2402.08105  [pdf, other]

    cs.LG stat.ML

    Learning Cartesian Product Graphs with Laplacian Constraints

    Authors: Changhao Shi, Gal Mishne

    Abstract: Graph Laplacian learning, also known as network topology inference, is a problem of great interest to multiple communities. In Gaussian graphical models (GM), graph learning amounts to endowing covariance selection with the Laplacian structure. In graph signal processing (GSP), it is essential to infer the unobserved graph from the outputs of a filtering system. In this paper, we study the problem…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted to AISTATS 2024

  7. arXiv:2401.14989  [pdf]

    cs.LG stat.ML

    Mapping-to-Parameter Nonlinear Functional Regression with Novel B-spline Free Knot Placement Algorithm

    Authors: Chengdong Shi, Ching-Hsun Tseng, Wei Zhao, Xiao-Jun Zeng

    Abstract: We propose a novel approach to nonlinear functional regression, called the Mapping-to-Parameter function model, which addresses complex and nonlinear functional regression problems in parameter space by employing any supervised learning technique. Central to this model is the mapping of function data from an infinite-dimensional function space to a finite-dimensional parameter space. This is accom…

    Submitted 26 January, 2024; originally announced January 2024.

  8. arXiv:2401.05517  [pdf, other]

    stat.ME econ.EM math.ST

    On Efficient Inference of Causal Effects with Multiple Mediators

    Authors: Haoyu Wei, Hengrui Cai, Chengchun Shi, Rui Song

    Abstract: This paper provides robust estimators and efficient inference of causal effects involving multiple interacting mediators. Most existing works either impose a linear model assumption among the mediators or are restricted to handle conditionally independent mediators given the exposure. To overcome these limitations, we define causal and individual mediation effects in a general setting, and employ…

    Submitted 10 January, 2024; originally announced January 2024.

    MSC Class: 62A09; 62G05; 62G35

  9. arXiv:2312.16341  [pdf, other]

    stat.ML cs.IT cs.LG cs.MA

    Harnessing the Power of Federated Learning in Federated Contextual Bandits

    Authors: Chengshuai Shi, Ruida Zhou, Kun Yang, Cong Shen

    Abstract: Federated learning (FL) has demonstrated great potential in revolutionizing distributed machine learning, and tremendous efforts have been made to extend it beyond the original focus on supervised learning. Among many directions, federated contextual bandits (FCB), a pivotal integration of FL and sequential decision-making, has garnered significant attention in recent years. Despite substantial pr…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: A preliminary version appeared in the Multi-Agent Security Workshop at NeurIPS 2023

  10. arXiv:2311.02532  [pdf, other]

    stat.ME

    Optimal Treatment Allocation for Efficient Policy Evaluation in Sequential Decision Making

    Authors: Ting Li, Chengchun Shi, Jianing Wang, Fan Zhou, Hongtu Zhu

    Abstract: A/B testing is critical for modern technological companies to evaluate the effectiveness of newly developed products against standard baselines. This paper studies optimal designs that aim to maximize the amount of information obtained from online experiments to estimate treatment effects accurately. We propose three optimal allocation strategies in a dynamic setting where treatments are sequentia…

    Submitted 4 November, 2023; originally announced November 2023.

  11. arXiv:2310.18715  [pdf, other]

    cs.LG cs.AI stat.ML

    Robust Offline Reinforcement learning with Heavy-Tailed Rewards

    Authors: Jin Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo, Chengchun Shi

    Abstract: This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-m…

    Submitted 30 March, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: 23 pages, 6 figures. Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024

  12. arXiv:2310.16203  [pdf, other]

    stat.ME

    Multivariate Dynamic Mediation Analysis under a Reinforcement Learning Framework

    Authors: Lan Luo, Chengchun Shi, Jitao Wang, Zhenke Wu, Lexin Li

    Abstract: Mediation analysis is an important analytic tool commonly used in a broad range of scientific applications. In this article, we study the problem of mediation analysis when there are multivariate and conditionally dependent mediators, and when the variables are observed over multiple time points. The problem is challenging, because the effect of a mediator involves not only the path from the treat…

    Submitted 24 October, 2023; originally announced October 2023.

  13. arXiv:2306.08719  [pdf, other]

    stat.ME cs.LG

    Off-policy Evaluation in Doubly Inhomogeneous Environments

    Authors: Zeyu Bian, Chengchun Shi, Zhengling Qi, Lan Wang

    Abstract: This work aims to study off-policy evaluation (OPE) under scenarios where two key reinforcement learning (RL) assumptions, temporal stationarity and individual homogeneity, are both violated. To handle the "double inhomogeneities", we propose a class of latent factor models for the reward and observation transition functions, under which we develop a general OPE framework that consists of both m…

    Submitted 7 September, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

  14. arXiv:2306.08364  [pdf, other]

    stat.ML cs.IT cs.LG

    Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources

    Authors: Chengshuai Shi, Wei Xiong, Cong Shen, Jing Yang

    Abstract: Existing theoretical studies on offline reinforcement learning (RL) mostly consider a dataset sampled directly from the target task. In practice, however, data often come from several heterogeneous but related sources. Motivated by this gap, this work aims at rigorously understanding offline RL with multiple datasets that are collected from randomly perturbed versions of the target task instead of…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  15. arXiv:2306.06041  [pdf, other]

    cs.LG stat.ML

    A Graph Dynamics Prior for Relational Inference

    Authors: Liming Pan, Cheng Shi, Ivan Dokmanić

    Abstract: Relational inference aims to identify interactions between parts of a dynamical system from the observed dynamics. Current state-of-the-art methods fit the dynamics with a graph neural network (GNN) on a learnable graph. They use one-step message-passing GNNs -- intuitively the right choice since non-locality of multi-step or spectral GNNs may confuse direct and indirect interactions. But the \tex…

    Submitted 20 December, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  16. arXiv:2305.19244  [pdf, other]

    stat.ML cs.LG

    Testing for the Markov Property in Time Series via Deep Conditional Generative Learning

    Authors: Yunzhe Zhou, Chengchun Shi, Lexin Li, Qiwei Yao

    Abstract: The Markov property is widely imposed in analysis of time series data. Correspondingly, testing the Markov property, and relatedly, inferring the order of a Markov model, are of paramount importance. In this article, we propose a nonparametric test for the Markov property in high-dimensional time series via deep conditional generative learning. We also apply the test sequentially to determine the…

    Submitted 30 May, 2023; originally announced May 2023.

  17. arXiv:2305.13856  [pdf, ps, other]

    cs.LG math.OC stat.ML

    On the Optimal Batch Size for Byzantine-Robust Distributed Learning

    Authors: Yi-Rui Yang, Chang-Wei Shi, Wu-Jun Li

    Abstract: Byzantine-robust distributed learning (BRDL), in which computing devices are likely to behave abnormally due to accidental failures or malicious attacks, has recently become a hot research topic. However, even in the independent and identically distributed (i.i.d.) case, existing BRDL methods will suffer from a significant drop on model accuracy due to the large variance of stochastic gradients. I…

    Submitted 23 May, 2023; originally announced May 2023.

  18. arXiv:2305.10187  [pdf, other]

    stat.ME cs.LG stat.ML

    Evaluating Dynamic Conditional Quantile Treatment Effects with Applications in Ridesharing

    Authors: Ting Li, Chengchun Shi, Zhaohua Lu, Yi Li, Hongtu Zhu

    Abstract: Many modern tech companies, such as Google, Uber, and Didi, utilize online experiments (also known as A/B testing) to evaluate new policies against existing ones. While most studies concentrate on average treatment effects, situations with skewed and heavy-tailed outcome distributions may benefit from alternative criteria, such as quantiles. However, assessing dynamic quantile treatment effects (Q…

    Submitted 17 May, 2023; originally announced May 2023.

  19. arXiv:2305.03884  [pdf, other]

    stat.ML cs.IT cs.LG eess.SP

    On High-dimensional and Low-rank Tensor Bandits

    Authors: Chengshuai Shi, Cong Shen, Nicholas D. Sidiropoulos

    Abstract: Most existing studies on linear bandits focus on the one-dimensional characterization of the overall system. While being representative, this formulation may fail to model applications with high-dimensional but favorable structures, such as the low-rank tensor representation for recommender systems. To address this limitation, this work studies a general tensor bandits model, where actions and sys…

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted to the 2023 IEEE International Symposium on Information Theory (ISIT 2023)

  20. arXiv:2305.02441  [pdf, other]

    stat.ML cs.IT cs.LG eess.SP

    Reward Teaching for Federated Multi-armed Bandits

    Authors: Chengshuai Shi, Wei Xiong, Cong Shen, Jing Yang

    Abstract: Most of the existing federated multi-armed bandits (FMAB) designs are based on the presumption that clients will implement the specified design to collaborate with the server. In reality, however, it may not be possible to modify the clients' existing protocols. To address this challenge, this work focuses on clients who always maximize their individual cumulative rewards, and introduces a novel i…

    Submitted 20 November, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted to IEEE Transactions on Signal Processing

  21. arXiv:2303.14281  [pdf, other]

    stat.ML cs.LG

    Sequential Knockoffs for Variable Selection in Reinforcement Learning

    Authors: Tao Ma, Hengrui Cai, Zhengling Qi, Chengchun Shi, Eric B. Laber

    Abstract: In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge. Consequently, it is common practice to construct a state which is larger than necessary, e.g., by concatenating measurements over contiguous time points. However, needlessly increasing the dimension of the sta…

    Submitted 24 March, 2023; originally announced March 2023.

  22. arXiv:2302.12777  [pdf, other]

    stat.ME econ.EM

    On the Misspecification of Linear Assumptions in Synthetic Control

    Authors: Achille Nazaret, Claudia Shi, David M. Blei

    Abstract: The synthetic control (SC) method is a popular approach for estimating treatment effects from observational panel data. It rests on a crucial assumption that we can write the treated unit as a linear combination of the untreated units. This linearity assumption, however, can be unlikely to hold in practice and, when violated, the resulting SC estimates are incorrect. In this paper we examine two q…

    Submitted 24 February, 2023; originally announced February 2023.

  23. arXiv:2302.12670  [pdf, ps, other]

    stat.ME cs.LG econ.EM stat.ML

    Personalized Pricing with Invalid Instrumental Variables: Identification, Estimation, and Policy Learning

    Authors: Rui Miao, Zhengling Qi, Cong Shi, Lin Lin

    Abstract: Pricing based on individual customer characteristics is widely used to maximize sellers' revenues. This work studies offline personalized pricing under endogeneity using an instrumental variable approach. Standard instrumental variable methods in causal inference/econometrics either focus on a discrete treatment space or require the exclusion restriction of instruments from having a direct effect…

    Submitted 24 February, 2023; originally announced February 2023.

  24. arXiv:2302.03821  [pdf, other]

    cs.LG math.OC stat.ME stat.ML

    PASTA: Pessimistic Assortment Optimization

    Authors: Juncheng Dong, Weibin Mo, Zhengling Qi, Cong Shi, Ethan X. Fang, Vahid Tarokh

    Abstract: We consider a class of assortment optimization problems in an offline data-driven setting. A firm does not know the underlying customer choice model but has access to an offline dataset consisting of the historically offered assortment set, customer choice, and revenue. The objective is to use the offline dataset to find an optimal assortment. Due to the combinatorial nature of assortment optimiza…

    Submitted 7 February, 2023; originally announced February 2023.

  25. arXiv:2301.13348  [pdf, other]

    stat.ML cs.LG stat.ME

    A Reinforcement Learning Framework for Dynamic Mediation Analysis

    Authors: Lin Ge, Jitao Wang, Chengchun Shi, Zhenke Wu, Rui Song

    Abstract: Mediation analysis learns the causal effect transmitted via mediator variables between treatments and outcomes and receives increasing attention in various scientific domains to elucidate causal relations. Most existing works focus on point-exposure studies where each subject only receives one treatment at a single time point. However, there are a number of applications (e.g., mobile health) where…

    Submitted 2 September, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

  26. arXiv:2301.02220  [pdf, ps, other]

    stat.ML cs.LG

    Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization

    Authors: Chengchun Shi, Zhengling Qi, Jianing Wang, Fan Zhou

    Abstract: Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy that maximizes the cumulative rewards in sequential decision making. Most methods in the existing literature are developed in online settings where the data are easy to collect or simulate. Motivated by high stake domains such as mobile health studies with l…

    Submitted 5 January, 2023; originally announced January 2023.

  27. arXiv:2301.00927  [pdf, other]

    stat.ML cs.LG stat.AP stat.ME

    Deep Spectral Q-learning with Application to Mobile Health

    Authors: Yuhe Gao, Chengchun Shi, Rui Song

    Abstract: Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with…

    Submitted 2 January, 2023; originally announced January 2023.

  28. arXiv:2212.14468  [pdf, other]

    stat.ML cs.LG stat.ME

    An Instrumental Variable Approach to Confounded Off-Policy Evaluation

    Authors: Yang Xu, Jin Zhu, Chengchun Shi, Shikai Luo, Rui Song

    Abstract: Off-policy evaluation (OPE) is a method for estimating the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In some cases, there may be unmeasured variables that can confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable…

    Submitted 2 February, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

  29. arXiv:2212.14466  [pdf, other]

    stat.ML cs.LG

    Quantile Off-Policy Evaluation via Deep Conditional Generative Learning

    Authors: Yang Xu, Chengchun Shi, Shikai Luo, Lan Wang, Rui Song

    Abstract: Off-Policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy. It is critical in a number of sequential decision making problems ranging from healthcare to technology industries. Most of the work in existing literature is focused on evaluating the mean outcome of a given policy, and ignores the variability of th…

    Submitted 29 December, 2022; originally announced December 2022.

  30. arXiv:2212.13069  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Homophily modulates double descent generalization in graph convolution networks

    Authors: Cheng Shi, Liming Pan, Hong Hu, Ivan Dokmanić

    Abstract: Graph neural networks (GNNs) excel in modeling relational data such as biological, social, and transportation networks, but the underpinnings of their success are not well understood. Traditional complexity measures from statistical learning theory fail to account for observed phenomena like the double descent or the impact of relational semantics on generalization error. Motivated by experimental…

    Submitted 23 January, 2024; v1 submitted 26 December, 2022; originally announced December 2022.

  31. arXiv:2212.06355  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.ME

    A Review of Off-Policy Evaluation in Reinforcement Learning

    Authors: Masatoshi Uehara, Chengchun Shi, Nathan Kallus

    Abstract: Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has been recently applied to solve a number of challenging problems. In this paper, we primarily focus on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We provide a…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: Still under revision

  32. arXiv:2212.00885  [pdf, other]

    stat.AP

    Identifying most typical and most ideal attribute levels in small populations of expert decision makers: Studying the Go/No Go decision of disaster relief organizations

    Authors: Paul Isihara, Chaojun Shi, Jonathan Ward, Leo O'Malley, Skyler Laney, Danilo Diedrichs, Gabriel Flores

    Abstract: This paper proposes the use of Most Typical (MT) and Most Ideal (MI) levels when an adaptive choice-based conjoint (ACBC) survey can only obtain a small sample size n from a small population size N.

    Submitted 1 December, 2022; originally announced December 2022.

    Comments: 14 pages, 4 figures

    Journal ref: Journal of Choice Modelling Volume 35, June 2020, 100204

  33. arXiv:2211.03983  [pdf, other]

    stat.ML cs.AI cs.LG

    Doubly Inhomogeneous Reinforcement Learning

    Authors: Liyuan Hu, Mengbing Li, Chengchun Shi, Zhenke Wu, Piotr Fryzlewicz

    Abstract: This paper studies reinforcement learning (RL) in doubly inhomogeneous environments under temporal non-stationarity and subject heterogeneity. In a number of applications, it is commonplace to encounter datasets generated by system dynamics that may change over time and population, challenging high-quality sequential decision making. Nonetheless, most existing RL solutions require either temporal…

    Submitted 12 November, 2022; v1 submitted 7 November, 2022; originally announced November 2022.

  34. arXiv:2210.14420  [pdf, other]

    stat.ML cs.LG

    Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach

    Authors: Yunzhe Zhou, Zhengling Qi, Chengchun Shi, Lexin Li

    Abstract: In this article, we propose a novel pessimism-based Bayesian learning method for optimal dynamic treatment regimes in the offline setting. When the coverage condition does not hold, which is common for offline data, the existing solutions would produce sub-optimal policies. The pessimism principle addresses this issue by discouraging recommendation of actions that are less explored conditioning on…

    Submitted 21 February, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: 18 pages, 6 figures. Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023

  35. arXiv:2209.15448  [pdf, other]

    cs.LG math.ST stat.ME

    Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments

    Authors: Jiayi Wang, Zhengling Qi, Chengchun Shi

    Abstract: As AI becomes more prevalent throughout society, effective methods of integrating humans and AI systems that leverage their respective strengths and mitigate risk have become an important priority. In this paper, we introduce the paradigm of super reinforcement learning that takes advantage of Human-AI interaction for data driven sequential decision making. This approach utilizes the observed acti…

    Submitted 20 October, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

  36. arXiv:2209.11363  [pdf, other]

    stat.ME stat.AP stat.CO

    Sure Screening for Transelliptical Graphical Models

    Authors: Yuxiang Xie, Chengchun Shi, Rui Song

    Abstract: We propose a sure screening approach for recovering the structure of a transelliptical graphical model in the high dimensional setting. We estimate the partial correlation graph by thresholding the elements of an estimator of the sample correlation matrix obtained using Kendall's tau statistic. Under a simple assumption on the relationship between the correlation and partial correlation graphs, we…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: The paper won the David Byar travel award in the Joint Statistical Meetings (JSM) 2016

  37. arXiv:2207.13081  [pdf, other]

    cs.LG stat.ML

    Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

    Authors: Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun

    Abstract: We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs.…

    Submitted 14 November, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

    Comments: This paper was accepted in NeurIPS 2023

  38. arXiv:2206.06711  [pdf, other]

    stat.ML cs.LG

    Conformal Off-policy Prediction

    Authors: Yingying Zhang, Chengchun Shi, Shikai Luo

    Abstract: Off-policy evaluation is critical in a number of applications where new policies need to be evaluated offline before online deployment. Most existing methods focus on the expected return, define the target parameter through averaging and provide a point estimator only. In this paper, we develop a novel procedure to produce reliable interval estimators for a target policy's return starting from any…

    Submitted 9 February, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: This paper is accepted at the 26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023)

  39. arXiv:2205.15512  [pdf, ps, other]

    cs.LG cs.GT stat.ML

    Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game

    Authors: Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, Liwei Wang, Tong Zhang

    Abstract: Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-collected dataset without further interactions with the environment. While various algorithms have been proposed for offline RL in the previous literature, the minimax optimality has only been (nearly) established for tabular Markov decision processes (MDPs). In this paper, we focus on offline RL with linear funct…

    Submitted 1 March, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

  40. arXiv:2203.01707  [pdf, other]

    stat.ML cs.LG

    Testing Stationarity and Change Point Detection in Reinforcement Learning

    Authors: Mengbing Li, Chengchun Shi, Zhenke Wu, Piotr Fryzlewicz

    Abstract: We consider offline reinforcement learning (RL) methods in possibly nonstationary environments. Many existing RL algorithms in the literature rely on the stationarity assumption that requires the system transition and the reward function to be constant over time. However, the stationarity assumption is restrictive in practice and is likely to be violated in a number of applications, including traf…

    Submitted 7 March, 2024; v1 submitted 3 March, 2022; originally announced March 2022.

  41. arXiv:2202.13163  [pdf, other]

    stat.ML cs.LG

    Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons

    Authors: Chengchun Shi, Shikai Luo, Yuan Le, Hongtu Zhu, Rui Song

    Abstract: We consider reinforcement learning (RL) methods in offline domains without additional online data collection, such as mobile health applications. Most existing policy optimization algorithms in the computer science literature are developed in online settings where data are easy to collect or simulate. Their generalizations to mobile health applications with a pre-collected offline dataset remai…

    Submitted 26 July, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

  42. arXiv:2202.10887  [pdf, other]

    stat.ME cs.LG stat.ML

    Policy Evaluation for Temporal and/or Spatial Dependent Experiments

    Authors: Shikai Luo, Ying Yang, Chengchun Shi, Fang Yao, Jieping Ye, Hongtu Zhu

    Abstract: The aim of this paper is to establish a causal link between the policies implemented by technology companies and the outcomes they yield within intricate temporal and/or spatial dependent experiments. We propose a novel temporal/spatio-temporal Varying Coefficient Decision Process (VCDP) model, capable of effectively capturing the evolving treatment effects in situations characterized by temporal…

    Submitted 3 December, 2023; v1 submitted 22 February, 2022; originally announced February 2022.

  43. arXiv:2202.10589  [pdf, other]

    stat.ML cs.LG

    Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process

    Authors: Chengchun Shi, Jin Zhu, Ye Shen, Shikai Luo, Hongtu Zhu, Rui Song

    Abstract: This paper is concerned with constructing a confidence interval for a target policy's value offline based on pre-collected observational data in infinite horizon settings. Most of the existing works assume no unmeasured variables exist that confound the observed actions. This assumption, however, is likely to be violated in real applications such as healthcare and technological industries. In th…

    Submitted 3 November, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

  44. arXiv:2202.10574  [pdf, other]

    stat.ML cs.LG

    A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets

    Authors: Chengchun Shi, Runzhe Wan, Ge Song, Shikai Luo, Rui Song, Hongtu Zhu

    Abstract: The two-sided markets such as ride-sharing companies often involve a group of subjects who are making sequential decisions across time and/or location. With the rapid development of smart phones and internet of things, they have substantially transformed the transportation landscape of human beings. In this paper we consider large-scale fleet management in ride-sharing companies that involve multi…

    Submitted 26 March, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

  45. arXiv:2112.05671  [pdf, other]

    stat.ME econ.EM

    On the Assumptions of Synthetic Control Methods

    Authors: Claudia Shi, Dhanya Sridhar, Vishal Misra, David M. Blei

    Abstract: Synthetic control (SC) methods have been widely applied to estimate the causal effect of large-scale interventions, e.g., the state-wide effect of a change in policy. The idea of synthetic controls is to approximate one unit's counterfactual outcomes using a weighted combination of some other units' observed outcomes. The motivating question of this paper is: how does the SC strategy lead to valid…

    Submitted 14 December, 2021; v1 submitted 10 December, 2021; originally announced December 2021.

  46. arXiv:2112.03493  [pdf, other]

    stat.ME

    Conformal Sensitivity Analysis for Individual Treatment Effects

    Authors: Mingzhang Yin, Claudia Shi, Yixin Wang, David M. Blei

    Abstract: Estimating an individual treatment effect (ITE) is essential to personalized decision making. However, existing methods for estimating the ITE often rely on unconfoundedness, an assumption that is fundamentally untestable with observed data. To assess the robustness of individual-level causal conclusion with unconfoundedness, this paper proposes a method for sensitivity analysis of the ITE, a way…

    Submitted 12 July, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: Journal of the American Statistical Association

  47. arXiv:2111.08885  [pdf, other]

    stat.ME cs.LG math.ST stat.ML

    Jump Interval-Learning for Individualized Decision Making

    Authors: Hengrui Cai, Chengchun Shi, Rui Song, Wenbin Lu

    Abstract: An individualized decision rule (IDR) is a decision function that assigns each individual a given treatment based on his/her observed characteristics. Most of the existing works in the literature consider settings with binary or finitely many treatment options. In this paper, we focus on the continuous treatment setting and propose a jump interval-learning to develop an individualized interval-val…

    Submitted 28 January, 2023; v1 submitted 16 November, 2021; originally announced November 2021.

  48. arXiv:2111.06784  [pdf, other]

    cs.LG stat.ML

    A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes

    Authors: Chengchun Shi, Masatoshi Uehara, Jiawei Huang, Nan Jiang

    Abstract: We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes (POMDPs), where the evaluation policy depends only on observable variables and the behavior policy depends on unobservable latent variables. Existing works either assume no unmeasured confounders, or focus on settings where both the observation and the state spaces are tabular. In this work, we first propose…

    Submitted 15 June, 2022; v1 submitted 12 November, 2021; originally announced November 2021.

  49. arXiv:2111.03908  [pdf, other]

    stat.ME math.ST

    An Online Sequential Test for Qualitative Treatment Effects

    Authors: Chengchun Shi, Shikai Luo, Hongtu Zhu, Rui Song

    Abstract: Tech companies (e.g., Google or Facebook) often use randomized online experiments and/or A/B testing primarily based on the average treatment effects to compare their new product with an old one. However, it is also critically important to detect qualitative treatment effects such that the new one may significantly outperform the existing one only under some specific circumstances. The aim of this…

    Submitted 6 November, 2021; originally announced November 2021.

  50. arXiv:2110.14628  [pdf, ps, other]

    stat.ML cs.IT cs.LG

    (Almost) Free Incentivized Exploration from Decentralized Learning Agents

    Authors: Chengshuai Shi, Haifeng Xu, Wei Xiong, Cong Shen

    Abstract: Incentivized exploration in multi-armed bandits (MAB) has witnessed increasing interest and much progress in recent years, where a principal offers bonuses to agents to do explorations on her behalf. However, almost all existing studies are confined to temporary myopic agents. In this work, we break this barrier and study incentivized exploration with multiple and long-term strategic agents, wh…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021, camera-ready version