Showing 1–32 of 32 results for author: Qi, Z

Searching in archive stat.
  1. arXiv:2402.01900  [pdf, other]

    stat.ML cs.LG

    Distributional Off-policy Evaluation with Bellman Residual Minimization

    Authors: Sungee Hong, Zhengling Qi, Raymond K. W. Wong

    Abstract: We consider the problem of distributional off-policy evaluation which serves as the foundation of many distributional reinforcement learning (DRL) algorithms. In contrast to most existing works (that rely on supremum-extended statistical distances such as supremum-Wasserstein distance), we study the expectation-extended statistical distance for quantifying the distributional Bellman residuals and…

    Submitted 2 February, 2024; originally announced February 2024.

  2. arXiv:2310.18715  [pdf, other]

    cs.LG cs.AI stat.ML

    Robust Offline Reinforcement learning with Heavy-Tailed Rewards

    Authors: Jin Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo, Chengchun Shi

    Abstract: This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-m…

    Submitted 30 March, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: 23 pages, 6 figures. Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024

  3. arXiv:2306.08719  [pdf, other]

    stat.ME cs.LG

    Off-policy Evaluation in Doubly Inhomogeneous Environments

    Authors: Zeyu Bian, Chengchun Shi, Zhengling Qi, Lan Wang

    Abstract: This work aims to study off-policy evaluation (OPE) under scenarios where two key reinforcement learning (RL) assumptions, temporal stationarity and individual homogeneity, are both violated. To handle the "double inhomogeneities", we propose a class of latent factor models for the reward and observation transition functions, under which we develop a general OPE framework that consists of both m…

    Submitted 7 September, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

  4. arXiv:2305.17083  [pdf, other]

    stat.ML cs.LG econ.EM math.ST stat.ME

    A Policy Gradient Method for Confounded POMDPs

    Authors: Mao Hong, Zhengling Qi, Yanxun Xu

    Abstract: In this paper, we propose a policy gradient method for confounded partially observable Markov decision processes (POMDPs) with continuous state and observation spaces in the offline setting. We first establish a novel identification result to non-parametrically estimate any history-dependent policy gradient under POMDPs using the offline data. The identification enables us to solve a sequence of c…

    Submitted 30 November, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: 95 pages, 3 figures

  5. arXiv:2303.14281  [pdf, other]

    stat.ML cs.LG

    Sequential Knockoffs for Variable Selection in Reinforcement Learning

    Authors: Tao Ma, Hengrui Cai, Zhengling Qi, Chengchun Shi, Eric B. Laber

    Abstract: In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge. Consequently, it is common practice to construct a state which is larger than necessary, e.g., by concatenating measurements over contiguous time points. However, needlessly increasing the dimension of the sta…

    Submitted 24 March, 2023; originally announced March 2023.

  6. arXiv:2302.12670  [pdf, ps, other]

    stat.ME cs.LG econ.EM stat.ML

    Personalized Pricing with Invalid Instrumental Variables: Identification, Estimation, and Policy Learning

    Authors: Rui Miao, Zhengling Qi, Cong Shi, Lin Lin

    Abstract: Pricing based on individual customer characteristics is widely used to maximize sellers' revenues. This work studies offline personalized pricing under endogeneity using an instrumental variable approach. Standard instrumental variable methods in causal inference/econometrics either focus on a discrete treatment space or require the exclusion restriction of instruments from having a direct effect…

    Submitted 24 February, 2023; originally announced February 2023.

  7. arXiv:2302.03821  [pdf, other]

    cs.LG math.OC stat.ME stat.ML

    PASTA: Pessimistic Assortment Optimization

    Authors: Juncheng Dong, Weibin Mo, Zhengling Qi, Cong Shi, Ethan X. Fang, Vahid Tarokh

    Abstract: We consider a class of assortment optimization problems in an offline data-driven setting. A firm does not know the underlying customer choice model but has access to an offline dataset consisting of the historically offered assortment set, customer choice, and revenue. The objective is to use the offline dataset to find an optimal assortment. Due to the combinatorial nature of assortment optimiza…

    Submitted 7 February, 2023; originally announced February 2023.

  8. arXiv:2301.13152  [pdf, other]

    stat.ML cs.LG econ.EM stat.ME

    STEEL: Singularity-aware Reinforcement Learning

    Authors: Xiaohong Chen, Zhengling Qi, Runzhe Wan

    Abstract: Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy that maximizes the expected total rewards in a dynamic environment. The existing methods require absolutely continuous assumption (e.g., there do not exist non-overlapping regions) on the distribution induced by target policies with respect to the data distribution over either the state or action or b…

    Submitted 25 June, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

  9. arXiv:2301.02220  [pdf, ps, other]

    stat.ML cs.LG

    Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization

    Authors: Chengchun Shi, Zhengling Qi, Jianing Wang, Fan Zhou

    Abstract: Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy that maximizes the cumulative rewards in sequential decision making. Most methods in the existing literature are developed in online settings where the data are easy to collect or simulate. Motivated by high stake domains such as mobile health studies with l…

    Submitted 5 January, 2023; originally announced January 2023.

  10. arXiv:2212.12167  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information

    Authors: Zuyue Fu, Zhengling Qi, Zhuoran Yang, Zhaoran Wang, Lan Wang

    Abstract: Motivated by the human-machine interaction such as training chatbots for improving customer satisfaction, we study human-guided human-machine interaction involving private information. We model this interaction as a two-player turn-based game, where one player (Alice, a human) guides the other player (Bob, a machine) towards a common goal. Specifically, we focus on offline reinforcement learning (…

    Submitted 23 December, 2022; originally announced December 2022.

  11. arXiv:2211.06569  [pdf, other]

    cs.LG stat.AP stat.ME stat.ML

    RISE: Robust Individualized Decision Learning with Sensitive Variables

    Authors: Xiaoqing Tan, Zhengling Qi, Christopher W. Seymour, Lu Tang

    Abstract: This paper introduces RISE, a robust individualized decision learning framework with sensitive variables, where sensitive variables are collectible data and important to the intervention decision, but their inclusion in decision making is prohibited due to reasons such as delayed availability or fairness concerns. A naive baseline is to ignore these sensitive variables in learning decision rules,…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: Accepted at NeurIPS 2022

  12. arXiv:2210.14420  [pdf, other]

    stat.ML cs.LG

    Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach

    Authors: Yunzhe Zhou, Zhengling Qi, Chengchun Shi, Lexin Li

    Abstract: In this article, we propose a novel pessimism-based Bayesian learning method for optimal dynamic treatment regimes in the offline setting. When the coverage condition does not hold, which is common for offline data, the existing solutions would produce sub-optimal policies. The pessimism principle addresses this issue by discouraging recommendation of actions that are less explored conditioning on…

    Submitted 21 February, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: 18 pages, 6 figures. Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023

  13. arXiv:2209.15448  [pdf, other]

    cs.LG math.ST stat.ME

    Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments

    Authors: Jiayi Wang, Zhengling Qi, Chengchun Shi

    Abstract: As AI becomes more prevalent throughout society, effective methods of integrating humans and AI systems that leverage their respective strengths and mitigate risk have become an important priority. In this paper, we introduce the paradigm of super reinforcement learning that takes advantage of Human-AI interaction for data driven sequential decision making. This approach utilizes the observed acti…

    Submitted 20 October, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

  14. arXiv:2209.10064  [pdf, other]

    stat.ML cs.LG math.ST

    Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models

    Authors: Rui Miao, Zhengling Qi, Xiaoke Zhang

    Abstract: We study the problem of off-policy evaluation (OPE) for episodic Partially Observable Markov Decision Processes (POMDPs) with continuous states. Motivated by the recently proposed proximal causal inference framework, we develop a non-parametric identification result for estimating the policy value via a sequence of so-called V-bridge functions with the help of time-dependent proxy variables. We th…

    Submitted 16 October, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

  15. arXiv:2209.08666  [pdf, other]

    cs.LG stat.ME

    Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes

    Authors: Zuyue Fu, Zhengling Qi, Zhaoran Wang, Zhuoran Yang, Yanxun Xu, Michael R. Kosorok

    Abstract: We study the offline reinforcement learning (RL) in the face of unmeasured confounders. Due to the lack of online interaction with the environment, offline RL is facing the following two significant challenges: (i) the agent may be confounded by the unobserved state variables; (ii) the offline data collected a priori does not provide sufficient coverage for the environment. To tackle the above chal…

    Submitted 18 September, 2022; originally announced September 2022.

  16. arXiv:2207.11532  [pdf, other]

    stat.ME

    Change Point Detection for High-dimensional Linear Models: A General Tail-adaptive Approach

    Authors: Bin Liu, Zhengling Qi, Xinsheng Zhang, Yufeng Liu

    Abstract: We propose a novel approach for detecting change points in high-dimensional linear regression models. Unlike previous research that relied on strict Gaussian/sub-Gaussian error assumptions and had prior knowledge of change points, we propose a tail-adaptive method for change point detection and estimation. We use a weighted combination of composite quantile and least squared losses to build a new…

    Submitted 21 May, 2024; v1 submitted 23 July, 2022; originally announced July 2022.

  17. arXiv:2201.06169  [pdf, ps, other]

    math.ST cs.LG econ.EM stat.ML

    On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation

    Authors: Xiaohong Chen, Zhengling Qi

    Abstract: We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. We recast the $Q$-function estimation into a special form of the nonparametric instrumental variables (NPIV) estimation problem. We first show that under one mild condition the NPIV formulation of $Q$-function estimation is well-posed in the sense of $L^2$-measure of…

    Submitted 26 June, 2022; v1 submitted 16 January, 2022; originally announced January 2022.

  18. Rejoinder: Learning Optimal Distributionally Robust Individualized Treatment Rules

    Authors: Weibin Mo, Zhengling Qi, Yufeng Liu

    Abstract: We thank the opportunity offered by editors for this discussion and the discussants for their insightful comments and thoughtful contributions. We also want to congratulate Kallus (2020) for his inspiring work in improving the efficiency of policy learning by retargeting. Motivated from the discussion in Dukes and Vansteelandt (2020), we first point out interesting connections and distinctions bet…

    Submitted 17 October, 2021; originally announced October 2021.

    Journal ref: Journal of the American Statistical Association, 116:534, 699-707 (2021)

  19. arXiv:2109.04640  [pdf, other]

    cs.LG stat.ME

    Projected State-action Balancing Weights for Offline Reinforcement Learning

    Authors: Jiayi Wang, Zhengling Qi, Raymond K. W. Wong

    Abstract: Offline policy evaluation (OPE) is considered a fundamental and challenging problem in reinforcement learning (RL). This paper focuses on the value estimation of a target policy based on pre-collected data generated from a possibly different policy, under the framework of infinite-horizon Markov decision processes. Motivated by the recently developed marginal importance sampling method in RL and t…

    Submitted 9 June, 2022; v1 submitted 9 September, 2021; originally announced September 2021.

  20. arXiv:2105.10635  [pdf, other]

    cs.LG stat.ML

    Two-stage Training for Learning from Label Proportions

    Authors: Jiabin Liu, Bo Wang, Xin Shen, Zhiquan Qi, Yingjie Tian

    Abstract: Learning from label proportions (LLP) aims at learning an instance-level classifier with label proportions in grouped training data. Existing deep learning based LLP methods utilize end-to-end pipelines to obtain the proportional loss with Kullback-Leibler divergence between the bag-level prior and posterior class distributions. However, the unconstrained optimization on this objective can hardly…

    Submitted 21 May, 2021; originally announced May 2021.

    Comments: 10 pages, 4 figures, 5 tables, accepted by IJCAI 2021

  21. arXiv:2105.01187  [pdf, ps, other]

    stat.ME cs.LG stat.ML

    Proximal Learning for Individualized Treatment Regimes Under Unmeasured Confounding

    Authors: Zhengling Qi, Rui Miao, Xiaoke Zhang

    Abstract: Data-driven individualized decision making has recently received increasing research interest. Most existing methods rely on the assumption of no unmeasured confounding, which unfortunately cannot be ensured in practice, especially in observational studies. Motivated by the recently proposed proximal causal inference, we develop several proximal learning approaches to estimating optimal individualiz…

    Submitted 22 December, 2022; v1 submitted 3 May, 2021; originally announced May 2021.

  22. arXiv:2011.04185  [pdf, other]

    math.ST cs.LG stat.ML

    Robust Batch Policy Learning in Markov Decision Processes

    Authors: Zhengling Qi, Peng Liao

    Abstract: We study the offline data-driven sequential decision making problem in the framework of Markov decision process (MDP). In order to enhance the generalizability and adaptivity of the learned policy, we propose to evaluate each policy by a set of the average rewards with respect to distributions centered at the policy induced stationary distribution. Given a pre-collected dataset of multiple traject…

    Submitted 9 November, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

  23. arXiv:2007.11771  [pdf, other]

    math.ST stat.ML

    Batch Policy Learning in Average Reward Markov Decision Processes

    Authors: Peng Liao, Zhengling Qi, Runzhe Wan, Predrag Klasnja, Susan Murphy

    Abstract: We consider the batch (off-line) policy learning problem in the infinite horizon Markov Decision Process. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a doubly robust estimator for the average reward and show that it achieves semiparametric efficiency. Further we develop an optimization algorithm to compute the optim…

    Submitted 17 September, 2022; v1 submitted 22 July, 2020; originally announced July 2020.

  24. arXiv:2006.15121  [pdf, other]

    stat.ML cs.LG

    Learning Optimal Distributionally Robust Individualized Treatment Rules

    Authors: Weibin Mo, Zhengling Qi, Yufeng Liu

    Abstract: Recent development in the data-driven decision science has seen great advances in individualized decision making. Given data with individual covariates, treatment assignments and outcomes, policy makers seek the best individualized treatment rule (ITR) that maximizes the expected outcome, known as the value function. Many existing methods assume that the training and testing distributions are the same. How…

    Submitted 26 June, 2020; originally announced June 2020.

  25. arXiv:1912.08865  [pdf, other]

    cs.LG math.ST stat.ML

    Adversarial VC-dimension and Sample Complexity of Neural Networks

    Authors: Zetong Qi, T. J. Wilder

    Abstract: Adversarial attacks during the testing phase of neural networks pose a challenge for the deployment of neural networks in security critical settings. These attacks can be performed by adding noise that is imperceptible to humans on top of the original data. By doing so, an attacker can create an adversarial sample, which will cause neural networks to misclassify. In this paper, we seek to understa…

    Submitted 18 December, 2019; originally announced December 2019.

  26. arXiv:1911.08967  [pdf, ps, other]

    cs.LG stat.ML

    Transfer Learning Toolkit: Primers and Benchmarks

    Authors: Fuzhen Zhuang, Keyu Duan, Tongjia Guo, Yongchun Zhu, Dongbo Xi, Zhiyuan Qi, Qing He

    Abstract: The transfer learning toolkit wraps the codes of 17 transfer learning models and provides integrated interfaces, allowing users to use those models by calling a simple function. It is easy for primary researchers to use this toolkit and to choose proper models for real-world applications. The toolkit is written in Python and distributed under MIT open source license. In this paper, the current sta…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: A Transfer Learning Toolkit

  27. arXiv:1911.02685  [pdf, ps, other]

    cs.LG stat.ML

    A Comprehensive Survey on Transfer Learning

    Authors: Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, Qing He

    Abstract: Transfer learning aims at improving the performance of target learners on target domains by transferring the knowledge contained in different but related source domains. In this way, the dependence on a large number of target domain data can be reduced for constructing target learners. Due to the wide application prospects, transfer learning has become a popular and promising area in machine learn…

    Submitted 23 June, 2020; v1 submitted 6 November, 2019; originally announced November 2019.

    Comments: 31 pages, 7 figures

  28. arXiv:1909.02180  [pdf, other]

    cs.LG stat.ML

    Learning from Label Proportions with Generative Adversarial Networks

    Authors: Jiabin Liu, Bo Wang, Zhiquan Qi, Yingjie Tian, Yong Shi

    Abstract: In this paper, we leverage generative adversarial networks (GANs) to derive an effective algorithm LLP-GAN for learning from label proportions (LLP), where only the bag-level proportional information in labels is available. Endowed with end-to-end structure, LLP-GAN performs approximation in the light of an adversarial learning mechanism, without imposing restricted assumptions on distribution. Ac…

    Submitted 2 December, 2019; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: Accepted as a conference paper at NeurIPS 2019

  29. arXiv:1908.10742  [pdf, ps, other]

    math.OC cs.LG stat.ME stat.ML

    Estimation of Individualized Decision Rules Based on an Optimized Covariate-Dependent Equivalent of Random Outcomes

    Authors: Zhengling Qi, Ying Cui, Yufeng Liu, Jong-Shi Pang

    Abstract: Recent exploration of optimal individualized decision rules (IDRs) for patients in precision medicine has attracted a lot of attention due to the heterogeneous responses of patients to different treatments. In the existing literature of precision medicine, an optimal IDR is defined as a decision function mapping from the patients' covariate space into the treatment space that maximizes the expecte…

    Submitted 27 August, 2019; originally announced August 2019.

  30. arXiv:1903.04367  [pdf, other]

    stat.ME

    On Robustness of Individualized Decision Rules

    Authors: Zhengling Qi, Jong-Shi Pang, Yufeng Liu

    Abstract: With the emergence of precision medicine, estimating optimal individualized decision rules (IDRs) has attracted tremendous attention in many scientific areas. Most existing literature has focused on finding optimal IDRs that can maximize the expected outcome for each individual. Motivated by complex individualized decision making procedures and the popular conditional value at risk (CVaR) measure,…

    Submitted 26 June, 2022; v1 submitted 11 March, 2019; originally announced March 2019.

  31. arXiv:1812.07150  [pdf, other]

    cs.LG cs.CV stat.ML

    Interactive Naming for Explaining Deep Neural Networks: A Formative Study

    Authors: Mandana Hamidi-Haines, Zhongang Qi, Alan Fern, Fuxin Li, Prasad Tadepalli

    Abstract: We consider the problem of explaining the decisions of deep neural networks for image recognition in terms of human-recognizable visual concepts. In particular, given a test set of images, we aim to explain each classification in terms of a small number of image regions, or activation maps, which have been associated with semantic concepts by a human annotator. This allows for generating summary v…

    Submitted 20 December, 2018; v1 submitted 17 December, 2018; originally announced December 2018.

  32. arXiv:1803.06071  [pdf, other]

    cs.DB cs.LG stat.ML

    Impacts of Dirty Data: and Experimental Evaluation

    Authors: Zhixin Qi, Hongzhi Wang, Jianzhong Li, Hong Gao

    Abstract: Data quality issues have attracted widespread attention due to the negative impacts of dirty data on data mining and machine learning results. The relationship between data quality and the accuracy of results could be applied on the selection of the appropriate algorithm with the consideration of data quality and the determination of the data share to clean. However, rare research has focused on e…

    Submitted 26 April, 2021; v1 submitted 16 March, 2018; originally announced March 2018.

    Comments: 22 pages, 192 figures