Showing 1–15 of 15 results for author: Wan, R

  1. arXiv:2312.15595  [pdf, other]

    stat.ML cs.LG econ.EM

    Zero-Inflated Bandits

    Authors: Haoyu Wei, Runzhe Wan, Lei Shi, Rui Song

    Abstract: Many real applications of bandits have sparse non-zero rewards, leading to slow learning rates. A careful distribution modeling that utilizes problem-specific structures is known as critical to estimation efficiency in the statistics literature, yet is under-explored in bandits. To fill the gap, we initiate the study of zero-inflated bandits, where the reward is modeled as a classic semi-parametri…

    Submitted 24 December, 2023; originally announced December 2023.

  2. arXiv:2312.12871  [pdf, other]

    cs.LG stat.ML

    Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches

    Authors: Yu Liu, Runzhe Wan, James McQueen, Doug Hains, **xiang Gu, Rui Song

    Abstract: The selection of the assumed effect size (AES) critically determines the duration of an experiment, and hence its accuracy and efficiency. Traditionally, experimenters determine AES based on domain knowledge. However, this method becomes impractical for online experimentation services managing numerous experiments, and a more automated approach is hence of great demand. We initiate the study of da…

    Submitted 17 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  3. arXiv:2310.18715  [pdf, other]

    cs.LG cs.AI stat.ML

    Robust Offline Reinforcement learning with Heavy-Tailed Rewards

    Authors: ** Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo, Chengchun Shi

    Abstract: This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-m…

    Submitted 30 March, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: 23 pages, 6 figures. Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024

  4. arXiv:2301.13152  [pdf, other]

    stat.ML cs.LG econ.EM stat.ME

    STEEL: Singularity-aware Reinforcement Learning

    Authors: Xiaohong Chen, Zhengling Qi, Runzhe Wan

    Abstract: Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy that maximizes the expected total rewards in a dynamic environment. The existing methods require absolutely continuous assumption (e.g., there do not exist non-overlapping regions) on the distribution induced by target policies with respect to the data distribution over either the state or action or b…

    Submitted 25 June, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

  5. arXiv:2212.14580  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.ME

    Heterogeneous Synthetic Learner for Panel Data

    Authors: Ye Shen, Runzhe Wan, Hengrui Cai, Rui Song

    Abstract: In the new era of personalization, learning the heterogeneous treatment effect (HTE) becomes an inevitable trend with numerous applications. Yet, most existing HTE estimation methods focus on independently and identically distributed observations and cannot handle the non-stationarity and temporal dependency in the common panel data setting. The treatment evaluators developed for panel data, on th…

    Submitted 29 January, 2023; v1 submitted 30 December, 2022; originally announced December 2022.

  6. arXiv:2212.12845  [pdf, ps, other]

    stat.ME cs.LG

    Mining the Factor Zoo: Estimation of Latent Factor Models with Sufficient Proxies

    Authors: Runzhe Wan, Yingying Li, Wenbin Lu, Rui Song

    Abstract: Latent factor model estimation typically relies on either using domain knowledge to manually pick several observed covariates as factor proxies, or purely conducting multivariate analysis such as principal component analysis. However, the former approach may suffer from the bias while the latter can not incorporate additional information. We propose to bridge these two approaches while allowing th…

    Submitted 2 January, 2023; v1 submitted 24 December, 2022; originally announced December 2022.

  7. arXiv:2202.10574  [pdf, other]

    stat.ML cs.LG

    A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets

    Authors: Chengchun Shi, Runzhe Wan, Ge Song, Shikai Luo, Rui Song, Hongtu Zhu

    Abstract: The two-sided markets such as ride-sharing companies often involve a group of subjects who are making sequential decisions across time and/or location. With the rapid development of smart phones and internet of things, they have substantially transformed the transportation landscape of human beings. In this paper we consider large-scale fleet management in ride-sharing companies that involve multi…

    Submitted 26 March, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

  8. arXiv:2105.04646  [pdf, other]

    stat.ML cs.AI cs.LG

    Deeply-Debiased Off-Policy Interval Estimation

    Authors: Chengchun Shi, Runzhe Wan, Victor Chernozhukov, Rui Song

    Abstract: Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel deeply-debiasing procedure to construct an efficient, robust, and flexib…

    Submitted 7 June, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

  9. arXiv:2009.04607  [pdf, other]

    cs.LG stat.ML

    Multi-Objective Model-based Reinforcement Learning for Infectious Disease Control

    Authors: Runzhe Wan, Xinyu Zhang, Rui Song

    Abstract: Severe infectious diseases such as the novel coronavirus (COVID-19) pose a huge threat to public health. Stringent control measures, such as school closures and stay-at-home orders, while having significant effects, also bring huge economic losses. In the face of an emerging infectious disease, a crucial question for policymakers is how to make the trade-off and implement the appropriate intervent…

    Submitted 26 February, 2022; v1 submitted 9 September, 2020; originally announced September 2020.

  10. arXiv:2007.11771  [pdf, other]

    math.ST stat.ML

    Batch Policy Learning in Average Reward Markov Decision Processes

    Authors: Peng Liao, Zhengling Qi, Runzhe Wan, Predrag Klasnja, Susan Murphy

    Abstract: We consider the batch (off-line) policy learning problem in the infinite horizon Markov Decision Process. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a doubly robust estimator for the average reward and show that it achieves semiparametric efficiency. Further we develop an optimization algorithm to compute the optim…

    Submitted 17 September, 2022; v1 submitted 22 July, 2020; originally announced July 2020.

  11. arXiv:2006.08419  [pdf, other]

    stat.ML cs.CV cs.LG

    Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD

    Authors: Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun

    Abstract: In this work, we comprehensively reveal the learning dynamics of neural network with normalization, weight decay (WD), and SGD (with momentum), named as Spherical Motion Dynamics (SMD). Most related works study SMD by focusing on "effective learning rate" in "equilibrium" condition, where weight norm remains unchanged. However, their discussions on why equilibrium condition can be reached in SMD i…

    Submitted 27 November, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: Theoretical analysis on joint effect of normalization and weight decay

  12. arXiv:2002.01751  [pdf, other]

    stat.ML cs.LG

    Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making

    Authors: Chengchun Shi, Runzhe Wan, Rui Song, Wenbin Lu, Ling Leng

    Abstract: The Markov assumption (MA) is fundamental to the empirical validity of reinforcement learning. In this paper, we propose a novel Forward-Backward Learning procedure to test MA in sequential decision making. The proposed test does not assume any parametric form on the joint distribution of the observed data and plays an important role for identifying the optimal policy in high-order Markov decision…

    Submitted 5 February, 2020; originally announced February 2020.

  13. arXiv:2001.06838  [pdf, other]

    cs.CV cs.LG stat.ML

    Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization

    Authors: Junjie Yan, Ruosi Wan, Xiangyu Zhang, Wei Zhang, Yichen Wei, Jian Sun

    Abstract: Batch Normalization (BN) is one of the most widely used techniques in Deep Learning field. But its performance can awfully degrade with insufficient batch size. This weakness limits the usage of BN on many computer vision tasks like detection or segmentation, where batch size is usually small due to the constraint of memory consumption. Therefore many modified normalization techniques have been pr…

    Submitted 8 April, 2020; v1 submitted 19 January, 2020; originally announced January 2020.

    Comments: ICLR2020; https://github.com/megvii-model/MABN

  14. arXiv:1911.07489  [pdf, other]

    cs.LG stat.ML

    Towards Making Deep Transfer Learning Never Hurt

    Authors: Ruosi Wan, Haoyi Xiong, Xingjian Li, Zhanxing Zhu, Jun Huan

    Abstract: Transfer learning have been frequently used to improve deep neural network training through incorporating weights of pre-trained networks as the starting-point of optimization for regularization. While deep transfer learning can usually boost the performance with better accuracy and faster convergence, transferring weights from inappropriate networks hurts training procedure and may lead to even l…

    Submitted 18 November, 2019; originally announced November 2019.

    Comments: 10 pages

    Journal ref: accepted as long paper at the 19th IEEE International Conference on Data Mining, 2019

  15. arXiv:1806.00159  [pdf, other]

    stat.ML cs.LG

    Neural Control Variates for Variance Reduction

    Authors: Ruosi Wan, Mingjun Zhong, Haoyi Xiong, Zhanxing Zhu

    Abstract: In statistics and machine learning, approximation of an intractable integration is often achieved by using the unbiased Monte Carlo estimator, but the variances of the estimation are generally high in many applications. Control variates approaches are well-known to reduce the variance of the estimation. These control variates are typically constructed by employing predefined parametric functions o…

    Submitted 15 October, 2019; v1 submitted 31 May, 2018; originally announced June 2018.

    Comments: Published as a conference paper at ECML PKDD 2019