Showing 1–50 of 256 results for author: Yang, Z

Searching in archive stat.
  1. arXiv:2405.17862  [pdf, other]

    cs.LG stat.ML

    Towards robust prediction of material properties for nuclear reactor design under scarce data -- a study in creep rupture property

    Authors: Yu Chen, Edoardo Patelli, Zhen Yang, Adolphus Lye

    Abstract: Advances in Deep Learning bring further investigation into credibility and robustness, especially for safety-critical engineering applications such as the nuclear industry. The key challenges include the availability of data set (often scarce and sparse) and insufficient consideration of the uncertainty in the data, model, and prediction. This paper therefore presents a meta-learning based approac…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 8 pages, submitted to REC 2024 (International Workshop on Reliable Engineering Computing)

  2. arXiv:2405.17490  [pdf, other]

    cs.LG stat.ML

    Revisit, Extend, and Enhance Hessian-Free Influence Functions

    Authors: Ziao Yang, Han Yue, Jian Chen, Hongfu Liu

    Abstract: Influence functions serve as crucial tools for assessing sample influence in model interpretation, subset training set selection, noisy label detection, and more. By employing the first-order Taylor extension, influence functions can estimate sample influence without the need for expensive model retraining. However, applying influence functions directly to deep models presents challenges, primaril…

    Submitted 24 May, 2024; originally announced May 2024.

  3. arXiv:2405.01744  [pdf, other]

    cs.LG cs.AI cs.CL stat.ME

    ALCM: Autonomous LLM-Augmented Causal Discovery Framework

    Authors: Elahe Khatibi, Mahyar Abbasian, Zhongqi Yang, Iman Azimi, Amir M. Rahmani

    Abstract: To perform effective causal inference in high-dimensional datasets, initiating the process with causal discovery is imperative, wherein a causal graph is generated based on observational data. However, obtaining a complete and accurate causal graph poses a formidable challenge, recognized as an NP-hard problem. Recently, the advent of Large Language Models (LLMs) has ushered in a new era, indicati…

    Submitted 2 May, 2024; originally announced May 2024.

  4. arXiv:2404.12648  [pdf, ps, other]

    cs.LG stat.ML

    Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation

    Authors: Jianliang He, Han Zhong, Zhuoran Yang

    Abstract: We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation. Specifically, we propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP), which incorporates both model-based and value-based incarnations. In particular, LOOP features a novel construction of confidence sets and a low-switching policy upda…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: ICLR 2024

  5. arXiv:2404.12312  [pdf, ps, other]

    cs.LG math.OC stat.ML

    A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization

    Authors: Yuchen Zhu, Yufeng Zhang, Zhaoran Wang, Zhuoran Yang, Xiaohong Chen

    Abstract: This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks. In particular, we consider the minimax optimization problem stemming from estimating linear functional equations defined by conditional expectations, where the objective functions are quadratic in the functional spaces. We address (i) the convergence o…

    Submitted 25 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Submitted

  6. arXiv:2404.10243  [pdf]

    stat.AP

    Using Multi-Source Data to Identify High-Emitting Heavy-Duty Diesel Vehicles

    Authors: Zhuoqian Yang, Ke Han, Linwei Liao, Jiaxin Wu

    Abstract: Identifying and managing high-emitters among heavy-duty diesel vehicles is a key to mitigating urban air pollution, as a small number of such vehicles could contribute a significant amount of total transport emissions. On-board monitoring (OBM) systems can directly monitor the real-time emission performance of heavy-duty vehicles on road and have become part of the future emissions compliance fram…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 17 pages, 4 figures

  7. arXiv:2404.09362  [pdf, other]

    stat.ME stat.AP

    A Bayesian Joint Modelling for Misclassified Interval-censoring and Competing Risks

    Authors: Zhenwei Yang, Dimitris Rizopoulos, Eveline A. M. Heijnsdijk, Lisa F. Newcomb, Nicole S. Erler

    Abstract: In active surveillance of prostate cancer, cancer progression is interval-censored and the examination to detect progression is subject to misclassification, usually false negatives. Meanwhile, patients may initiate early treatment before progression detection, constituting a competing risk. We developed the Misclassification-Corrected Interval-censored Cause-specific Joint Model (MCICJM) to estim…

    Submitted 14 April, 2024; originally announced April 2024.

  8. arXiv:2403.18549  [pdf, other]

    stat.ME

    A communication-efficient, online changepoint detection method for monitoring distributed sensor networks

    Authors: Ziyang Yang, Idris A. Eckley, Paul Fearnhead

    Abstract: We consider the challenge of efficiently detecting changes within a network of sensors, where we also need to minimise communication between sensors and the cloud. We propose an online, communication-efficient method to detect such changes. The procedure works by performing likelihood ratio tests at each time point, and two thresholds are chosen to filter unimportant test statistics and make decis…

    Submitted 9 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: 36 pages, 8 figures, 5 tables

  9. arXiv:2403.11968  [pdf, other]

    cs.LG math.ST stat.ML

    Unveil Conditional Diffusion Models with Classifier-free Guidance: A Sharp Statistical Theory

    Authors: Hengyu Fu, Zhuoran Yang, Mengdi Wang, Minshuo Chen

    Abstract: Conditional diffusion models serve as the foundation of modern image synthesis and find extensive application in fields like computational biology and reinforcement learning. In these applications, conditional diffusion models incorporate various conditional information, such as prompt input, to guide the sample generation towards desired properties. Despite the empirical success, theory of condit…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 92 pages, 5 figures

  10. arXiv:2403.00993  [pdf, other]

    cs.LG cs.AI stat.ML

    On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games

    Authors: Awni Altabaa, Zhuoran Yang

    Abstract: In a sequential decision-making problem, the information structure is the description of how events in the system occurring at different points in time affect each other. Classical models of reinforcement learning (e.g., MDPs, POMDPs) assume a simple and highly regular information structure, while more general models like predictive state representations do not explicitly model the information str…

    Submitted 27 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: 59 pages, 5 figures

  11. arXiv:2402.19442  [pdf, other]

    cs.LG cs.AI math.OC math.ST stat.ML

    Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality

    Authors: Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang

    Abstract: We study the dynamics of gradient flow for training a multi-head softmax attention model for in-context learning of multi-task linear regression. We establish the global convergence of gradient flow under suitable choices of initialization. In addition, we prove that an interesting "task allocation" phenomenon emerges during the gradient flow dynamics, where each attention head focuses on solving…

    Submitted 10 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: 141 pages, 7 figures

  12. arXiv:2402.10810  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning

    Authors: Zihao Li, Boyi Liu, Zhuoran Yang, Zhaoran Wang, Mengdi Wang

    Abstract: We study the Constrained Convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure, subject to a convex constraint. Designing algorithms for a constrained convex MDP faces several challenges, including (1) handling the large state space, (2) managing the exploration/exploitation tradeoff, and (3) solving the constrained optimization where the…

    Submitted 16 February, 2024; originally announced February 2024.

  13. arXiv:2402.06886  [pdf, other]

    cs.LG math.OC stat.ML

    Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF

    Authors: Han Shen, Zhuoran Yang, Tianyi Chen

    Abstract: Bilevel optimization has been recently applied to many machine learning tasks. However, their applications have been restricted to the supervised learning setting, where static objective functions with benign structures are considered. But bilevel problems such as incentive design, inverse reinforcement learning (RL), and RL from human feedback (RLHF) are often modeled as dynamic objective functio…

    Submitted 31 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: Shorter version accepted to ICML 2024

  14. arXiv:2402.05203  [pdf, other]

    cs.LG stat.ML

    Bellman Conformal Inference: Calibrating Prediction Intervals For Time Series

    Authors: Zitong Yang, Emmanuel Candès, Lihua Lei

    Abstract: We introduce Bellman Conformal Inference (BCI), a framework that wraps around any time series forecasting models and provides approximately calibrated prediction intervals. Unlike existing methods, BCI is able to leverage multi-step ahead forecasts and explicitly optimize the average interval lengths by solving a one-dimensional stochastic control problem (SCP) at each time step. In particular, we…

    Submitted 9 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 17 pages, 4 figures

  15. arXiv:2402.01460  [pdf, other]

    stat.ML cs.LG

    Deep conditional distribution learning via conditional Föllmer flow

    Authors: Jinyuan Chang, Zhao Ding, Yuling Jiao, Ruoxuan Li, Jerry Zhijian Yang

    Abstract: We introduce an ordinary differential equation (ODE) based deep generative method for learning conditional distributions, named Conditional Föllmer Flow. Starting from a standard Gaussian distribution, the proposed flow could approximate the target conditional distribution very well when the time is close to 1. For effective implementation, we discretize the flow with Euler's method where we estim…

    Submitted 13 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: The original title of this paper is "Deep Conditional Generative Learning: Model and Error Analysis"

  16. arXiv:2401.04535  [pdf, other]

    stat.ML cs.LG

    Semi-Supervised Deep Sobolev Regression: Estimation, Variable Selection and Beyond

    Authors: Zhao Ding, Chenguang Duan, Yuling Jiao, Jerry Zhijian Yang

    Abstract: We propose SDORE, a semi-supervised deep Sobolev regressor, for the nonparametric estimation of the underlying regression function and its gradient. SDORE employs deep neural networks to minimize empirical risk with gradient norm regularization, allowing computation of the gradient norm on unlabeled data. We conduct a comprehensive analysis of the convergence rates of SDORE and establish a minimax…

    Submitted 9 January, 2024; originally announced January 2024.

    MSC Class: 62G05; 62G08; 65N21

  17. arXiv:2312.10920  [pdf, other]

    cs.LG stat.ME

    Domain adaption and physical constrains transfer learning for shale gas production

    Authors: Zhaozhong Yang, Liangjie Gou, Chao Min, Duo Yi, Xiaogang Li, Guoquan Wen

    Abstract: Effective prediction of shale gas production is crucial for strategic reservoir development. However, in new shale gas blocks, two main challenges are encountered: (1) the occurrence of negative transfer due to insufficient data, and (2) the limited interpretability of deep learning (DL) models. To tackle these problems, we propose a novel transfer learning methodology that utilizes domain adaptat…

    Submitted 17 December, 2023; originally announced December 2023.

  18. arXiv:2312.01127  [pdf, other]

    math.OC stat.ML

    Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems

    Authors: Juno Kim, Kakei Yamamoto, Kazusato Oko, Zhuoran Yang, Taiji Suzuki

    Abstract: In this paper, we extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates. We propose mean-field Langevin averaged gradient (MFL-AG), a single-loop algorithm that implements gradient descent ascent in the distribution spaces with a novel weighted averaging, and establish average-iterate convergence…

    Submitted 16 February, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: ICLR 2024 spotlight

  19. arXiv:2311.15283  [pdf, other]

    cs.LG cs.AI math.DS math.NA stat.ML

    Bias-Variance Trade-off in Physics-Informed Neural Networks with Randomized Smoothing for High-Dimensional PDEs

    Authors: Zheyuan Hu, Zhouhao Yang, Yezhen Wang, George Em Karniadakis, Kenji Kawaguchi

    Abstract: While physics-informed neural networks (PINNs) have been proven effective for low-dimensional partial differential equations (PDEs), the computational cost remains a hurdle in high-dimensional scenarios. This is particularly pronounced when computing high-order and high-dimensional derivatives in the physics-informed loss. Randomized Smoothing PINN (RS-PINN) introduces Gaussian noise for stochasti…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: 21 pages, 5 figures

    MSC Class: 14J60

  20. arXiv:2311.13180  [pdf, other]

    stat.ML cs.LG

    Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks

    Authors: Jianqing Fan, Zhaoran Wang, Zhuoran Yang, Chenlu Ye

    Abstract: We study high-dimensional multi-armed contextual bandits with batched feedback where the $T$ steps of online interactions are divided into $L$ batches. In specific, each batch collects data according to a policy that depends on previous batches and the rewards are revealed only at the end of the batch. Such a feedback structure is popular in applications such as personalized medicine and online ad…

    Submitted 24 November, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

  21. arXiv:2311.10590  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    EduGym: An Environment and Notebook Suite for Reinforcement Learning Education

    Authors: Thomas M. Moerland, Matthias Müller-Brockhausen, Zhao Yang, Andrius Bernatavicius, Koen Ponse, Tom Kouwenhoven, Andreas Sauter, Michiel van der Meer, Bram Renting, Aske Plaat

    Abstract: Due to the empirical success of reinforcement learning, an increasing number of students study the subject. However, from our practical teaching experience, we see students entering the field (bachelor, master and early PhD) often struggle. On the one hand, textbooks and (online) lectures provide the fundamentals, but students find it hard to translate between equations and code. On the other hand…

    Submitted 22 February, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

  22. arXiv:2311.03660  [pdf, other]

    stat.ME

    Sampling via Föllmer Flow

    Authors: Zhao Ding, Yuling Jiao, Xiliang Lu, Zhijian Yang, Cheng Yuan

    Abstract: We introduce a novel unit-time ordinary differential equation (ODE) flow called the preconditioned Föllmer flow, which efficiently transforms a Gaussian measure into a desired target measure at time 1. To discretize the flow, we apply Euler's method, where the velocity field is calculated either analytically or through Monte Carlo approximation using Gaussian samples. Under reasonable conditions,…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 24 pages, 6 figures, 3 tables

    MSC Class: 62D05; 58J65; 60J60

  23. arXiv:2310.19861  [pdf, ps, other]

    cs.LG cs.GT stat.ML

    Posterior Sampling for Competitive RL: Function Approximation and Partial Observation

    Authors: Shuang Qiu, Ziyu Dai, Han Zhong, Zhaoran Wang, Zhuoran Yang, Tong Zhang

    Abstract: This paper investigates posterior sampling algorithms for competitive reinforcement learning (RL) in the context of general function approximations. Focusing on zero-sum Markov games (MGs) under two critical settings, namely self-play and adversarial learning, we first propose the self-play and adversarial generalized eluder coefficient (GEC) as complexity measures for function approximation, capt…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  24. arXiv:2310.17531  [pdf, ps, other]

    cs.GT cs.LG stat.ML

    Learning Regularized Graphon Mean-Field Games with Unknown Graphons

    Authors: Fengzhuo Zhang, Vincent Y. F. Tan, Zhaoran Wang, Zhuoran Yang

    Abstract: We design and analyze reinforcement learning algorithms for Graphon Mean-Field Games (GMFGs). In contrast to previous works that require the precise values of the graphons, we aim to learn the Nash Equilibrium (NE) of the regularized GMFGs when the graphons are unknown. Our contributions are threefold. First, we propose the Proximal Policy Optimization for GMFG (GMFG-PPO) algorithm and show that i…

    Submitted 26 October, 2023; originally announced October 2023.

  25. arXiv:2310.08089  [pdf, other]

    cs.GT eess.SY stat.ML

    Learning Regularized Monotone Graphon Mean-Field Games

    Authors: Fengzhuo Zhang, Vincent Y. F. Tan, Zhaoran Wang, Zhuoran Yang

    Abstract: This paper studies two fundamental problems in regularized Graphon Mean-Field Games (GMFGs). First, we establish the existence of a Nash Equilibrium (NE) of any $λ$-regularized GMFG (for $λ\geq 0$). This result relies on weaker conditions than those in previous works for analyzing both unregularized GMFGs ($λ=0$) and $λ$-regularized MFGs, which are special cases of GMFGs. Second, we propose provab…

    Submitted 12 October, 2023; originally announced October 2023.

  26. arXiv:2309.05145  [pdf, other]

    cs.LG cs.AI stat.ML

    Outlier Robust Adversarial Training

    Authors: Shu Hu, Zhenhuan Yang, Xin Wang, Yiming Ying, Siwei Lyu

    Abstract: Supervised learning models are challenged by the intrinsic complexities of training data such as outliers and minority subpopulations and intentional attacks at inference time with adversarial samples. While traditional robust learning methods and the recent adversarial training approaches are designed to handle each of the two challenges, to date, no work has been done to develop models that are…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: Accepted by The 15th Asian Conference on Machine Learning (ACML 2023)

  27. arXiv:2308.04011  [pdf, other]

    cs.LG stat.ME

    Generalization bound for estimating causal effects from observational network data

    Authors: Ruichu Cai, Zeqin Yang, Weilin Chen, Yuguang Yan, Zhifeng Hao

    Abstract: Estimating causal effects from observational network data is a significant but challenging problem. Existing works in causal inference for observational network data lack an analysis of the generalization bound, which can theoretically provide support for alleviating the complex confounding bias and practically guide the design of learning objectives in a principled manner. To fill this gap, we de…

    Submitted 7 August, 2023; originally announced August 2023.

  28. arXiv:2307.14085  [pdf, other]

    cs.LG cs.AI math.ST stat.ML

    Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks

    Authors: Siyu Chen, Mengdi Wang, Zhuoran Yang

    Abstract: We study reinforcement learning (RL) for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure. In specific, at the outset of the game, the leader announces her policy to the follower and commits to it. The follower observes the leader's policy and, in turn, adopts a quantal response policy by solving an entropy-regularized policy optimization…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: 129 pages, 1 figure

  29. arXiv:2307.04055  [pdf, other]

    stat.ML cs.AI cs.GT cs.LG

    Contextual Dynamic Pricing with Strategic Buyers

    Authors: Pangpang Liu, Zhuoran Yang, Zhaoran Wang, Will Wei Sun

    Abstract: Personalized pricing, which involves tailoring prices based on individual characteristics, is commonly used by firms to implement a consumer-specific pricing policy. In this process, buyers can also strategically manipulate their feature data to obtain a lower price, incurring certain manipulation costs. Such strategic behavior can hinder firms from maximizing their profits. In this paper, we stud…

    Submitted 25 June, 2024; v1 submitted 8 July, 2023; originally announced July 2023.

    Comments: The paper has been accepted by JASA

  30. arXiv:2306.13949  [pdf]

    stat.ME stat.AP

    Analysis of dynamic restricted mean survival time based on pseudo-observations

    Authors: Zijing Yang, Chengfeng Zhang, Yawen Hou, Zheng Chen

    Abstract: In clinical follow-up studies with a time-to-event end point, the difference in the restricted mean survival time (RMST) is a suitable substitute for the hazard ratio (HR). However, the RMST only measures the survival of patients over a period of time from the baseline and cannot reflect changes in life expectancy over time. Based on the RMST, we study the conditional restricted mean survival time…

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: Biometrics. 2023

    Report number: 13891

  31. arXiv:2306.01435  [pdf, other]

    cs.LG stat.ML

    Improving Adversarial Robustness of DEQs with Explicit Regulations Along the Neural Dynamics

    Authors: Zonghan Yang, Peng Li, Tianyu Pang, Yang Liu

    Abstract: Deep equilibrium (DEQ) models replace the multiple-layer stacking of conventional deep networks with a fixed-point iteration of a single-layer transformation. Having been demonstrated to be competitive in a variety of real-world scenarios, the adversarial robustness of general DEQs becomes increasingly crucial for their reliable deployment. Existing works improve the robustness of general DEQ mode…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted at ICML 2023. Our code is available at https://github.com/minicheshire/DEQ-Regulating-Neural-Dynamics

  32. arXiv:2306.01429  [pdf, other]

    cs.LG stat.ML

    A Closer Look at the Adversarial Robustness of Deep Equilibrium Models

    Authors: Zonghan Yang, Tianyu Pang, Yang Liu

    Abstract: Deep equilibrium models (DEQs) refrain from the traditional layer-stacking paradigm and turn to find the fixed point of a single layer. DEQs have achieved promising performance on different applications with featured memory efficiency. At the same time, the adversarial vulnerability of DEQs raises concerns. Several works propose to certify robustness for monotone DEQs. However, limited efforts are…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted at NeurIPS 2022. Our code is available at https://github.com/minicheshire/DEQ-White-Box-Robustness

  33. arXiv:2305.19420  [pdf, ps, other]

    stat.ML cs.LG

    What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization

    Authors: Yufeng Zhang, Fengzhuo Zhang, Zhuoran Yang, Zhaoran Wang

    Abstract: In this paper, we conduct a comprehensive study of In-Context Learning (ICL) by addressing several open questions: (a) What type of ICL estimator is learned by large language models? (b) What is a proper performance metric for ICL and what is the error rate? (c) How does the transformer architecture enable ICL? To answer these questions, we adopt a Bayesian view and formulate ICL as a problem of p…

    Submitted 10 October, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

  34. arXiv:2305.18438  [pdf, ps, other]

    cs.LG cs.AI math.OC math.ST stat.ML

    Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism

    Authors: Zihao Li, Zhuoran Yang, Mengdi Wang

    Abstract: In this paper, we study offline Reinforcement Learning with Human Feedback (RLHF) where we aim to learn the human's underlying reward and the MDP's optimal policy from a set of trajectories induced by human choices. RLHF is challenging for multiple reasons: large state space but limited human feedback, the bounded rationality of human decisions, and the off-policy distribution shift. In this paper…

    Submitted 3 July, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

  35. arXiv:2305.18258  [pdf, other]

    cs.LG cs.AI cs.GT math.OC stat.ML

    Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

    Authors: Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang

    Abstract: In online reinforcement learning (online RL), balancing exploration and exploitation is crucial for finding an optimal policy in a sample-efficient way. To achieve this, existing sample-efficient online RL algorithms typically consist of three components: estimation, planning, and exploration. However, in order to cope with general function approximators, most of them involve impractical algorithm…

    Submitted 25 October, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

  36. arXiv:2305.04819  [pdf, other]

    cs.LG cs.GT cs.MA stat.ML

    Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning

    Authors: Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee

    Abstract: Policy optimization methods with function approximation are widely used in multi-agent reinforcement learning. However, it remains elusive how to design such algorithms with statistical guarantees. Leveraging a multi-agent performance difference lemma that characterizes the landscape of multi-agent policy optimization, we find that the localized action value function serves as an ideal descent dir…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: ICML 2023

  37. arXiv:2304.05010  [pdf, other]

    stat.AP

    Characterizing personalized effects of family information on disease risk using graph representation learning

    Authors: Sophie Wharrie, Zhiyu Yang, Andrea Ganna, Samuel Kaski

    Abstract: Family history is considered a risk factor for many diseases because it implicitly captures shared genetic, environmental and lifestyle factors. Finland's nationwide electronic health record (EHR) system spanning multiple generations presents new opportunities for studying a connected network of medical histories for entire families. In this work we present a graph-based deep learning approach for…

    Submitted 8 August, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

    Journal ref: Proceedings of the 8th Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research, 219 (2023) 824-845

  38. arXiv:2303.11536  [pdf, other]

    cs.LG cs.AI cs.CV math.ST stat.ML

    Indeterminate Probability Neural Network

    Authors: Tao Yang, Chuang Liu, Xiaofeng Ma, Weijia Lu, Ning Wu, Bingyang Li, Zhifei Yang, Peng Liu, Lin Sun, Xiaodong Zhang, Can Zhang

    Abstract: We propose a new general model called IPNN - Indeterminate Probability Neural Network, which combines neural network and probability theory together. In the classical probability theory, the calculation of probability is based on the occurrence of events, which is hardly used in current neural networks. In this paper, we propose a new general probability theory, which is an extension of classical…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: 13 pages

  39. arXiv:2303.11187  [pdf, other]

    cs.LG cs.AI stat.ML

    A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations

    Authors: Siyu Chen, Yitan Wang, Zhaoran Wang, Zhuoran Yang

    Abstract: We study the offline contextual bandit problem, where we aim to acquire an optimal policy using observational data. However, this data usually contains two deficiencies: (i) some variables that confound actions are not observed, and (ii) missing observations exist in the collected data. Unobserved confounders lead to a confounding bias and missing observations cause bias and inefficiency problems.…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: 76 pages, 5 figures

  40. arXiv:2303.08613  [pdf, ps, other]

    cs.LG cs.AI cs.GT econ.TH stat.ML

    Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model

    Authors: Siyu Chen, Jibang Wu, Yifan Wu, Zhuoran Yang

    Abstract: We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf. Such a problem is modeled as a Stackelberg game between the principal and the agent, where the principal announces a scoring rule that specifies the payment, and the agent then chooses an effort level that maximizes her own profit and reports the information. We stu…

    Submitted 6 August, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 35 pages, adding an impossible result (Lemma 3.2) with its proof in Section D.1

  41. arXiv:2302.09193  [pdf, other]

    stat.ML cs.LG

    Copula-based transferable models for synthetic population generation

    Authors: Pascal Jutras-Dubé, Mohammad B. Al-Khasawneh, Zhichao Yang, Javier Bas, Fabian Bastin, Cinzia Cirillo

    Abstract: Population synthesis involves generating synthetic yet realistic representations of a target population of micro-agents for behavioral modeling and simulation. Traditional methods, often reliant on target population samples, such as census data or travel surveys, face limitations due to high costs and small sample sizes, particularly at smaller geographical scales. We propose a novel framework bas…

    Submitted 16 March, 2024; v1 submitted 17 February, 2023; originally announced February 2023.

  42. arXiv:2302.03003  [pdf, other]

    eess.IV cs.CV stat.ML

    OTRE: Where Optimal Transport Guided Unpaired Image-to-Image Translation Meets Regularization by Enhancing

    Authors: Wenhui Zhu, Peijie Qiu, Oana M. Dumitrascu, Jacob M. Sobczak, Mohammad Farazi, Zhangsihao Yang, Keshav Nandakumar, Yalin Wang

    Abstract: Non-mydriatic retinal color fundus photography (CFP) is widely available due to the advantage of not requiring pupillary dilation, however, is prone to poor quality due to operators, systemic imperfections, or patient-related causes. Optimal retinal image quality is mandated for accurate medical diagnoses and automated analyses. Herein, we leveraged the Optimal Transport (OT) theory to propose an…

    Submitted 8 April, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted as a conference paper to The 28th biennial international conference on Information Processing in Medical Imaging (IPMI 2023)

  43. arXiv:2302.02092  [pdf, other]

    cs.LG stat.ML

    Interpolation for Robust Learning: Data Augmentation on Wasserstein Geodesics

    Authors: Jiacheng Zhu, Jielin Qiu, Aritra Guha, Zhuolin Yang, Xuanlong Nguyen, Bo Li, Ding Zhao

    Abstract: We propose to study and promote the robustness of a model as per its performance through the interpolation of training data distributions. Specifically, (1) we augment the data by finding the worst-case Wasserstein barycenter on the geodesic connecting subpopulation distributions of different categories. (2) We regularize the model for smoother performance on the continuous geodesic path connectin…

    Submitted 28 August, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: 34 pages, 3 figures, 18 tables

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:43129-43157, 2023

  44. arXiv:2302.01576  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    ResMem: Learn what you can and memorize the rest

    Authors: Zitong Yang, Michal Lukasik, Vaishnavh Nagarajan, Zonglin Li, Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: The impressive generalization performance of modern neural networks is attributed in part to their ability to implicitly memorize complex training patterns. Inspired by this, we explore a novel mechanism to improve model generalization via explicit memorization. Specifically, we propose the residual-memorization (ResMem) algorithm, a new method that augments an existing prediction model (e.g. a ne…

    Submitted 20 October, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

  45. arXiv:2212.12167  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information

    Authors: Zuyue Fu, Zhengling Qi, Zhuoran Yang, Zhaoran Wang, Lan Wang

    Abstract: Motivated by human-machine interaction such as training chatbots for improving customer satisfaction, we study human-guided human-machine interaction involving private information. We model this interaction as a two-player turn-based game, where one player (Alice, a human) guides the other player (Bob, a machine) towards a common goal. Specifically, we focus on offline reinforcement learning (…

    Submitted 23 December, 2022; originally announced December 2022.

  46. arXiv:2212.10718  [pdf]

    cs.LG math.NA stat.ME

    Interpretability and causal discovery of the machine learning models to predict the production of CBM wells after hydraulic fracturing

    Authors: Chao Min, Guoquan Wen, Liangjie Gou, Xiaogang Li, Zhaozhong Yang

    Abstract: Machine learning approaches are widely studied in the production prediction of CBM wells after hydraulic fracturing, but rarely used in practice due to the low generalization ability and the lack of interpretability. A novel methodology is proposed in this article to discover the latent causality from observed data, which is aimed at finding an indirect way to interpret the machine learning result…

    Submitted 20 December, 2022; originally announced December 2022.

  47. arXiv:2212.09900  [pdf, ps, other]

    cs.LG math.ST stat.ME stat.ML

    Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality

    Authors: Ying Jin, Zhimei Ren, Zhuoran Yang, Zhaoran Wang

    Abstract: This paper studies offline policy learning, which aims at utilizing observations collected a priori (from either fixed or adaptively evolving behavior policies) to learn the optimal individualized decision rule in a given class. Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics are lower bounded…

    Submitted 14 March, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

  48. arXiv:2211.01962  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond

    Authors: Han Zhong, Wei Xiong, Sirui Zheng, Liwei Wang, Zhaoran Wang, Zhuoran Yang, Tong Zhang

    Abstract: We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes Markov decision process (MDP), partially observable Markov decision process (POMDP), and predictive state representation (PSR) as special cases. Toward finding the minimum assumption that empowers sample efficient learning, we propose a novel complexity measure, generali…

    Submitted 30 June, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: We changed the title from the first version. We fixed a technical issue in the first version regarding the $\ell_2$ eluder technique (Lemma D.2)

  49. arXiv:2210.12874  [pdf, other]

    cs.LG cs.CL stat.ML

    Global Contrastive Batch Sampling via Optimization on Sample Permutations

    Authors: Vin Sachidananda, Ziyi Yang, Chenguang Zhu

    Abstract: Contrastive Learning has recently achieved state-of-the-art performance in a wide range of tasks. Many contrastive learning approaches use mined hard negatives to make batches more informative during training but these approaches are inefficient as they increase epoch length proportional to the number of mined negatives and require frequent updates of nearest neighbor indices or mining from recent…

    Submitted 7 June, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

    Comments: ICML 2023; 21 pages, 7 figures

  50. arXiv:2210.10278  [pdf, other]

    cs.LG cs.GT stat.ML

    A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design

    Authors: Rui Ai, Boxiang Lyu, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan

    Abstract: We study reserve price optimization in multi-phase second-price auctions, where the seller's prior actions affect the bidders' later valuations through a Markov Decision Process (MDP). Compared to the bandit setting in existing works, the setting in ours involves three challenges. First, from the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontru…

    Submitted 18 October, 2022; originally announced October 2022.