Search | arXiv e-print repository

Forecast Linear Augmented Projection (FLAP): A free lunch to reduce forecast error variance

Authors: Yangzhuoran Fin Yang, George Athanasopoulos, Rob J. Hyndman, Anastasios Panagiotelis

Abstract: A novel forecast linear augmented projection (FLAP) method is introduced, which reduces the forecast error variance of any unbiased multivariate forecast without introducing bias. The method first constructs new component series which are linear combinations of the original series. Forecasts are then generated for both the original and component series. Finally, the full vector of forecasts is pro… ▽ More A novel forecast linear augmented projection (FLAP) method is introduced, which reduces the forecast error variance of any unbiased multivariate forecast without introducing bias. The method first constructs new component series which are linear combinations of the original series. Forecasts are then generated for both the original and component series. Finally, the full vector of forecasts is projected onto a linear subspace where the constraints implied by the combination weights hold. It is proven that the trace of the forecast error variance is non-increasing with the number of components, and mild conditions are established for which it is strictly decreasing. It is also shown that the proposed method achieves maximum forecast error variance reduction among linear projection methods. The theoretical results are validated through simulations and two empirical applications based on Australian tourism and FRED-MD data. Notably, using FLAP with Principal Component Analysis (PCA) to construct the new series leads to substantial forecast error variance reduction. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.18072 [pdf, ps, other]

Learning for Bandits under Action Erasures

Authors: Osama Hanna, Merve Karakas, Lin F. Yang, Christina Fragouli

Abstract: We consider a novel multi-arm bandit (MAB) setup, where a learner needs to communicate the actions to distributed agents over erasure channels, while the rewards for the actions are directly available to the learner through external sensors. In our model, while the distributed agents know if an action is erased, the central learner does not (there is no feedback), and thus does not know whether th… ▽ More We consider a novel multi-arm bandit (MAB) setup, where a learner needs to communicate the actions to distributed agents over erasure channels, while the rewards for the actions are directly available to the learner through external sensors. In our model, while the distributed agents know if an action is erased, the central learner does not (there is no feedback), and thus does not know whether the observed reward resulted from the desired action or not. We propose a scheme that can work on top of any (existing or future) MAB algorithm and make it robust to action erasures. Our scheme results in a worst-case regret over action-erasure channels that is at most a factor of $O(1/\sqrt{1-ε})$ away from the no-erasure worst-case regret of the underlying MAB algorithm, where $ε$ is the erasure probability. We also propose a modification of the successive arm elimination algorithm and prove that its worst-case regret is $\Tilde{O}(\sqrt{KT}+K/(1-ε))$, which we prove is optimal by providing a matching lower bound. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2404.18905 [pdf, other]

Detecting critical treatment effect bias in small subgroups

Authors: Piersilvio De Bartolomeis, Javier Abad, Konstantin Donhauser, Fanny Yang

Abstract: Randomized trials are considered the gold standard for making informed decisions in medicine, yet they often lack generalizability to the patient populations in clinical practice. Observational studies, on the other hand, cover a broader patient population but are prone to various biases. Thus, before using an observational study for decision-making, it is crucial to benchmark its treatment effect… ▽ More Randomized trials are considered the gold standard for making informed decisions in medicine, yet they often lack generalizability to the patient populations in clinical practice. Observational studies, on the other hand, cover a broader patient population but are prone to various biases. Thus, before using an observational study for decision-making, it is crucial to benchmark its treatment effect estimates against those derived from a randomized trial. We propose a novel strategy to benchmark observational studies beyond the average treatment effect. First, we design a statistical test for the null hypothesis that the treatment effects estimated from the two studies, conditioned on a set of relevant features, differ up to some tolerance. We then estimate an asymptotically valid lower bound on the maximum bias strength for any subgroup in the observational study. Finally, we validate our benchmarking strategy in a real-world setting and show that it leads to conclusions that align with established medical knowledge. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: Accepted for presentation at the Conference on Uncertainty in Artificial Intelligence (UAI) 2024

arXiv:2402.15691 [pdf, other]

Orthogonal Gradient Boosting for Simpler Additive Rule Ensembles

Authors: Fan Yang, Pierre Le Bodic, Michael Kamp, Mario Boley

Abstract: Gradient boosting of prediction rules is an efficient approach to learn potentially interpretable yet accurate probabilistic models. However, actual interpretability requires to limit the number and size of the generated rules, and existing boosting variants are not designed for this purpose. Though corrective boosting refits all rule weights in each iteration to minimise prediction risk, the incl… ▽ More Gradient boosting of prediction rules is an efficient approach to learn potentially interpretable yet accurate probabilistic models. However, actual interpretability requires to limit the number and size of the generated rules, and existing boosting variants are not designed for this purpose. Though corrective boosting refits all rule weights in each iteration to minimise prediction risk, the included rule conditions tend to be sub-optimal, because commonly used objective functions fail to anticipate this refitting. Here, we address this issue by a new objective function that measures the angle between the risk gradient vector and the projection of the condition output vector onto the orthogonal complement of the already selected conditions. This approach correctly approximate the ideal update of adding the risk gradient itself to the model and favours the inclusion of more general and thus shorter rules. As we demonstrate using a wide range of prediction tasks, this significantly improves the comprehensibility/accuracy trade-off of the fitted ensemble. Additionally, we show how objective values for related rule conditions can be computed incrementally to avoid any substantial computational overhead of the new method. △ Less

Submitted 23 February, 2024; originally announced February 2024.

Comments: 21 pages, 11 figures, accepted at AISTATS 2024

arXiv:2401.02708 [pdf, other]

TripleSurv: Triplet Time-adaptive Coordinate Loss for Survival Analysis

Authors: Liwen Zhang, Lianzhen Zhong, Fan Yang, Di Dong, Hui Hui, Jie Tian

Abstract: A core challenge in survival analysis is to model the distribution of censored time-to-event data, where the event of interest may be a death, failure, or occurrence of a specific event. Previous studies have showed that ranking and maximum likelihood estimation (MLE)loss functions are widely-used for survival analysis. However, ranking loss only focus on the ranking of survival time and does not… ▽ More A core challenge in survival analysis is to model the distribution of censored time-to-event data, where the event of interest may be a death, failure, or occurrence of a specific event. Previous studies have showed that ranking and maximum likelihood estimation (MLE)loss functions are widely-used for survival analysis. However, ranking loss only focus on the ranking of survival time and does not consider potential effect of samples for exact survival time values. Furthermore, the MLE is unbounded and easily subject to outliers (e.g., censored data), which may cause poor performance of modeling. To handle the complexities of learning process and exploit valuable survival time values, we propose a time-adaptive coordinate loss function, TripleSurv, to achieve adaptive adjustments by introducing the differences in the survival time between sample pairs into the ranking, which can encourage the model to quantitatively rank relative risk of pairs, ultimately enhancing the accuracy of predictions. Most importantly, the TripleSurv is proficient in quantifying the relative risk between samples by ranking ordering of pairs, and consider the time interval as a trade-off to calibrate the robustness of model over sample distribution. Our TripleSurv is evaluated on three real-world survival datasets and a public synthetic dataset. The results show that our method outperforms the state-of-the-art methods and exhibits good model performance and robustness on modeling various sophisticated data distributions with different censor rates. Our code will be available upon acceptance. △ Less

Submitted 5 January, 2024; originally announced January 2024.

Comments: 9 pages,6 figures

arXiv:2401.02154 [pdf, other]

Disentangle Estimation of Causal Effects from Cross-Silo Data

Authors: Yuxuan Liu, Haozhao Wang, Shuang Wang, Zhiming He, Wenchao Xu, Jialiang Zhu, Fan Yang

Abstract: Estimating causal effects among different events is of great importance to critical fields such as drug development. Nevertheless, the data features associated with events may be distributed across various silos and remain private within respective parties, impeding direct information exchange between them. This, in turn, can result in biased estimations of local causal effects, which rely on the… ▽ More Estimating causal effects among different events is of great importance to critical fields such as drug development. Nevertheless, the data features associated with events may be distributed across various silos and remain private within respective parties, impeding direct information exchange between them. This, in turn, can result in biased estimations of local causal effects, which rely on the characteristics of only a subset of the covariates. To tackle this challenge, we introduce an innovative disentangle architecture designed to facilitate the seamless cross-silo transmission of model parameters, enriched with causal mechanisms, through a combination of shared and private branches. Besides, we introduce global constraints into the equation to effectively mitigate bias within the various missing domains, thereby elevating the accuracy of our causal effect estimation. Extensive experiments conducted on new semi-synthetic datasets show that our method outperforms state-of-the-art baselines. △ Less

Submitted 4 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2312.04464 [pdf, other]

Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement Learning with General Function Approximation

Authors: Jiayi Huang, Han Zhong, Liwei Wang, Lin F. Yang

Abstract: To tackle long planning horizon problems in reinforcement learning with general function approximation, we propose the first algorithm, termed as UCRL-WVTR, that achieves both \emph{horizon-free} and \emph{instance-dependent}, since it eliminates the polynomial dependency on the planning horizon. The derived regret bound is deemed \emph{sharp}, as it matches the minimax lower bound when specialize… ▽ More To tackle long planning horizon problems in reinforcement learning with general function approximation, we propose the first algorithm, termed as UCRL-WVTR, that achieves both \emph{horizon-free} and \emph{instance-dependent}, since it eliminates the polynomial dependency on the planning horizon. The derived regret bound is deemed \emph{sharp}, as it matches the minimax lower bound when specialized to linear mixture MDPs up to logarithmic factors. Furthermore, UCRL-WVTR is \emph{computationally efficient} with access to a regression oracle. The achievement of such a horizon-free, instance-dependent, and sharp regret bound hinges upon (i) novel algorithm designs: weighted value-targeted regression and a high-order moment estimator in the context of general function approximation; and (ii) fine-grained analyses: a novel concentration bound of weighted non-linear least squares and a refined analysis which leads to the tight instance-dependent bound. We also conduct comprehensive experiments to corroborate our theoretical findings. △ Less

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.03871 [pdf, other]

Hidden yet quantifiable: A lower bound for confounding strength using randomized trials

Authors: Piersilvio De Bartolomeis, Javier Abad, Konstantin Donhauser, Fanny Yang

Abstract: In the era of fast-paced precision medicine, observational studies play a major role in properly evaluating new treatments in clinical practice. Yet, unobserved confounding can significantly compromise causal conclusions drawn from non-randomized data. We propose a novel strategy that leverages randomized trials to quantify unobserved confounding. First, we design a statistical test to detect unob… ▽ More In the era of fast-paced precision medicine, observational studies play a major role in properly evaluating new treatments in clinical practice. Yet, unobserved confounding can significantly compromise causal conclusions drawn from non-randomized data. We propose a novel strategy that leverages randomized trials to quantify unobserved confounding. First, we design a statistical test to detect unobserved confounding with strength above a given threshold. Then, we use the test to estimate an asymptotically valid lower bound on the unobserved confounding strength. We evaluate the power and validity of our statistical test on several synthetic and semi-synthetic datasets. Further, we show how our lower bound can correctly identify the absence and presence of unobserved confounding in a real-world setting. △ Less

Submitted 1 May, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Accepted for presentation at the International Conference on Artificial Intelligence and Statistics (AISTATS) 2024

arXiv:2311.18557 [pdf, other]

Can semi-supervised learning use all the data effectively? A lower bound perspective

Authors: Alexandru Ţifrea, Gizem Yüce, Amartya Sanyal, Fanny Yang

Abstract: Prior works have shown that semi-supervised learning algorithms can leverage unlabeled data to improve over the labeled sample complexity of supervised learning (SL) algorithms. However, existing theoretical analyses focus on regimes where the unlabeled data is sufficient to learn a good decision boundary using unsupervised learning (UL) alone. This begs the question: Can SSL algorithms simultaneo… ▽ More Prior works have shown that semi-supervised learning algorithms can leverage unlabeled data to improve over the labeled sample complexity of supervised learning (SL) algorithms. However, existing theoretical analyses focus on regimes where the unlabeled data is sufficient to learn a good decision boundary using unsupervised learning (UL) alone. This begs the question: Can SSL algorithms simultaneously improve upon both UL and SL? To this end, we derive a tight lower bound for 2-Gaussian mixture models that explicitly depends on the labeled and the unlabeled dataset size as well as the signal-to-noise ratio of the mixture distribution. Surprisingly, our result implies that no SSL algorithm can improve upon the minimax-optimal statistical error rates of SL or UL algorithms for these distributions. Nevertheless, we show empirically on real-world data that SSL algorithms can still outperform UL and SL methods. Therefore, our work suggests that, while proving performance gains for SSL algorithms is possible, it requires careful tracking of constants. △ Less

Submitted 30 November, 2023; originally announced November 2023.

Comments: Published in Advances in Neural Information Processing Systems 2023

arXiv:2311.04686 [pdf, other]

Robust and Communication-Efficient Federated Domain Adaptation via Random Features

Authors: Zhanbo Feng, Yuanjie Wang, Jie Li, Fan Yang, Jiong Lou, Tiebin Mi, Robert. C. Qiu, Zhenyu Liao

Abstract: Modern machine learning (ML) models have grown to a scale where training them on a single machine becomes impractical. As a result, there is a growing trend to leverage federated learning (FL) techniques to train large ML models in a distributed and collaborative manner. These models, however, when deployed on new devices, might struggle to generalize well due to domain shifts. In this context, fe… ▽ More Modern machine learning (ML) models have grown to a scale where training them on a single machine becomes impractical. As a result, there is a growing trend to leverage federated learning (FL) techniques to train large ML models in a distributed and collaborative manner. These models, however, when deployed on new devices, might struggle to generalize well due to domain shifts. In this context, federated domain adaptation (FDA) emerges as a powerful approach to address this challenge. Most existing FDA approaches typically focus on aligning the distributions between source and target domains by minimizing their (e.g., MMD) distance. Such strategies, however, inevitably introduce high communication overheads and can be highly sensitive to network reliability. In this paper, we introduce RF-TCA, an enhancement to the standard Transfer Component Analysis approach that significantly accelerates computation without compromising theoretical and empirical performance. Leveraging the computational advantage of RF-TCA, we further extend it to FDA setting with FedRF-TCA. The proposed FedRF-TCA protocol boasts communication complexity that is \emph{independent} of the sample size, while maintaining performance that is either comparable to or even surpasses state-of-the-art FDA methods. We present extensive experiments to showcase the superior performance and robustness (to network condition) of FedRF-TCA. △ Less

Submitted 8 November, 2023; originally announced November 2023.

Comments: 21 pages

arXiv:2309.02073 [pdf, other]

Debiased Regression Adjustment in Completely Randomized Experiments with Moderately High-dimensional Covariates

Authors: Xin Lu, Fan Yang, Yuhao Wang

Abstract: Completely randomized experiment is the gold standard for causal inference. When the covariate information for each experimental candidate is available, one typical way is to include them in covariate adjustments for more accurate treatment effect estimation. In this paper, we investigate this problem under the randomization-based framework, i.e., that the covariates and potential outcomes of all… ▽ More Completely randomized experiment is the gold standard for causal inference. When the covariate information for each experimental candidate is available, one typical way is to include them in covariate adjustments for more accurate treatment effect estimation. In this paper, we investigate this problem under the randomization-based framework, i.e., that the covariates and potential outcomes of all experimental candidates are assumed as deterministic quantities and the randomness comes solely from the treatment assignment mechanism. Under this framework, to achieve asymptotically valid inference, existing estimators usually require either (i) that the dimension of covariates $p$ grows at a rate no faster than $O(n^{2 / 3})$ as sample size $n \to \infty$; or (ii) certain sparsity constraints on the linear representations of potential outcomes constructed via possibly high-dimensional covariates. In this paper, we consider the moderately high-dimensional regime where $p$ is allowed to be in the same order of magnitude as $n$. We develop a novel debiased estimator with a corresponding inference procedure and establish its asymptotic normality under mild assumptions. Our estimator is model-free and does not require any sparsity constraint on potential outcome's linear representations. We also discuss its asymptotic efficiency improvements over the unadjusted treatment effect estimator under different dimensionality constraints. Numerical analysis confirms that compared to other regression adjustment based treatment effect estimators, our debiased estimator performs well in moderately high dimensions. △ Less

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2306.06836 [pdf, other]

Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds

Authors: Jiayi Huang, Han Zhong, Liwei Wang, Lin F. Yang

Abstract: While numerous works have focused on devising efficient algorithms for reinforcement learning (RL) with uniformly bounded rewards, it remains an open question whether sample or time-efficient algorithms for RL with large state-action space exist when the rewards are \emph{heavy-tailed}, i.e., with only finite $(1+ε)$-th moments for some $ε\in(0,1]$. In this work, we address the challenge of such r… ▽ More While numerous works have focused on devising efficient algorithms for reinforcement learning (RL) with uniformly bounded rewards, it remains an open question whether sample or time-efficient algorithms for RL with large state-action space exist when the rewards are \emph{heavy-tailed}, i.e., with only finite $(1+ε)$-th moments for some $ε\in(0,1]$. In this work, we address the challenge of such rewards in RL with linear function approximation. We first design an algorithm, \textsc{Heavy-OFUL}, for heavy-tailed linear bandits, achieving an \emph{instance-dependent} $T$-round regret of $\tilde{O}\big(d T^{\frac{1-ε}{2(1+ε)}} \sqrt{\sum_{t=1}^T ν_t^2} + d T^{\frac{1-ε}{2(1+ε)}}\big)$, the \emph{first} of this kind. Here, $d$ is the feature dimension, and $ν_t^{1+ε}$ is the $(1+ε)$-th central moment of the reward at the $t$-th round. We further show the above bound is minimax optimal when applied to the worst-case instances in stochastic and deterministic linear bandits. We then extend this algorithm to the RL settings with linear function approximation. Our algorithm, termed as \textsc{Heavy-LSVI-UCB}, achieves the \emph{first} computationally efficient \emph{instance-dependent} $K$-episode regret of $\tilde{O}(d \sqrt{H \mathcal{U}^*} K^\frac{1}{1+ε} + d \sqrt{H \mathcal{V}^* K})$. Here, $H$ is length of the episode, and $\mathcal{U}^*, \mathcal{V}^*$ are instance-dependent quantities scaling with the central moment of reward and value functions, respectively. We also provide a matching minimax lower bound $Ω(d H K^{\frac{1}{1+ε}} + d \sqrt{H^3 K})$ to demonstrate the optimality of our algorithm in the worst case. Our result is achieved via a novel robust self-normalized concentration inequality that may be of independent interest in handling heavy-tailed noise in general online regression problems. △ Less

Submitted 7 March, 2024; v1 submitted 11 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023

arXiv:2306.03962 [pdf, other]

PILLAR: How to make semi-private learning more effective

Authors: Francesco Pinto, Yaxi Hu, Fanny Yang, Amartya Sanyal

Abstract: In Semi-Supervised Semi-Private (SP) learning, the learner has access to both public unlabelled and private labelled data. We propose a computationally efficient algorithm that, under mild assumptions on the data, provably achieves significantly lower private labelled sample complexity and can be efficiently run on real-world datasets. For this purpose, we leverage the features extracted by networ… ▽ More In Semi-Supervised Semi-Private (SP) learning, the learner has access to both public unlabelled and private labelled data. We propose a computationally efficient algorithm that, under mild assumptions on the data, provably achieves significantly lower private labelled sample complexity and can be efficiently run on real-world datasets. For this purpose, we leverage the features extracted by networks pre-trained on public (labelled or unlabelled) data, whose distribution can significantly differ from the one on which SP learning is performed. To validate its empirical effectiveness, we propose a wide variety of experiments under tight privacy constraints ($ε= 0.1$) and with a focus on low-data regimes. In all of these settings, our algorithm exhibits significantly improved performance over available baselines that use similar amounts of public data. △ Less

Submitted 6 June, 2023; originally announced June 2023.

arXiv:2305.19562 [pdf, other]

Replicability in Reinforcement Learning

Authors: Amin Karbasi, Grigoris Velegkas, Lin F. Yang, Felix Zhou

Abstract: We initiate the mathematical study of replicability as an algorithmic property in the context of reinforcement learning (RL). We focus on the fundamental setting of discounted tabular MDPs with access to a generative model. Inspired by Impagliazzo et al. [2022], we say that an RL algorithm is replicable if, with high probability, it outputs the exact same policy after two executions on i.i.d. samp… ▽ More We initiate the mathematical study of replicability as an algorithmic property in the context of reinforcement learning (RL). We focus on the fundamental setting of discounted tabular MDPs with access to a generative model. Inspired by Impagliazzo et al. [2022], we say that an RL algorithm is replicable if, with high probability, it outputs the exact same policy after two executions on i.i.d. samples drawn from the generator when its internal randomness is the same. We first provide an efficient $ρ$-replicable algorithm for $(\varepsilon, δ)$-optimal policy estimation with sample and time complexity $\widetilde O\left(\frac{N^3\cdot\log(1/δ)}{(1-γ)^5\cdot\varepsilon^2\cdotρ^2}\right)$, where $N$ is the number of state-action pairs. Next, for the subclass of deterministic algorithms, we provide a lower bound of order $Ω\left(\frac{N^3}{(1-γ)^3\cdot\varepsilon^2\cdotρ^2}\right)$. Then, we study a relaxed version of replicability proposed by Kalavasis et al. [2023] called TV indistinguishability. We design a computationally efficient TV indistinguishable algorithm for policy estimation whose sample complexity is $\widetilde O\left(\frac{N^2\cdot\log(1/δ)}{(1-γ)^5\cdot\varepsilon^2\cdotρ^2}\right)$. At the cost of $\exp(N)$ running time, we transform these TV indistinguishable algorithms to $ρ$-replicable ones without increasing their sample complexity. Finally, we introduce the notion of approximate-replicability where we only require that two outputted policies are close under an appropriate statistical divergence (e.g., Renyi) and show an improved sample complexity of $\widetilde O\left(\frac{N\cdot\log(1/δ)}{(1-γ)^5\cdot\varepsilon^2\cdotρ^2}\right)$. △ Less

Submitted 27 October, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: to be published in neurips 2023

arXiv:2305.09798 [pdf]

The Ways of Words: The Impact of Word Choice on Information Engagement and Decision Making

Authors: Nimrod Dvir, Elaine Friedman, Suraj Commuri, Fan Yang, Jennifer Romano

Abstract: Little research has explored how information engagement (IE), the degree to which individuals interact with and use information in a manner that manifests cognitively, behaviorally, and affectively. This study explored the impact of phrasing, specifically word choice, on IE and decision making. Synthesizing two theoretical models, User Engagement Theory UET and Information Behavior Theory IBT, a t… ▽ More Little research has explored how information engagement (IE), the degree to which individuals interact with and use information in a manner that manifests cognitively, behaviorally, and affectively. This study explored the impact of phrasing, specifically word choice, on IE and decision making. Synthesizing two theoretical models, User Engagement Theory UET and Information Behavior Theory IBT, a theoretical framework illustrating the impact of and relationships among the three IE dimensions of perception, participation, and perseverance was developed and hypotheses generated. The framework was empirically validated in a large-scale user study measuring how word choice impacts the dimensions of IE. The findings provide evidence that IE differs from other forms of engagement in that it is driven and fostered by the expression of the information itself, regardless of the information system used to view, interact with, and use the information. The findings suggest that phrasing can have a significant effect on the interpretation of and interaction with digital information, indicating the importance of expression of information, in particular word choice, on decision making and IE. The research contributes to the literature by identifying methods for assessment and improvement of IE and decision making with digital text. △ Less

Submitted 16 May, 2023; originally announced May 2023.

MSC Class: 28-08 ACM Class: H.5.2; H.1.2

arXiv:2301.07605 [pdf, other]

Strong inductive biases provably prevent harmless interpolation

Authors: Michael Aerni, Marco Milanta, Konstantin Donhauser, Fanny Yang

Abstract: Classical wisdom suggests that estimators should avoid fitting noise to achieve good generalization. In contrast, modern overparameterized models can yield small test error despite interpolating noise -- a phenomenon often called "benign overfitting" or "harmless interpolation". This paper argues that the degree to which interpolation is harmless hinges upon the strength of an estimator's inductiv… ▽ More Classical wisdom suggests that estimators should avoid fitting noise to achieve good generalization. In contrast, modern overparameterized models can yield small test error despite interpolating noise -- a phenomenon often called "benign overfitting" or "harmless interpolation". This paper argues that the degree to which interpolation is harmless hinges upon the strength of an estimator's inductive bias, i.e., how heavily the estimator favors solutions with a certain structure: while strong inductive biases prevent harmless interpolation, weak inductive biases can even require fitting noise to generalize well. Our main theoretical result establishes tight non-asymptotic bounds for high-dimensional kernel regression that reflect this phenomenon for convolutional kernels, where the filter size regulates the strength of the inductive bias. We further provide empirical evidence of the same behavior for deep neural networks with varying filter sizes and rotational invariance. △ Less

Submitted 1 March, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

Comments: Accepted at ICLR 2023

arXiv:2212.05577 [pdf, other]

Mediation analysis with the mediator and outcome missing not at random

Authors: Shuozhi Zuo, Debashis Ghosh, Peng Ding, Fan Yang

Abstract: Mediation analysis is widely used for investigating direct and indirect causal pathways through which an effect arises. However, many mediation analysis studies are challenged by missingness in the mediator and outcome. In general, when the mediator and outcome are missing not at random, the direct and indirect effects are not identifiable without further assumptions. In this work, we study the id… ▽ More Mediation analysis is widely used for investigating direct and indirect causal pathways through which an effect arises. However, many mediation analysis studies are challenged by missingness in the mediator and outcome. In general, when the mediator and outcome are missing not at random, the direct and indirect effects are not identifiable without further assumptions. In this work, we study the identifiability of the direct and indirect effects under some interpretable mechanisms that allow for missing not at random in the mediator and outcome. We evaluate the performance of statistical inference under those mechanisms through simulation studies and illustrate the proposed methods via the National Job Corps Study. △ Less

Submitted 22 September, 2023; v1 submitted 11 December, 2022; originally announced December 2022.

arXiv:2212.03783 [pdf, ps, other]

Tight bounds for maximum $\ell_1$-margin classifiers

Authors: Stefan Stojanovic, Konstantin Donhauser, Fanny Yang

Abstract: Popular iterative algorithms such as boosting methods and coordinate descent on linear models converge to the maximum $\ell_1$-margin classifier, a.k.a. sparse hard-margin SVM, in high dimensional regimes where the data is linearly separable. Previous works consistently show that many estimators relying on the $\ell_1$-norm achieve improved statistical rates for hard sparse ground truths. We show… ▽ More Popular iterative algorithms such as boosting methods and coordinate descent on linear models converge to the maximum $\ell_1$-margin classifier, a.k.a. sparse hard-margin SVM, in high dimensional regimes where the data is linearly separable. Previous works consistently show that many estimators relying on the $\ell_1$-norm achieve improved statistical rates for hard sparse ground truths. We show that surprisingly, this adaptivity does not apply to the maximum $\ell_1$-margin classifier for a standard discriminative setting. In particular, for the noiseless setting, we prove tight upper and lower bounds for the prediction error that match existing rates of order $\frac{\|w^*\|_1^{2/3}}{n^{1/3}}$ for general ground truths. To complete the picture, we show that when interpolating noisy observations, the error vanishes at a rate of order $\frac{1}{\sqrt{\log(d/n)}}$. We are therefore first to show benign overfitting for the maximum $\ell_1$-margin classifier. △ Less

Submitted 20 January, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

arXiv:2211.05632 [pdf, ps, other]

Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms

Authors: Osama A. Hanna, Lin F. Yang, Christina Fragouli

Abstract: In this paper, we address the stochastic contextual linear bandit problem, where a decision maker is provided a context (a random set of actions drawn from a distribution). The expected reward of each action is specified by the inner product of the action and an unknown parameter. The goal is to design an algorithm that learns to play as close as possible to the unknown optimal policy after a numb… ▽ More In this paper, we address the stochastic contextual linear bandit problem, where a decision maker is provided a context (a random set of actions drawn from a distribution). The expected reward of each action is specified by the inner product of the action and an unknown parameter. The goal is to design an algorithm that learns to play as close as possible to the unknown optimal policy after a number of action plays. This problem is considered more challenging than the linear bandit problem, which can be viewed as a contextual bandit problem with a \emph{fixed} context. Surprisingly, in this paper, we show that the stochastic contextual problem can be solved as if it is a linear bandit problem. In particular, we establish a novel reduction framework that converts every stochastic contextual linear bandit instance to a linear bandit instance, when the context distribution is known. When the context distribution is unknown, we establish an algorithm that reduces the stochastic contextual instance to a sequence of linear bandit instances with small misspecifications and achieves nearly the same worst-case regret bound as the algorithm that solves the misspecified linear bandit instances. As a consequence, our results imply a $O(d\sqrt{T\log T})$ high-probability regret bound for contextual linear bandits, making progress in resolving an open problem in (Li et al., 2019), (Li et al., 2021). Our reduction framework opens up a new way to approach stochastic contextual linear bandit problems, and enables improved regret bounds in a number of instances including the batch setting, contextual bandits with misspecifications, contextual bandits with sparse unknown parameters, and contextual bandits with adversarial corruption. △ Less

Submitted 26 May, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

arXiv:2211.00128 [pdf, other]

SIMPLE-RC: Group Network Inference with Non-Sharp Nulls and Weak Signals

Authors: Jianqing Fan, Yingying Fan, **chi Lv, Fan Yang

Abstract: Large-scale network inference with uncertainty quantification has important applications in natural, social, and medical sciences. The recent work of Fan, Fan, Han and Lv (2022) introduced a general framework of statistical inference on membership profiles in large networks (SIMPLE) for testing the sharp null hypothesis that a pair of given nodes share the same membership profiles. In real applica… ▽ More Large-scale network inference with uncertainty quantification has important applications in natural, social, and medical sciences. The recent work of Fan, Fan, Han and Lv (2022) introduced a general framework of statistical inference on membership profiles in large networks (SIMPLE) for testing the sharp null hypothesis that a pair of given nodes share the same membership profiles. In real applications, there are often groups of nodes under investigation that may share similar membership profiles at the presence of relatively weaker signals than the setting considered in SIMPLE. To address these practical challenges, in this paper we propose a SIMPLE method with random coupling (SIMPLE-RC) for testing the non-sharp null hypothesis that a group of given nodes share similar (not necessarily identical) membership profiles under weaker signals. Utilizing the idea of random coupling, we construct our test as the maximum of the SIMPLE tests for subsampled node pairs from the group. Such technique reduces significantly the correlation among individual SIMPLE tests while largely maintaining the power, enabling delicate analysis on the asymptotic distributions of the SIMPLE-RC test. Our method and theory cover both the cases with and without node degree heterogeneity. These new theoretical developments are empowered by a second-order expansion of spiked eigenvectors under the $\ell_\infty$-norm, built upon our work for random matrices with weak spikes. Our theoretical results and the practical advantages of the newly suggested method are demonstrated through several simulation and real data examples. △ Less

Submitted 31 October, 2022; originally announced November 2022.

Comments: 71 pages, 4 figures

arXiv:2206.06270 [pdf, other]

Near-Optimal Sample Complexity Bounds for Constrained MDPs

Authors: Sharan Vaswani, Lin F. Yang, Csaba Szepesvári

Abstract: In contrast to the advances in characterizing the sample complexity for solving Markov decision processes (MDPs), the optimal statistical complexity for solving constrained MDPs (CMDPs) remains unknown. We resolve this question by providing minimax upper and lower bounds on the sample complexity for learning near-optimal policies in a discounted CMDP with access to a generative model (simulator).… ▽ More In contrast to the advances in characterizing the sample complexity for solving Markov decision processes (MDPs), the optimal statistical complexity for solving constrained MDPs (CMDPs) remains unknown. We resolve this question by providing minimax upper and lower bounds on the sample complexity for learning near-optimal policies in a discounted CMDP with access to a generative model (simulator). In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint. For (i), we prove that our algorithm returns an $ε$-optimal policy with probability $1 - δ$, by making $\tilde{O}\left(\frac{S A \log(1/δ)}{(1 - γ)^3 ε^2}\right)$ queries to the generative model, thus matching the sample-complexity for unconstrained MDPs. For (ii), we show that the algorithm's sample complexity is upper-bounded by $\tilde{O} \left(\frac{S A \, \log(1/δ)}{(1 - γ)^5 \, ε^2 ζ^2} \right)$ where $ζ$ is the problem-dependent Slater constant that characterizes the size of the feasible region. Finally, we prove a matching lower-bound for the strict feasibility setting, thus obtaining the first near minimax optimal bounds for discounted CMDPs. Our results show that learning CMDPs is as easy as MDPs when small constraint violations are allowed, but inherently more difficult when we demand zero constraint violation. △ Less

Submitted 19 November, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: NeurIPS'22

arXiv:2206.03985 [pdf, other]

How unfair is private learning ?

Authors: Amartya Sanyal, Yaxi Hu, Fanny Yang

Abstract: As machine learning algorithms are deployed on sensitive data in critical decision making processes, it is becoming increasingly important that they are also private and fair. In this paper, we show that, when the data has a long-tailed structure, it is not possible to build accurate learning algorithms that are both private and results in higher accuracy on minority subpopulations. We further sho… ▽ More As machine learning algorithms are deployed on sensitive data in critical decision making processes, it is becoming increasingly important that they are also private and fair. In this paper, we show that, when the data has a long-tailed structure, it is not possible to build accurate learning algorithms that are both private and results in higher accuracy on minority subpopulations. We further show that relaxing overall accuracy can lead to good fairness even with strict privacy requirements. To corroborate our theoretical results in practice, we provide an extensive set of experimental results using a variety of synthetic, vision (CIFAR10 and CelebA), and tabular (Law School) datasets and learning algorithms. △ Less

Submitted 24 December, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

Comments: Accepted as an Oral paper in UAI '2022, Major update on 23 Dec, 2022

arXiv:2206.03718 [pdf, other]

Learning Interpretable Decision Rule Sets: A Submodular Optimization Approach

Authors: Fan Yang, Kai He, Linxiao Yang, Hongxia Du, **gbang Yang, Bo Yang, Liang Sun

Abstract: Rule sets are highly interpretable logical models in which the predicates for decision are expressed in disjunctive normal form (DNF, OR-of-ANDs), or, equivalently, the overall model comprises an unordered collection of if-then decision rules. In this paper, we consider a submodular optimization based approach for learning rule sets. The learning problem is framed as a subset selection task in whi… ▽ More Rule sets are highly interpretable logical models in which the predicates for decision are expressed in disjunctive normal form (DNF, OR-of-ANDs), or, equivalently, the overall model comprises an unordered collection of if-then decision rules. In this paper, we consider a submodular optimization based approach for learning rule sets. The learning problem is framed as a subset selection task in which a subset of all possible rules needs to be selected to form an accurate and interpretable rule set. We employ an objective function that exhibits submodularity and thus is amenable to submodular optimization techniques. To overcome the difficulty arose from dealing with the exponential-sized ground set of rules, the subproblem of searching a rule is casted as another subset selection task that asks for a subset of features. We show it is possible to write the induced objective function for the subproblem as a difference of two submodular (DS) functions to make it approximately solvable by DS optimization algorithms. Overall, the proposed approach is simple, scalable, and likely to be benefited from further research on submodular optimization. Experiments on real datasets demonstrate the effectiveness of our method. △ Less

Submitted 8 June, 2022; originally announced June 2022.

Comments: NeurIPS 2021 (Spotlight)

arXiv:2206.00270 [pdf, ps, other]

Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation

Authors: Sanae Amani, Lin F. Yang, Ching-An Cheng

Abstract: We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks. We propose an algorithm, called UCB Lifelong Value Distillation (UCBlvd), that provably achieves sublinear regret for any sequence of tasks, which may be adaptively chosen b… ▽ More We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks. We propose an algorithm, called UCB Lifelong Value Distillation (UCBlvd), that provably achieves sublinear regret for any sequence of tasks, which may be adaptively chosen based on the agent's past behaviors. Remarkably, our algorithm uses only sublinear number of planning calls, which means that the agent eventually learns a policy that is near optimal for multiple tasks (seen or unseen) without the need of deliberate planning. A key to this property is a new structural assumption that enables computation sharing across tasks during exploration. Specifically, for $K$ task episodes of horizon $H$, our algorithm has a regret bound $\tilde{\mathcal{O}}(\sqrt{(d^3+d^\prime d)H^4K})$ based on $\mathcal{O}(dH\log(K))$ number of planning calls, where $d$ and $d^\prime$ are the feature dimensions of the dynamics and rewards, respectively. This theoretical guarantee implies that our algorithm can enable a lifelong learning agent to accumulate experiences and learn to rapidly solve new tasks. △ Less

Submitted 1 June, 2022; originally announced June 2022.

arXiv:2205.13170 [pdf, other]

Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost

Authors: Sanae Amani, Tor Lattimore, András György, Lin F. Yang

Abstract: We study distributed contextual linear bandits with stochastic contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features over the course of $T$ rounds. For this problem, we derive the first ever information-theoretic lower bound $Ω(dN)$ on the communication cost of any algorithm that performs optimally in a regret minimization setup. W… ▽ More We study distributed contextual linear bandits with stochastic contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features over the course of $T$ rounds. For this problem, we derive the first ever information-theoretic lower bound $Ω(dN)$ on the communication cost of any algorithm that performs optimally in a regret minimization setup. We then propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server. We prove that the communication cost of DisBE-LUCB matches our lower bound up to logarithmic factors. In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is ${\tilde{\mathcal{O}}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds. We also provide similar bounds for practical settings where the context distribution can only be estimated. Therefore, our proposed algorithm is nearly minimax optimal in terms of \emph{both regret and communication cost}. Finally, we propose DecBE-LUCB, a fully decentralized version of DisBE-LUCB, which operates without a central server, where agents share information with their \emph{immediate neighbors} through a carefully designed consensus procedure. △ Less

Submitted 7 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

arXiv:2204.00492 [pdf, other]

Provable concept learning for interpretable predictions using variational autoencoders

Authors: Armeen Taeb, Nicolo Ruggeri, Carina Schnuck, Fanny Yang

Abstract: In safety-critical applications, practitioners are reluctant to trust neural networks when no interpretable explanations are available. Many attempts to provide such explanations revolve around pixel-based attributions or use previously known concepts. In this paper we aim to provide explanations by provably identifying \emph{high-level, previously unknown ground-truth concepts}. To this end, we p… ▽ More In safety-critical applications, practitioners are reluctant to trust neural networks when no interpretable explanations are available. Many attempts to provide such explanations revolve around pixel-based attributions or use previously known concepts. In this paper we aim to provide explanations by provably identifying \emph{high-level, previously unknown ground-truth concepts}. To this end, we propose a probabilistic modeling framework to derive (C)oncept (L)earning and (P)rediction (CLAP) -- a VAE-based classifier that uses visually interpretable concepts as predictors for a simple classifier. Assuming a generative model for the ground-truth concepts, we prove that CLAP is able to identify them while attaining optimal classification accuracy. Our experiments on synthetic datasets verify that CLAP identifies distinct ground-truth concepts on synthetic datasets and yields promising results on the medical Chest X-Ray dataset. △ Less

Submitted 22 July, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

arXiv:2203.06335 [pdf, other]

Doubly Coupled Designs for Computer Experiments with both Qualitative and Quantitative Factors

Authors: Feng Yang, C. Devon Lin, Yongdao Zhou, Yuanzhen He

Abstract: Computer experiments with both qualitative and quantitative input variables occur frequently in many scientific and engineering applications. How to choose input settings for such experiments is an important issue for accurate statistical analysis, uncertainty quantification and decision making. Sliced Latin hypercube designs are the first systematic approach to address this issue. However, it com… ▽ More Computer experiments with both qualitative and quantitative input variables occur frequently in many scientific and engineering applications. How to choose input settings for such experiments is an important issue for accurate statistical analysis, uncertainty quantification and decision making. Sliced Latin hypercube designs are the first systematic approach to address this issue. However, it comes with the increasing cost associated with an increasing large number of level combinations of the qualitative factors. For the reason of run size economy, marginally coupled designs were proposed in which the design for the quantitative factors is a sliced Latin hypercube design with respect to each qualitative factor. The drawback of such designs is that the corresponding data may not be able to capture the effects between any two (and more) qualitative factors and quantitative factors. To balance the run size and design efficiency, we propose a new type of designs, doubly coupled designs, where the design points for the quantitative factors form a sliced Latin hypercube design with respect to the levels of any qualitative factor and with respect to the level combinations of any two qualitative factors, respectively. The proposed designs have the better stratification property between the qualitative and quantitative factors compared with marginally coupled designs. The existence of the proposed designs is established. Several construction methods are introduced, and the properties of the resulting designs are also studied. △ Less

Submitted 11 March, 2022; originally announced March 2022.

Comments: Statistica Sinica (2021)

arXiv:2203.03597 [pdf, other]

Fast Rates for Noisy Interpolation Require Rethinking the Effects of Inductive Bias

Authors: Konstantin Donhauser, Nicolo Ruggeri, Stefan Stojanovic, Fanny Yang

Abstract: Good generalization performance on high-dimensional data crucially hinges on a simple structure of the ground truth and a corresponding strong inductive bias of the estimator. Even though this intuition is valid for regularized models, in this paper we caution against a strong inductive bias for interpolation in the presence of noise: While a stronger inductive bias encourages a simpler structure… ▽ More Good generalization performance on high-dimensional data crucially hinges on a simple structure of the ground truth and a corresponding strong inductive bias of the estimator. Even though this intuition is valid for regularized models, in this paper we caution against a strong inductive bias for interpolation in the presence of noise: While a stronger inductive bias encourages a simpler structure that is more aligned with the ground truth, it also increases the detrimental effect of noise. Specifically, for both linear regression and classification with a sparse ground truth, we prove that minimum $\ell_p$-norm and maximum $\ell_p$-margin interpolators achieve fast polynomial rates close to order $1/n$ for $p > 1$ compared to a logarithmic rate for $p = 1$. Finally, we provide preliminary experimental evidence that this trade-off may also play a crucial role in understanding non-linear interpolating models used in practice. △ Less

Submitted 26 October, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

arXiv:2203.02006 [pdf, other]

Why adversarial training can hurt robust accuracy

Authors: Jacob Clarysse, Julia Hörrmann, Fanny Yang

Abstract: Machine learning classifiers with high test accuracy often perform poorly under adversarial attacks. It is commonly believed that adversarial training alleviates this issue. In this paper, we demonstrate that, surprisingly, the opposite may be true -- Even though adversarial training helps when enough data is available, it may hurt robust generalization in the small sample size regime. We first pr… ▽ More Machine learning classifiers with high test accuracy often perform poorly under adversarial attacks. It is commonly believed that adversarial training alleviates this issue. In this paper, we demonstrate that, surprisingly, the opposite may be true -- Even though adversarial training helps when enough data is available, it may hurt robust generalization in the small sample size regime. We first prove this phenomenon for a high-dimensional linear classification setting with noiseless observations. Our proof provides explanatory insights that may also transfer to feature learning models. Further, we observe in experiments on standard image datasets that the same behavior occurs for perceptible attacks that effectively reduce class information such as mask attacks and object corruptions. △ Less

Submitted 3 March, 2022; originally announced March 2022.

arXiv:2111.05987 [pdf, other]

Tight bounds for minimum l1-norm interpolation of noisy data

Authors: Guillaume Wang, Konstantin Donhauser, Fanny Yang

Abstract: We provide matching upper and lower bounds of order $σ^2/\log(d/n)$ for the prediction error of the minimum $\ell_1$-norm interpolator, a.k.a. basis pursuit. Our result is tight up to negligible terms when $d \gg n$, and is the first to imply asymptotic consistency of noisy minimum-norm interpolation for isotropic features and sparse ground truths. Our work complements the literature on "benign ov… ▽ More We provide matching upper and lower bounds of order $σ^2/\log(d/n)$ for the prediction error of the minimum $\ell_1$-norm interpolator, a.k.a. basis pursuit. Our result is tight up to negligible terms when $d \gg n$, and is the first to imply asymptotic consistency of noisy minimum-norm interpolation for isotropic features and sparse ground truths. Our work complements the literature on "benign overfitting" for minimum $\ell_2$-norm interpolation, where asymptotic consistency can be achieved only when the features are effectively low-dimensional. △ Less

Submitted 7 March, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

Comments: 33 pages, 1 figure; accepted to AISTATS 2022

arXiv:2111.00633 [pdf, ps, other]

Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning

Authors: Yuanzhi Li, Ruosong Wang, Lin F. Yang

Abstract: Recently there is a surge of interest in understanding the horizon-dependence of the sample complexity in reinforcement learning (RL). Notably, for an RL environment with horizon length $H$, previous work have shown that there is a probably approximately correct (PAC) algorithm that learns an $O(1)$-optimal policy using $\mathrm{polylog}(H)$ episodes of environment interactions when the number of… ▽ More Recently there is a surge of interest in understanding the horizon-dependence of the sample complexity in reinforcement learning (RL). Notably, for an RL environment with horizon length $H$, previous work have shown that there is a probably approximately correct (PAC) algorithm that learns an $O(1)$-optimal policy using $\mathrm{polylog}(H)$ episodes of environment interactions when the number of states and actions is fixed. It is yet unknown whether the $\mathrm{polylog}(H)$ dependence is necessary or not. In this work, we resolve this question by develo** an algorithm that achieves the same PAC guarantee while using only $O(1)$ episodes of environment interactions, completely settling the horizon-dependence of the sample complexity in RL. We achieve this bound by (i) establishing a connection between value functions in discounted and finite-horizon Markov decision processes (MDPs) and (ii) a novel perturbation analysis in MDPs. We believe our new techniques are of independent interest and could be applied in related questions in RL. △ Less

Submitted 31 October, 2021; originally announced November 2021.

arXiv:2108.02883 [pdf, other]

Interpolation can hurt robust generalization even when there is no noise

Authors: Konstantin Donhauser, Alexandru Ţifrea, Michael Aerni, Reinhard Heckel, Fanny Yang

Abstract: Numerous recent works show that overparameterization implicitly reduces variance for min-norm interpolators and max-margin classifiers. These findings suggest that ridge regularization has vanishing benefits in high dimensions. We challenge this narrative by showing that, even in the absence of noise, avoiding interpolation through ridge regularization can significantly improve generalization. We… ▽ More Numerous recent works show that overparameterization implicitly reduces variance for min-norm interpolators and max-margin classifiers. These findings suggest that ridge regularization has vanishing benefits in high dimensions. We challenge this narrative by showing that, even in the absence of noise, avoiding interpolation through ridge regularization can significantly improve generalization. We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting. △ Less

Submitted 16 December, 2021; v1 submitted 5 August, 2021; originally announced August 2021.

arXiv:2107.11014 [pdf]

Post-Treatment Confounding in Causal Mediation Studies: A Cutting-Edge Problem and A Novel Solution via Sensitivity Analysis

Authors: Guanglei Hong, Fan Yang, Xu Qin

Abstract: In causal mediation studies that decompose an average treatment effect into a natural indirect effect (NIE) and a natural direct effect (NDE), examples of post-treatment confounding are abundant. Past research has generally considered it infeasible to adjust for a post-treatment confounder of the mediator-outcome relationship due to incomplete information: it is observed under the actual treatment… ▽ More In causal mediation studies that decompose an average treatment effect into a natural indirect effect (NIE) and a natural direct effect (NDE), examples of post-treatment confounding are abundant. Past research has generally considered it infeasible to adjust for a post-treatment confounder of the mediator-outcome relationship due to incomplete information: it is observed under the actual treatment condition while missing under the counterfactual treatment condition. This study proposes a new sensitivity analysis strategy for handling post-treatment confounding and incorporates it into weighting-based causal mediation analysis without making extra identification assumptions. Under the sequential ignorability of the treatment assignment and of the mediator, we obtain the conditional distribution of the post-treatment confounder under the counterfactual treatment as a function of not just pretreatment covariates but also its counterpart under the actual treatment. The sensitivity analysis then generates a bound for the NIE and that for the NDE over a plausible range of the conditional correlation between the post-treatment confounder under the actual and that under the counterfactual conditions. Implemented through either imputation or integration, the strategy is suitable for binary as well as continuous measures of post-treatment confounders. Simulation results demonstrate major strengths and potential limitations of this new solution. A re-analysis of the National Evaluation of Welfare-to-Work Strategies (NEWWS) Riverside data reveals that the initial analytic results are sensitive to omitted post-treatment confounding. △ Less

Submitted 22 July, 2021; originally announced July 2021.

arXiv:2106.07841 [pdf, other]

Randomized Exploration for Reinforcement Learning with General Value Function Approximation

Authors: Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang

Abstract: We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle. Unlike existing upper-confidence-bound (UCB) based approaches, which are often computationally intractable, our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.… ▽ More We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle. Unlike existing upper-confidence-bound (UCB) based approaches, which are often computationally intractable, our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises. To attain optimistic value function estimation without resorting to a UCB-style bonus, we introduce an optimistic reward sampling procedure. When the value functions can be represented by a function class $\mathcal{F}$, our algorithm achieves a worst-case regret bound of $\widetilde{O}(\mathrm{poly}(d_EH)\sqrt{T})$ where $T$ is the time elapsed, $H$ is the planning horizon and $d_E$ is the $\textit{eluder dimension}$ of $\mathcal{F}$. In the linear setting, our algorithm reduces to LSVI-PHE, a variant of RLSVI, that enjoys an $\widetilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret. We complement the theory with an empirical evaluation across known difficult exploration tasks. △ Less

Submitted 25 October, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

Comments: 32 page, 5 figures, in Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

arXiv:2106.07203 [pdf, ps, other]

Online Sub-Sampling for Reinforcement Learning with General Function Approximation

Authors: Dingwen Kong, Ruslan Salakhutdinov, Ruosong Wang, Lin F. Yang

Abstract: Most of the existing works for reinforcement learning (RL) with general function approximation (FA) focus on understanding the statistical complexity or regret bounds. However, the computation complexity of such approaches is far from being understood -- indeed, a simple optimization problem over the function class might be as well intractable. In this paper, we tackle this problem by establishing… ▽ More Most of the existing works for reinforcement learning (RL) with general function approximation (FA) focus on understanding the statistical complexity or regret bounds. However, the computation complexity of such approaches is far from being understood -- indeed, a simple optimization problem over the function class might be as well intractable. In this paper, we tackle this problem by establishing an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm and uses the measurement to guide exploration. For a value-based method with complexity-bounded function class, we show that the policy only needs to be updated for $\propto\operatorname{poly}\log(K)$ times for running the RL algorithm for $K$ episodes while still achieving a small near-optimal regret bound. In contrast to existing approaches that update the policy for at least $Ω(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy. When applied to settings in \cite{wang2020reinforcement} or \cite{**2021bellman}, we improve the overall time complexity by at least a factor of $K$. Finally, we show the generality of our online sub-sampling technique by applying it to the reward-free RL setting and multi-agent RL setting. △ Less

Submitted 18 April, 2023; v1 submitted 14 June, 2021; originally announced June 2021.

arXiv:2106.06239 [pdf, ps, other]

Safe Reinforcement Learning with Linear Function Approximation

Authors: Sanae Amani, Christos Thrampoulidis, Lin F. Yang

Abstract: Safety in reinforcement learning has become increasingly important in recent years. Yet, existing solutions either fail to strictly avoid choosing unsafe actions, which may lead to catastrophic results in safety-critical systems, or fail to provide regret guarantees for settings where safety constraints need to be learned. In this paper, we address both problems by first modeling safety as an unkn… ▽ More Safety in reinforcement learning has become increasingly important in recent years. Yet, existing solutions either fail to strictly avoid choosing unsafe actions, which may lead to catastrophic results in safety-critical systems, or fail to provide regret guarantees for settings where safety constraints need to be learned. In this paper, we address both problems by first modeling safety as an unknown linear cost function of states and actions, which must always fall below a certain threshold. We then present algorithms, termed SLUCB-QVI and RSLUCB-QVI, for episodic Markov decision processes (MDPs) with linear function approximation. We show that SLUCB-QVI and RSLUCB-QVI, while with \emph{no safety violation}, achieve a $\tilde{\mathcal{O}}\left(κ\sqrt{d^3H^3T}\right)$ regret, nearly matching that of state-of-the-art unsafe algorithms, where $H$ is the duration of each episode, $d$ is the dimension of the feature map**, $κ$ is a constant characterizing the safety constraints, and $T$ is the total number of action plays. We further present numerical simulations that corroborate our theoretical findings. △ Less

Submitted 11 June, 2021; originally announced June 2021.

arXiv:2106.03591 [pdf, other]

Calibrating multi-dimensional complex ODE from noisy data via deep neural networks

Authors: Kexuan Li, Fangfang Wang, Ruiqi Liu, Fan Yang, Zuofeng Shang

Abstract: Ordinary differential equations (ODEs) are widely used to model complex dynamics that arises in biology, chemistry, engineering, finance, physics, etc. Calibration of a complicated ODE system using noisy data is generally very difficult. In this work, we propose a two-stage nonparametric approach to address this problem. We first extract the de-noised data and their higher order derivatives using… ▽ More Ordinary differential equations (ODEs) are widely used to model complex dynamics that arises in biology, chemistry, engineering, finance, physics, etc. Calibration of a complicated ODE system using noisy data is generally very difficult. In this work, we propose a two-stage nonparametric approach to address this problem. We first extract the de-noised data and their higher order derivatives using boundary kernel method, and then feed them into a sparsely connected deep neural network with ReLU activation function. Our method is able to recover the ODE system without being subject to the curse of dimensionality and complicated ODE structure. When the ODE possesses a general modular structure, with each modular component involving only a few input variables, and the network architecture is properly chosen, our method is proven to be consistent. Theoretical properties are corroborated by an extensive simulation study that demonstrates the validity and effectiveness of the proposed method. Finally, we use our method to simultaneously characterize the growth rate of Covid-19 infection cases from 50 states of the USA. △ Less

Submitted 18 September, 2023; v1 submitted 7 June, 2021; originally announced June 2021.

arXiv:2104.04244 [pdf, other]

How rotational invariance of common kernels prevents generalization in high dimensions

Authors: Konstantin Donhauser, Mingqi Wu, Fanny Yang

Abstract: Kernel ridge regression is well-known to achieve minimax optimal rates in low-dimensional settings. However, its behavior in high dimensions is much less understood. Recent work establishes consistency for kernel regression under certain assumptions on the ground truth function and the distribution of the input data. In this paper, we show that the rotational invariance property of commonly studie… ▽ More Kernel ridge regression is well-known to achieve minimax optimal rates in low-dimensional settings. However, its behavior in high dimensions is much less understood. Recent work establishes consistency for kernel regression under certain assumptions on the ground truth function and the distribution of the input data. In this paper, we show that the rotational invariance property of commonly studied kernels (such as RBF, inner product kernels and fully-connected NTK of any depth) induces a bias towards low-degree polynomials in high dimensions. Our result implies a lower bound on the generalization error for a wide range of distributions and various choices of the scaling for kernels with different eigenvalue decays. This lower bound suggests that general consistency results for kernel ridge regression in high dimensions require a more refined analysis that depends on the structure of the kernel beyond its eigenvalue decay. △ Less

Submitted 9 April, 2021; originally announced April 2021.

arXiv:2103.11559 [pdf, other]

Provably Correct Optimization and Exploration with Non-linear Policies

Authors: Fei Feng, Wotao Yin, Alekh Agarwal, Lin F. Yang

Abstract: Policy optimization methods remain a powerful workhorse in empirical Reinforcement Learning (RL), with a focus on neural policies that can easily reason over complex and continuous state and/or action spaces. Theoretical understanding of strategic exploration in policy-based methods with non-linear function approximation, however, is largely missing. In this paper, we address this question by desi… ▽ More Policy optimization methods remain a powerful workhorse in empirical Reinforcement Learning (RL), with a focus on neural policies that can easily reason over complex and continuous state and/or action spaces. Theoretical understanding of strategic exploration in policy-based methods with non-linear function approximation, however, is largely missing. In this paper, we address this question by designing ENIAC, an actor-critic method that allows non-linear function approximation in the critic. We show that under certain assumptions, e.g., a bounded eluder dimension $d$ for the critic class, the learner finds a near-optimal policy in $O(\poly(d))$ exploration rounds. The method is robust to model misspecification and strictly extends existing works on linear function approximation. We also develop some computational optimizations of our approach with slightly worse statistical guarantees and an empirical adaptation building on existing deep RL tools. We empirically evaluate this adaptation and show that it outperforms prior heuristics inspired by linear methods, establishing the value via correctly reasoning about the agent's uncertainty under non-linear function approximation. △ Less

Submitted 21 March, 2021; originally announced March 2021.

arXiv:2102.12948 [pdf, ps, other]

Provably Breaking the Quadratic Error Compounding Barrier in Imitation Learning, Optimally

Authors: Nived Rajaraman, Yanjun Han, Lin F. Yang, Kannan Ramchandran, Jiantao Jiao

Abstract: We study the statistical limits of Imitation Learning (IL) in episodic Markov Decision Processes (MDPs) with a state space $\mathcal{S}$. We focus on the known-transition setting where the learner is provided a dataset of $N$ length-$H$ trajectories from a deterministic expert policy and knows the MDP transition. We establish an upper bound $O(|\mathcal{S}|H^{3/2}/N)$ for the suboptimality using t… ▽ More We study the statistical limits of Imitation Learning (IL) in episodic Markov Decision Processes (MDPs) with a state space $\mathcal{S}$. We focus on the known-transition setting where the learner is provided a dataset of $N$ length-$H$ trajectories from a deterministic expert policy and knows the MDP transition. We establish an upper bound $O(|\mathcal{S}|H^{3/2}/N)$ for the suboptimality using the Mimic-MD algorithm in Rajaraman et al (2020) which we prove to be computationally efficient. In contrast, we show the minimax suboptimality grows as $Ω( H^{3/2}/N)$ when $|\mathcal{S}|\geq 3$ while the unknown-transition setting suffers from a larger sharp rate $Θ(|\mathcal{S}|H^2/N)$ (Rajaraman et al (2020)). The lower bound is established by proving a two-way reduction between IL and the value estimation problem of the unknown expert policy under any given reward function, as well as building connections with linear functional estimation with subsampled observations. We further show that under the additional assumption that the expert is optimal for the true reward function, there exists an efficient algorithm, which we term as Mimic-Mixture, that provably achieves suboptimality $O(1/N)$ for arbitrary 3-state MDPs with rewards only at the terminal layer. In contrast, no algorithm can achieve suboptimality $O(\sqrt{H}/N)$ with high probability if the expert is not constrained to be optimal. Our work formally establishes the benefit of the expert optimal assumption in the known transition setting, while Rajaraman et al (2020) showed it does not help when transitions are unknown. △ Less

Submitted 25 February, 2021; originally announced February 2021.

Comments: 30 pages, 2 figures

arXiv:2101.00494 [pdf, ps, other]

A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost

Authors: Minbo Gao, Tianle Xie, Simon S. Du, Lin F. Yang

Abstract: Many real-world applications, such as those in medical domains, recommendation systems, etc, can be formulated as large state space reinforcement learning problems with only a small budget of the number of policy changes, i.e., low switching cost. This paper focuses on the linear Markov Decision Process (MDP) recently studied in [Yang et al 2019, ** et al 2020] where the linear function approxima… ▽ More Many real-world applications, such as those in medical domains, recommendation systems, etc, can be formulated as large state space reinforcement learning problems with only a small budget of the number of policy changes, i.e., low switching cost. This paper focuses on the linear Markov Decision Process (MDP) recently studied in [Yang et al 2019, ** et al 2020] where the linear function approximation is used for generalization on the large state space. We present the first algorithm for linear MDP with a low switching cost. Our algorithm achieves an $\widetilde{O}\left(\sqrt{d^3H^4K}\right)$ regret bound with a near-optimal $O\left(d H\log K\right)$ global switching cost where $d$ is the feature dimension, $H$ is the planning horizon and $K$ is the number of episodes the agent plays. Our regret bound matches the best existing polynomial algorithm by [** et al 2020] and our switching cost is exponentially smaller than theirs. When specialized to tabular MDP, our switching cost bound improves those in [Bai et al 2019, Zhang et al 20020]. We complement our positive result with an $Ω\left(dH/\log d\right)$ global switching cost lower bound for any no-regret algorithm. △ Less

Submitted 2 January, 2021; originally announced January 2021.

arXiv:2011.14267 [pdf, ps, other]

Minimax Sample Complexity for Turn-based Stochastic Game

Authors: Qiwen Cui, Lin F. Yang

Abstract: The empirical success of Multi-agent reinforcement learning is encouraging, while few theoretical guarantees have been revealed. In this work, we prove that the plug-in solver approach, probably the most natural reinforcement learning algorithm, achieves minimax sample complexity for turn-based stochastic game (TBSG). Specifically, we plan in an empirical TBSG by utilizing a `simulator' that allow… ▽ More The empirical success of Multi-agent reinforcement learning is encouraging, while few theoretical guarantees have been revealed. In this work, we prove that the plug-in solver approach, probably the most natural reinforcement learning algorithm, achieves minimax sample complexity for turn-based stochastic game (TBSG). Specifically, we plan in an empirical TBSG by utilizing a `simulator' that allows sampling from arbitrary state-action pair. We show that the empirical Nash equilibrium strategy is an approximate Nash equilibrium strategy in the true TBSG and give both problem-dependent and problem-independent bound. We develop absorbing TBSG and reward perturbation techniques to tackle the complex statistical dependence. The key idea is artificially introducing a suboptimality gap in TBSG and then the Nash equilibrium strategy lies in a finite set. △ Less

Submitted 28 November, 2020; originally announced November 2020.

Comments: 15 pages

arXiv:2011.13034 [pdf, other]

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Authors: **gfeng Wu, Vladimir Braverman, Lin F. Yang

Abstract: In this paper we consider multi-objective reinforcement learning where the objectives are balanced using preferences. In practice, the preferences are often given in an adversarial manner, e.g., customers can be picky in many applications. We formalize this problem as an episodic learning problem on a Markov decision process, where transitions are unknown and a reward function is the inner product… ▽ More In this paper we consider multi-objective reinforcement learning where the objectives are balanced using preferences. In practice, the preferences are often given in an adversarial manner, e.g., customers can be picky in many applications. We formalize this problem as an episodic learning problem on a Markov decision process, where transitions are unknown and a reward function is the inner product of a preference vector with pre-specified multi-objective reward functions. We consider two settings. In the online setting, the agent receives a (adversarial) preference every episode and proposes policies to interact with the environment. We provide a model-based algorithm that achieves a nearly minimax optimal regret bound $\widetilde{\mathcal{O}}\bigl(\sqrt{\min\{d,S\}\cdot H^2 SAK}\bigr)$, where $d$ is the number of objectives, $S$ is the number of states, $A$ is the number of actions, $H$ is the length of the horizon, and $K$ is the number of episodes. Furthermore, we consider preference-free exploration, i.e., the agent first interacts with the environment without specifying any preference and then is able to accommodate arbitrary preference vector up to $ε$ error. Our proposed algorithm is provably efficient with a nearly optimal trajectory complexity $\widetilde{\mathcal{O}}\bigl({\min\{d,S\}\cdot H^3 SA}/{ε^2}\bigr)$. This result partly resolves an open problem raised by \citet{**2020reward}. △ Less

Submitted 27 October, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

Comments: NeurIPS 2021 Camera Ready Version

arXiv:2010.11750 [pdf, other]

Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers

Authors: Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su

Abstract: The problem of learning one task with samples from another task has received much interest recently. In this paper, we ask a fundamental question: when is combining data from two tasks better than learning one task alone? Intuitively, the transfer effect from one task to another task depends on dataset shifts such as sample sizes and covariance matrices. However, quantifying such a transfer effect… ▽ More The problem of learning one task with samples from another task has received much interest recently. In this paper, we ask a fundamental question: when is combining data from two tasks better than learning one task alone? Intuitively, the transfer effect from one task to another task depends on dataset shifts such as sample sizes and covariance matrices. However, quantifying such a transfer effect is challenging since we need to compare the risks between joint learning and single-task learning, and the comparative advantage of one over the other depends on the exact kind of dataset shift between both tasks. This paper uses random matrix theory to tackle this challenge in a linear regression setting with two tasks. We give precise asymptotics about the excess risks of some commonly used estimators in the high-dimensional regime, when the sample sizes increase proportionally with the feature dimension at fixed ratios. The precise asymptotics is provided as a function of the sample sizes and covariate/model shifts, which can be used to study transfer effects: In a random-effects model, we give conditions to determine positive and negative transfers between learning two tasks versus single-task learning; the conditions reveal intricate relations between dataset shifts and transfer effects. Simulations justify the validity of the asymptotics in finite dimensions. Our analysis examines several functions of two different sample covariance matrices, revealing some estimates that generalize classical results in the random matrix theory literature, which may be of independent interest. △ Less

Submitted 10 August, 2023; v1 submitted 22 October, 2020; originally announced October 2020.

Comments: 64 pages, 6 figures; We thoroughly revised the paper by adding new results and reorganizing the presentation

arXiv:2009.05990 [pdf, ps, other]

Toward the Fundamental Limits of Imitation Learning

Authors: Nived Rajaraman, Lin F. Yang, Jiantao Jiao, Kannan Ramachandran

Abstract: Imitation learning (IL) aims to mimic the behavior of an expert policy in a sequential decision-making problem given only demonstrations. In this paper, we focus on understanding the minimax statistical limits of IL in episodic Markov Decision Processes (MDPs). We first consider the setting where the learner is provided a dataset of $N$ expert trajectories ahead of time, and cannot interact with t… ▽ More Imitation learning (IL) aims to mimic the behavior of an expert policy in a sequential decision-making problem given only demonstrations. In this paper, we focus on understanding the minimax statistical limits of IL in episodic Markov Decision Processes (MDPs). We first consider the setting where the learner is provided a dataset of $N$ expert trajectories ahead of time, and cannot interact with the MDP. Here, we show that the policy which mimics the expert whenever possible is in expectation $\lesssim \frac{|\mathcal{S}| H^2 \log (N)}{N}$ suboptimal compared to the value of the expert, even when the expert follows an arbitrary stochastic policy. Here $\mathcal{S}$ is the state space, and $H$ is the length of the episode. Furthermore, we establish a suboptimality lower bound of $\gtrsim |\mathcal{S}| H^2 / N$ which applies even if the expert is constrained to be deterministic, or if the learner is allowed to actively query the expert at visited states while interacting with the MDP for $N$ episodes. To our knowledge, this is the first algorithm with suboptimality having no dependence on the number of actions, under no additional assumptions. We then propose a novel algorithm based on minimum-distance functionals in the setting where the transition model is given and the expert is deterministic. The algorithm is suboptimal by $\lesssim \min \{ H \sqrt{|\mathcal{S}| / N} ,\ |\mathcal{S}| H^{3/2} / N \}$, showing that knowledge of transition improves the minimax rate by at least a $\sqrt{H}$ factor. △ Less

Submitted 13 September, 2020; originally announced September 2020.

Comments: 45 pages, 3 figures

arXiv:2008.06736 [pdf, other]

Obtaining Adjustable Regularization for Free via Iterate Averaging

Authors: **gfeng Wu, Vladimir Braverman, Lin F. Yang

Abstract: Regularization for optimization is a crucial technique to avoid overfitting in machine learning. In order to obtain the best performance, we usually train a model by tuning the regularization parameters. It becomes costly, however, when a single round of training takes significant amount of time. Very recently, Neu and Rosasco show that if we run stochastic gradient descent (SGD) on linear regress… ▽ More Regularization for optimization is a crucial technique to avoid overfitting in machine learning. In order to obtain the best performance, we usually train a model by tuning the regularization parameters. It becomes costly, however, when a single round of training takes significant amount of time. Very recently, Neu and Rosasco show that if we run stochastic gradient descent (SGD) on linear regression problems, then by averaging the SGD iterates properly, we obtain a regularized solution. It left open whether the same phenomenon can be achieved for other optimization problems and algorithms. In this paper, we establish an averaging scheme that provably converts the iterates of SGD on an arbitrary strongly convex and smooth objective function to its regularized counterpart with an adjustable regularization parameter. Our approaches can be used for accelerated and preconditioned optimization methods as well. We further show that the same methods work empirically on more general optimization objectives including neural networks. In sum, we obtain adjustable regularization for free for a large class of optimization problems and resolve an open question raised by Neu and Rosasco. △ Less

Submitted 15 August, 2020; originally announced August 2020.

Comments: ICML 2020 camera ready

arXiv:2008.06615 [pdf, other]

doi 10.1002/sim.9523

A Calibration Approach to Transportability and Data-Fusion with Observational Data

Authors: Kevin P. Josey, Fan Yang, Debashis Ghosh, Sridharan Raghavan

Abstract: Two important considerations in clinical research studies are proper evaluations of internal and external validity. While randomized clinical trials can overcome several threats to internal validity, they may be prone to poor external validity. Conversely, large prospective observational studies sampled from a broadly generalizable population may be externally valid, yet susceptible to threats to… ▽ More Two important considerations in clinical research studies are proper evaluations of internal and external validity. While randomized clinical trials can overcome several threats to internal validity, they may be prone to poor external validity. Conversely, large prospective observational studies sampled from a broadly generalizable population may be externally valid, yet susceptible to threats to internal validity, particularly confounding. Thus, methods that address confounding and enhance transportability of study results across populations are essential for internally and externally valid causal inference, respectively. These issues persist for another problem closely related to transportability known as data-fusion. We develop a calibration method to generate balancing weights that address confounding and sampling bias, thereby enabling valid estimation of the target population average treatment effect. We compare the calibration approach to two additional doubly-robust methods that estimate the effect of an intervention on an outcome within a second, possibly unrelated target population. The proposed methodologies can be extended to resolve data-fusion problems that seek to evaluate the effects of an intervention using data from two related studies sampled from different populations. A simulation study is conducted to demonstrate the advantages and similarities of the different techniques. We also test the performance of the calibration approach in a motivating real data example comparing whether the effect of biguanides versus sulfonylureas - the two most common oral diabetes medication classes for initial treatment - on all-cause mortality described in a historical cohort applies to a contemporary cohort of US Veterans with diabetes. △ Less

Submitted 7 July, 2022; v1 submitted 14 August, 2020; originally announced August 2020.

arXiv:2007.07461 [pdf, ps, other]

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

Authors: Kaiqing Zhang, Sham M. Kakade, Tamer Başar, Lin F. Yang

Abstract: Model-based reinforcement learning (RL), which finds an optimal policy using an empirical model, has long been recognized as one of the corner stones of RL. It is especially suitable for multi-agent RL (MARL), as it naturally decouples the learning and the planning phases, and avoids the non-stationarity problem when all agents are improving their policies simultaneously using samples. Though intu… ▽ More Model-based reinforcement learning (RL), which finds an optimal policy using an empirical model, has long been recognized as one of the corner stones of RL. It is especially suitable for multi-agent RL (MARL), as it naturally decouples the learning and the planning phases, and avoids the non-stationarity problem when all agents are improving their policies simultaneously using samples. Though intuitive and widely-used, the sample complexity of model-based MARL algorithms has not been fully investigated. In this paper, our goal is to address the fundamental question about its sample complexity. We study arguably the most basic MARL setting: two-player discounted zero-sum Markov games, given only access to a generative model. We show that model-based MARL achieves a sample complexity of $\tilde O(|S||A||B|(1-γ)^{-3}ε^{-2})$ for finding the Nash equilibrium (NE) value up to some $ε$ error, and the $ε$-NE policies with a smooth planning oracle, where $γ$ is the discount factor, and $S,A,B$ denote the state space, and the action spaces for the two agents. We further show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge, by establishing a matching lower bound. This is in contrast to the usual reward-aware setting, with a $\tildeΩ(|S|(|A|+|B|)(1-γ)^{-3}ε^{-2})$ lower bound, where this model-based approach is near-optimal with only a gap on the $|A|,|B|$ dependence. Our results not only demonstrate the sample-efficiency of this basic model-based approach in MARL, but also elaborate on the fundamental tradeoff between its power (easily handling the more challenging reward-agnostic case) and limitation (less adaptive and suboptimal in $|A|,|B|$), particularly arises in the multi-agent context. △ Less

Submitted 8 August, 2023; v1 submitted 14 July, 2020; originally announced July 2020.

Comments: Updated version accepted to Journal of Machine Learning Research (JMLR)

arXiv:2006.13485 [pdf, other]

Fairness with Overlap** Groups

Authors: Forest Yang, Moustapha Cisse, Sanmi Koyejo

Abstract: In algorithmically fair prediction problems, a standard goal is to ensure the equality of fairness metrics across multiple overlap** groups simultaneously. We reconsider this standard fair classification problem using a probabilistic population analysis, which, in turn, reveals the Bayes-optimal classifier. Our approach unifies a variety of existing group-fair classification methods and enables… ▽ More In algorithmically fair prediction problems, a standard goal is to ensure the equality of fairness metrics across multiple overlap** groups simultaneously. We reconsider this standard fair classification problem using a probabilistic population analysis, which, in turn, reveals the Bayes-optimal classifier. Our approach unifies a variety of existing group-fair classification methods and enables extensions to a wide range of non-decomposable multiclass performance metrics and fairness measures. The Bayes-optimal classifier further inspires consistent procedures for algorithmically fair classification with overlap** groups. On a variety of real datasets, the proposed approach outperforms baselines in terms of its fairness-performance tradeoff. △ Less

Submitted 24 June, 2020; originally announced June 2020.

arXiv:2006.11274 [pdf, other]

On Reward-Free Reinforcement Learning with Linear Function Approximation

Authors: Ruosong Wang, Simon S. Du, Lin F. Yang, Ruslan Salakhutdinov

Abstract: Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest. During the exploration phase, an agent collects samples without using a pre-specified reward function. After the exploration phase, a reward function is given, and the agent uses samples collected during the exploration phase to c… ▽ More Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest. During the exploration phase, an agent collects samples without using a pre-specified reward function. After the exploration phase, a reward function is given, and the agent uses samples collected during the exploration phase to compute a near-optimal policy. ** et al. [2020] showed that in the tabular setting, the agent only needs to collect polynomial number of samples (in terms of the number states, the number of actions, and the planning horizon) for reward-free RL. However, in practice, the number of states and actions can be large, and thus function approximation schemes are required for generalization. In this work, we give both positive and negative results for reward-free RL with linear function approximation. We give an algorithm for reward-free RL in the linear Markov decision process setting where both the transition and the reward admit linear representations. The sample complexity of our algorithm is polynomial in the feature dimension and the planning horizon, and is completely independent of the number of states and actions. We further give an exponential lower bound for reward-free RL in the setting where only the optimal $Q$-function admits a linear representation. Our results imply several interesting exponential separations on the sample complexity of reward-free RL. △ Less

Submitted 19 June, 2020; originally announced June 2020.

Showing 1–50 of 111 results for author: Yang, F