Adaptive Bayesian Multivariate Spline Knot Inference with Prior Specifications on Model Complexity

Authors: Junhui He, Ying Yang, Jian Kang

Abstract: In multivariate spline regression, the number and locations of knots influence the performance and interpretability significantly. However, due to non-differentiability and varying dimensions, there is no desirable frequentist method to make inference on knots. In this article, we propose a fully Bayesian approach for knot inference in multivariate spline regression. The existing Bayesian method often uses BIC to calculate the posterior, but BIC is too liberal and it will heavily overestimate the knot number when the candidate model space is large. We specify a new prior on the knot number to take into account the complexity of the model space and derive an analytic formula in the normal model. In the non-normal cases, we utilize the extended Bayesian information criterion to approximate the posterior density. The samples are simulated in the space with differing dimensions via reversible jump Markov chain Monte Carlo. We apply the proposed method in knot inference and manifold denoising. Experiments demonstrate the splendid capability of the algorithm, especially in function fitting with jump discontinuities.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13342 [pdf, other]

Scalable Bayesian inference for heat kernel Gaussian processes on manifolds

Authors: Junhui He, Guoxuan Ma, Jian Kang, Ying Yang

Abstract: We develop scalable manifold learning methods and theory, motivated by the problem of estimating manifold of fMRI activation in the Human Connectome Project (HCP). We propose the Fast Graph Laplacian Estimation for Heat Kernel Gaussian Processes (FLGP) in the natural exponential family model. FLGP handles large sample sizes $ n $, preserves the intrinsic geometry of data, and significantly reduces computational complexity from $ \mathcal{O}(n^3) $ to $ \mathcal{O}(n) $ via a novel reduced-rank approximation of the graph Laplacian's transition matrix and truncated Singular Value Decomposition for eigenpair computation. Our numerical experiments demonstrate FLGP's scalability and improved accuracy for manifold learning from large-scale complex data.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.02810 [pdf, other]

Adaptive deep density approximation for stochastic dynamical systems

Authors: Junjie He, Qifeng Liao, Xiaoliang Wan

Abstract: In this paper we consider adaptive deep neural network approximation for stochastic dynamical systems. Based on the Liouville equation associated with the stochastic dynamical systems, a new temporal KRnet (tKRnet) is proposed to approximate the probability density functions (PDFs) of the state variables. The tKRnet gives an explicit density model for the solution of the Liouville equation, which alleviates the curse of dimensionality issue that limits the application of traditional grid based numerical methods. To efficiently train the tKRnet, an adaptive procedure is developed to generate collocation points for the corresponding residual loss function, where samples are generated iteratively using the approximate density function at each iteration. A temporal decomposition technique is also employed to improve the long-time integration. Theoretical analysis of our proposed method is provided, and numerical examples are presented to demonstrate its performance.

Submitted 5 May, 2024; originally announced May 2024.

Comments: 24 pages, 13 figures

MSC Class: 34F05; 60H35; 62M45; 65C30

arXiv:2405.00576 [pdf, other]

Calibration of the rating transition model for high and low default portfolios

Authors: Jian He, Asma Khedher, Peter Spreij

Abstract: In this paper we develop Maximum likelihood (ML) based algorithms to calibrate the model parameters in credit rating transition models. Since the credit rating transition models are not Gaussian linear models, the celebrated Kalman filter is not suitable to compute the likelihood of observed migrations. Therefore, we develop a Laplace approximation of the likelihood function and as a result the Kalman filter can be used in the end to compute the likelihood function. This approach is applied to so-called high-default portfolios, in which the number of migrations (defaults) is large enough to obtain high accuracy of the Laplace approximation. By contrast, low-default portfolios have a limited number of observed migrations (defaults). Therefore, in order to calibrate low-default portfolios, we develop a ML algorithm using a particle filter (PF) and Gaussian process regression. Experiments show that both algorithms are efficient and produce accurate approximations of the likelihood function and the ML estimates of the model parameters.

Submitted 1 May, 2024; originally announced May 2024.

MSC Class: 91G40; 91G60; 65D15

arXiv:2404.12648 [pdf, ps, other]

Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation

Authors: Jianliang He, Han Zhong, Zhuoran Yang

Abstract: We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation. Specifically, we propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP), which incorporates both model-based and value-based incarnations. In particular, LOOP features a novel construction of confidence sets and a low-switching policy updating scheme, which are tailored to the average-reward and function approximation setting. Moreover, for AMDPs, we propose a novel complexity measure -- average-reward generalized eluder coefficient (AGEC) -- which captures the challenge of exploration in AMDPs with general function approximation. Such a complexity measure encompasses almost all previously known tractable AMDP models, such as linear AMDPs and linear mixture AMDPs, and also includes newly identified cases such as kernel AMDPs and AMDPs with Bellman eluder dimensions. Using AGEC, we prove that LOOP achieves a sublinear $\tilde{\mathcal{O}}(\mathrm{poly}(d, \mathrm{sp}(V^*)) \sqrt{Tβ} )$ regret, where $d$ and $β$ correspond to AGEC and log-covering number of the hypothesis class respectively, $\mathrm{sp}(V^*)$ is the span of the optimal state bias function, $T$ denotes the number of steps, and $\tilde{\mathcal{O}} (\cdot) $ omits logarithmic factors. When specialized to concrete AMDP models, our regret bounds are comparable to those established by the existing algorithms designed specifically for these special cases. To the best of our knowledge, this paper presents the first comprehensive theoretical framework capable of handling nearly all AMDPs.

Submitted 19 April, 2024; originally announced April 2024.

Comments: ICLR 2024

arXiv:2404.08164 [pdf, other]

Language Model Prompt Selection via Simulation Optimization

Authors: Haoting Zhang, Jinghai He, Rhonda Righter, Zeyu Zheng

Abstract: With the advancement in generative language models, the selection of prompts has gained significant attention in recent years. A prompt is an instruction or description provided by the user, serving as a guide for the generative language model in content generation. Despite existing methods for prompt selection that are based on human labor, we consider facilitating this selection through simulation optimization, aiming to maximize a pre-defined score for the selected prompt. Specifically, we propose a two-stage framework. In the first stage, we determine a feasible set of prompts in sufficient numbers, where each prompt is represented by a moderate-dimensional vector. In the subsequent stage for evaluation and selection, we construct a surrogate model of the score regarding the moderate-dimensional vectors that represent the prompts. We propose sequentially selecting the prompt for evaluation based on this constructed surrogate model. We prove the consistency of the sequential evaluation procedure in our framework. We also conduct numerical experiments to demonstrate the efficacy of our proposed framework, providing practical instructions for implementation.

Submitted 19 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

arXiv:2402.09401 [pdf, other]

Reinforcement Learning from Human Feedback with Active Queries

Authors: Kaixuan Ji, Jiafan He, Quanquan Gu

Abstract: Aligning large language models (LLM) with human preference plays a key role in building modern generative models and can be achieved by reinforcement learning from human feedback (RLHF). Despite their superior performance, current RLHF approaches often require a large amount of human-labelled preference data, which is expensive to collect. In this paper, inspired by the success of active learning, we address this problem by proposing query-efficient RLHF methods. We first formalize the alignment problem as a contextual dueling bandit problem and design an active-query-based proximal policy optimization (APPO) algorithm with an $\tilde{O}(d^2/Δ)$ regret bound and an $\tilde{O}(d^2/Δ^2)$ query complexity, where $d$ is the dimension of feature space and $Δ$ is the sub-optimality gap over all the contexts. We then propose ADPO, a practical version of our algorithm based on direct preference optimization (DPO) and apply it to fine-tuning LLMs. Our experiments show that ADPO, while only making about half of queries for human preference, matches the performance of the state-of-the-art DPO method.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 28 pages, 1 figure, 4 tables

arXiv:2402.08998 [pdf, other]

Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path

Authors: Qiwei Di, Jiafan He, Dongruo Zhou, Quanquan Gu

Abstract: We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel, where an agent repeatedly interacts with a stochastic environment and seeks to reach certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound of the cost function or an upper bound of the expected length for the optimal policy. In this paper, we propose a new algorithm to eliminate these restrictive assumptions. Our algorithm is based on extended value iteration with a fine-grained variance-aware confidence set, where the variance is estimated recursively from high-order moments. Our algorithm achieves an $\tilde{\mathcal O}(dB_*\sqrt{K})$ regret bound, where $d$ is the dimension of the feature mapping in the linear transition kernel, $B_*$ is the upper bound of the total cumulative cost for the optimal policy, and $K$ is the number of episodes. Our regret upper bound matches the $Ω(dB_*\sqrt{K})$ lower bound of linear mixture SSPs in Min et al. (2022), which suggests that our algorithm is nearly minimax optimal.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 28 pages, 1 figure, In ICML 2023

arXiv:2402.08991 [pdf, ps, other]

Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption

Authors: Chenlu Ye, Jiafan He, Quanquan Gu, Tong Zhang

Abstract: This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL), where the transition dynamics can be corrupted by an adversary. Existing studies on corruption-robust RL mostly focus on the setting of model-free RL, where robust least-square regression is often employed for value function estimation. However, these techniques cannot be directly applied to model-based RL. In this paper, we focus on model-based RL and take the maximum likelihood estimation (MLE) approach to learn transition model. Our work encompasses both online and offline settings. In the online setting, we introduce an algorithm called corruption-robust optimistic MLE (CR-OMLE), which leverages total-variation (TV)-based information ratios as uncertainty weights for MLE. We prove that CR-OMLE achieves a regret of $\tilde{\mathcal{O}}(\sqrt{T} + C)$, where $C$ denotes the cumulative corruption level after $T$ episodes. We also prove a lower bound to show that the additive dependence on $C$ is optimal. We extend our weighting technique to the offline setting, and propose an algorithm named corruption-robust pessimistic MLE (CR-PMLE). Under a uniform coverage condition, CR-PMLE exhibits suboptimality worsened by $\mathcal{O}(C/n)$, nearly matching the lower bound. To the best of our knowledge, this is the first work on corruption-robust model-based RL algorithms with provable guarantees.

Submitted 14 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.02357 [pdf, other]

Multi-modal Causal Structure Learning and Root Cause Analysis

Authors: Lecheng Zheng, Zhengzhang Chen, Jingrui He, Haifeng Chen

Abstract: Effective root cause analysis (RCA) is vital for swiftly restoring services, minimizing losses, and ensuring the smooth operation and management of complex systems. Previous data-driven RCA methods, particularly those employing causal discovery techniques, have primarily focused on constructing dependency or causal graphs for backtracking the root causes. However, these methods often fall short as they rely solely on data from a single modality, thereby resulting in suboptimal solutions. In this work, we propose Mulan, a unified multi-modal causal structure learning method for root cause localization. We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data. To explore intricate relationships across different modalities, we propose a contrastive learning-based approach to extract modality-invariant and modality-specific representations within a shared latent space. Additionally, we introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph. Finally, we employ random walk with restart to simulate system fault propagation and identify potential root causes. Extensive experiments on three real-world datasets validate the effectiveness of our proposed framework.

Submitted 4 February, 2024; originally announced February 2024.

Comments: Accepted by the Web Conference 2024

arXiv:2402.00152 [pdf, other]

Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss

Authors: Yahong Yang, Juncai He

Abstract: Constructing the architecture of a neural network is a challenging pursuit for the machine learning community, and the dilemma of whether to go deeper or wider remains a persistent question. This paper explores a comparison between deeper neural networks (DeNNs) with a flexible number of layers and wider neural networks (WeNNs) with limited hidden layers, focusing on their optimal generalization error in Sobolev losses. Analytical investigations reveal that the architecture of a neural network can be significantly influenced by various factors, including the number of sample points, parameters within the neural networks, and the regularity of the loss function. Specifically, a higher number of parameters tends to favor WeNNs, while an increased number of sample points and greater regularity in the loss function lean towards the adoption of DeNNs. We ultimately apply this theory to address partial differential equations using deep Ritz and physics-informed neural network (PINN) methods, guiding the design of neural networks.

Submitted 12 May, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.10766, arXiv:2305.08466

MSC Class: 68T05

arXiv:2401.04933 [pdf, other]

Rethinking Test-time Likelihood: The Likelihood Path Principle and Its Application to OOD Detection

Authors: Sicong Huang, Jiawei He, Kry Yik Chau Lui

Abstract: While likelihood is attractive in theory, its estimates by deep generative models (DGMs) are often broken in practice, and perform poorly for out-of-distribution (OOD) detection. Various recent works started to consider alternative scores and achieved better performances. However, such recipes do not come with provable guarantees, nor is it clear that their choices extract sufficient information. We attempt to change this by conducting a case study on variational autoencoders (VAEs). First, we introduce the likelihood path (LPath) principle, generalizing the likelihood principle. This narrows the search for informative summary statistics down to the minimal sufficient statistics of VAEs' conditional likelihoods. Second, introducing new theoretic tools such as nearly essential support, essential distance and co-Lipschitzness, we obtain non-asymptotic provable OOD detection guarantees for certain distillation of the minimal sufficient statistics. The corresponding LPath algorithm demonstrates SOTA performances, even using simple and small VAEs with poor likelihood estimates. To our best knowledge, this is the first provable unsupervised OOD method that delivers excellent empirical results, better than any other VAE-based techniques. We use the same model as Xiao et al. (2020), open sourced from: https://github.com/XavierXiao/Likelihood-Regret

Submitted 10 January, 2024; originally announced January 2024.

arXiv:2401.00085 [pdf, other]

A dimension reduction approach for loss valuation in credit risk modelling

Authors: Jian He, Asma Khedher, Peter Spreij

Abstract: This paper addresses the "curse of dimensionality" in the loss valuation of credit risk models. A dimension reduction methodology based on the Bayesian filter and smoother is proposed. This methodology is designed to achieve a fast and accurate loss valuation algorithm in credit risk modelling, but it can also be extended to valuation models of other risk types. The proposed methodology is generic, robust and can easily be implemented. Moreover, the accuracy of the proposed methodology in the estimation of expected loss and value-at-risk is illustrated by numerical experiments. The results suggest that, compared to the currently most used PCA approach, the proposed methodology provides more accurate estimation of expected loss and value-at-risk of a loss distribution.

Submitted 29 December, 2023; originally announced January 2024.

Comments: 43 pages

MSC Class: 62P05; 91G40

arXiv:2312.07145 [pdf, other]

Contextual Bandits with Online Neural Regression

Authors: Rohan Deb, Yikun Ban, Shiliang Zuo, Jingrui He, Arindam Banerjee

Abstract: Recent works have shown a reduction from contextual bandits to online regression under a realizability assumption [Foster and Rakhlin, 2020, Foster and Krishnamurthy, 2021]. In this work, we investigate the use of neural networks for such online regression and associated Neural Contextual Bandits (NeuCBs). Using existing results for wide networks, one can readily show a ${\mathcal{O}}(\sqrt{T})$ regret for online regression with square loss, which via the reduction implies a ${\mathcal{O}}(\sqrt{K} T^{3/4})$ regret for NeuCBs. Departing from this standard approach, we first show a $\mathcal{O}(\log T)$ regret for online regression with almost convex losses that satisfy the QG (Quadratic Growth) condition, a generalization of the PL (Polyak-Łojasiewicz) condition, and that have a unique minimum. Although not directly applicable to wide networks since they do not have unique minima, we show that adding a suitable small random perturbation to the network predictions surprisingly makes the loss satisfy QG with unique minima. Based on such a perturbed prediction, we show a ${\mathcal{O}}(\log T)$ regret for online regression with both squared loss and KL loss, and subsequently convert these respectively to $\tilde{\mathcal{O}}(\sqrt{KT})$ and $\tilde{\mathcal{O}}(\sqrt{KL^*} + K)$ regret for NeuCB, where $L^*$ is the loss of the best policy. Separately, we also show that existing regret bounds for NeuCBs are $Ω(T)$ or assume i.i.d. contexts, unlike this work. Finally, our experimental results on various datasets demonstrate that our algorithms, especially the one based on KL loss, persistently outperform existing algorithms.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2311.15238 [pdf, other]

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

Authors: Heyang Zhao, Jiafan He, Quanquan Gu

Abstract: The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for RL with general function approximation. Our key algorithmic design includes (1) a general deterministic policy-switching strategy that achieves low switching cost, (2) a monotonic value function structure with carefully controlled function class complexity, and (3) a variance-weighted regression scheme that exploits historical trajectories with high data efficiency. MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$ when $K$ is sufficiently large and near-optimal policy switching cost of $\tilde{O}(dH)$, with $d$ being the eluder dimension of the function class, $H$ being the planning horizon, and $K$ being the number of episodes. Our work sheds light on designing provably sample-efficient and deployment-efficient Q-learning with nonlinear function approximation.

Submitted 26 November, 2023; originally announced November 2023.

Comments: 52 pages, 1 table

arXiv:2311.01709 [pdf, other]

Causal inference with Machine Learning-Based Covariate Representation

Authors: Yuhang Wu, Jinghai He, Zeyu Zheng

Abstract: Utilizing covariate information has been a powerful approach to improve the efficiency and accuracy of causal inference, which supports the massive number of randomized experiments run at data-driven enterprises. However, state-of-the-art approaches can become practically unreliable when the dimension of the covariate increases to just 50, whereas experiments on large platforms can observe covariates of even higher dimension. We propose a machine-learning-assisted covariate representation approach that can effectively make use of historical experimental or observational data collected on the same platform to understand which lower dimensions can effectively represent the higher-dimensional covariate. We then propose design and estimation methods with the covariate representation. We prove statistical reliability and performance guarantees for the proposed methods. The empirical performance is demonstrated using numerical experiments.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2310.01380 [pdf, other]

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

Authors: Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu

Abstract: Offline reinforcement learning (RL), where the agent aims to learn the optimal policy based on the data collected by a behavior policy, has attracted increasing attention in recent years. While offline RL with linear function approximation has been extensively studied with optimal results achieved under certain assumptions, many works shift their interest to offline RL with non-linear function approximation. However, limited works on offline RL with non-linear function approximation have instance-dependent regret guarantees. In this paper, we propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Square Value Iteration (PNLSVI), for offline RL with non-linear function approximation. Our algorithmic design comprises three innovative components: (1) a variance-based weighted regression scheme that can be applied to a wide range of function classes, (2) a subroutine for variance estimation, and (3) a planning phase that utilizes a pessimistic value iteration approach. Our algorithm enjoys a regret bound that has a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation. Our work extends the previous instance-dependent results within simpler function classes, such as linear and differentiable function to a more general framework.

Submitted 2 October, 2023; originally announced October 2023.

Comments: 43 pages, 1 table

arXiv:2308.11026 [pdf, other]

Harnessing The Collective Wisdom: Fusion Learning Using Decision Sequences From Diverse Sources

Authors: Trambak Banerjee, Bowen Gang, Jianliang He

Abstract: Learning from the collective wisdom of crowds enhances the transparency of scientific findings by incorporating diverse perspectives into the decision-making process. Synthesizing such collective wisdom is related to the statistical notion of fusion learning from multiple data sources or studies. However, fusing inferences from diverse sources is challenging since cross-source heterogeneity and potential data-sharing complicate statistical inference. Moreover, studies may rely on disparate designs, employ widely different modeling techniques for inferences, and prevailing data privacy norms may forbid sharing even summary statistics across the studies for an overall analysis. In this paper, we propose an Integrative Ranking and Thresholding (IRT) framework for fusion learning in multiple testing. IRT operates under the setting where from each study a triplet is available: the vector of binary accept-reject decisions on the tested hypotheses, the study-specific False Discovery Rate (FDR) level and the hypotheses tested by the study. Under this setting, IRT constructs an aggregated, nonparametric, and discriminatory measure of evidence against each null hypotheses, which facilitates ranking the hypotheses in the order of their likelihood of being rejected. We show that IRT guarantees an overall FDR control under arbitrary dependence between the evidence measures as long as the studies control their respective FDR at the desired levels. Furthermore, IRT synthesizes inferences from diverse studies irrespective of the underlying multiple testing algorithms employed by them. While the proofs of our theoretical statements are elementary, IRT is extremely flexible, and a comprehensive numerical study demonstrates that it is a powerful framework for pooling inferences.

Submitted 21 August, 2023; originally announced August 2023.

Comments: 29 pages and 10 figures. Under review at a journal

arXiv:2305.19185 [pdf, other]

Compression with Bayesian Implicit Neural Representations

Authors: Zongyu Guo, Gergely Flamich, Jiajun He, Zhibo Chen, José Miguel Hernández-Lobato

Abstract: Many common types of data can be represented as functions that map coordinates to signal values, such as pixel locations to RGB values in the case of an image. Based on this view, data can be compressed by overfitting a compact neural network to its functional representation and then encoding the network weights. However, most current solutions for this are inefficient, as quantization to low-bit precision substantially degrades the reconstruction quality. To address this issue, we propose overfitting variational Bayesian neural networks to the data and compressing an approximate posterior weight sample using relative entropy coding instead of quantizing and entropy coding it. This strategy enables direct optimization of the rate-distortion performance by minimizing the $β$-ELBO, and target different rate-distortion trade-offs for a given network architecture by adjusting $β$. Moreover, we introduce an iterative algorithm for learning prior weight distributions and employ a progressive refinement process for the variational posterior that significantly enhances performance. Experiments show that our method achieves strong performance on image and audio compression while retaining simplicity.

Submitted 29 October, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

Comments: Accepted as a Spotlight paper in NeurIPS 2023. Updated camera-ready version

arXiv:2305.08359 [pdf, other]

Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs

Authors: Kaixuan Ji, Qingyue Zhao, Jiafan He, Weitong Zhang, Quanquan Gu

Abstract: Recent studies have shown that episodic reinforcement learning (RL) is no harder than bandits when the total reward is bounded by $1$, and proved regret bounds that have a polylogarithmic dependence on the planning horizon $H$. However, it remains an open question whether such results can be carried over to adversarial RL, where the reward is adversarially chosen at each episode. In this paper, we answer this question affirmatively by proposing the first horizon-free policy search algorithm. To tackle the challenges caused by exploration and adversarially chosen reward, our algorithm employs (1) a variance-uncertainty-aware weighted least square estimator for the transition kernel; and (2) an occupancy measure-based technique for the online search of a \emph{stochastic} policy. We show that our algorithm achieves an $\tilde{O}\big((d+\log (|\mathcal{S}|^2 |\mathcal{A}|))\sqrt{K}\big)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping linearly parametrizing the unknown transition kernel of the MDP, $K$ is the number of episodes, $|\mathcal{S}|$ and $|\mathcal{A}|$ are the cardinalities of the state and action spaces. We also provide hardness results and regret lower bounds to justify the near optimality of our algorithm and the unavoidability of $\log|\mathcal{S}|$ and $\log|\mathcal{A}|$ in the regret bound.

Submitted 15 May, 2023; originally announced May 2023.

Comments: 34 pages

arXiv:2305.08350 [pdf, other]

Uniform-PAC Guarantees for Model-Based RL with Bounded Eluder Dimension

Authors: Yue Wu, Jiafan He, Quanquan Gu

Abstract: Recently, there has been remarkable progress in reinforcement learning (RL) with general function approximation. However, all these works only provide regret or sample complexity guarantees. It is still an open question if one can achieve stronger performance guarantees, i.e., the uniform probably approximate correctness (Uniform-PAC) guarantee that can imply both a sub-linear regret bound and a polynomial sample complexity for any target learning accuracy. We study this problem by proposing algorithms for both nonlinear bandits and model-based episodic RL using the general function class with a bounded eluder dimension. The key idea of the proposed algorithms is to assign each action to different levels according to its width with respect to the confidence set. The achieved uniform-PAC sample complexity is tight in the sense that it matches the state-of-the-art regret bounds or sample complexity guarantees when reduced to the linear case. To the best of our knowledge, this is the first work for uniform-PAC guarantees on bandit and RL that goes beyond linear cases.

Submitted 15 May, 2023; originally announced May 2023.

Comments: 21 pages, 1 table. To appear in UAI 2023

arXiv:2303.09390 [pdf, other]

On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

Authors: Weitong Zhang, Jiafan He, Zhiyuan Fan, Quanquan Gu

Abstract: We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class up to a bounded misspecification level $ζ>0$. We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression. We show that, when the misspecification level $ζ$ is dominated by $\tilde O (Δ/ \sqrt{d})$ with $Δ$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O (d^2/Δ)$ as in the well-specified setting up to logarithmic factors. In addition, we show that an existing algorithm SupLinUCB (Chu et al., 2011) can also achieve a gap-dependent constant regret bound without the knowledge of sub-optimality gap $Δ$. Together with a lower bound adapted from Lattimore et al. (2020), our result suggests an interplay between misspecification level and the sub-optimality gap: (1) the linear contextual bandit model is efficiently learnable when $ζ\leq \tilde O(Δ/ \sqrt{d})$; and (2) it is not efficiently learnable when $ζ\geq \tilde Ω(Δ / {\sqrt{d}})$. Experiments on both synthetic and real-world datasets corroborate our theoretical results.

Submitted 16 March, 2023; originally announced March 2023.

Comments: 28 pages, 2 figures, 2 tables

arXiv:2303.03582 [pdf, other]

Statistical inferences for complex dependence of multimodal imaging data

Authors: Jinyuan Chang, Jing He, Jian Kang, Mingcong Wu

Abstract: Statistical analysis of multimodal imaging data is a challenging task, since the data involves high-dimensionality, strong spatial correlations and complex data structures. In this paper, we propose rigorous statistical testing procedures for making inferences on the complex dependence of multimodal imaging data. Motivated by the analysis of multi-task fMRI data in the Human Connectome Project (HCP) study, we particularly address three hypothesis testing problems: (a) testing independence among imaging modalities over brain regions, (b) testing independence between brain regions within imaging modalities, and (c) testing independence between brain regions across different modalities. Considering a general form for all the three tests, we develop a global testing procedure and a multiple testing procedure controlling the false discovery rate. We study theoretical properties of the proposed tests and develop a computationally efficient distributed algorithm. The proposed methods and theory are general and relevant for many statistical problems of testing independence structure among the components of high-dimensional random vectors with arbitrary dependence structures. We also illustrate our proposed methods via extensive simulations and analysis of five task fMRI contrast maps in the HCP study.

Submitted 6 March, 2023; originally announced March 2023.

arXiv:2302.10371 [pdf, other]

Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency

Authors: Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

Abstract: Recently, several studies (Zhou et al., 2021a; Zhang et al., 2021b; Kim et al., 2021; Zhou and Gu, 2022) have provided variance-dependent regret bounds for linear contextual bandits, which interpolates the regret for the worst-case regime and the deterministic reward regime. However, these algorithms are either computationally intractable or unable to handle unknown variance of the noise. In this paper, we present a novel solution to this open problem by proposing the first computationally efficient algorithm for linear bandits with heteroscedastic noise. Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d \sqrt{\sum_{k = 1}^K σ_k^2} + d)$ regret, where $σ_k^2$ is the variance of the noise at the round $k$, $d$ is the dimension of the contexts and $K$ is the total number of rounds. Our results are based on an adaptive variance-aware confidence set enabled by a new Freedman-type concentration inequality for self-normalized martingales and a multi-layer structure to stratify the context vectors into different layers with different uniform upper bounds on the uncertainty. Furthermore, our approach can be extended to linear mixture Markov decision processes (MDPs) in reinforcement learning. We propose a variance-adaptive algorithm for linear mixture MDPs, which achieves a problem-dependent horizon-free regret bound that can gracefully reduce to a nearly constant regret for deterministic MDPs. Unlike existing nearly minimax optimal algorithms for linear mixture MDPs, our algorithm does not require explicit variance estimation of the transitional probabilities or the use of high-order moment estimators to attain horizon-free regret. We believe the techniques developed in this paper can have independent value for general online decision making problems.

Submitted 20 February, 2023; originally announced February 2023.

Comments: 43 pages, 2 tables

arXiv:2301.01107 [pdf]

Computing the Performance of A New Adaptive Sampling Algorithm Based on The Gittins Index in Experiments with Exponential Rewards

Authors: James K. He, Sofía S. Villar, Lida Mavrogonatou

Abstract: Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computational efficiency goals, and it has been recently used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially-distributed rewards. We report its performance in simulated 2-armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI modified design shows operating characteristics comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential that designs using a GI approach to allocate participants have to improve participant benefits, increase efficiencies, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.

Submitted 3 January, 2023; originally announced January 2023.

Comments: Accepted by Computing Conference, London 2023

arXiv:2212.06132 [pdf, ps, other]

Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

Authors: Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu

Abstract: We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde O(d\sqrt{H^3K})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $K$ is the number of episodes. Our algorithm is based on a weighted linear regression scheme with a carefully designed weight, which depends on a new variance estimator that (1) directly estimates the variance of the optimal value function, (2) monotonically decreases with respect to the number of episodes to ensure a better estimation accuracy, and (3) uses a rare-switching policy to update the value function estimator to control the complexity of the estimated value function class. Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.

Submitted 3 November, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

Comments: 33 pages, 1 table. In ICML 2023

arXiv:2212.01539 [pdf, other]

Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping

Authors: Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin Tat Lee, Arturs Backurs, Nenghai Yu, Jiang Bian

Abstract: Differentially private deep learning has recently witnessed advances in computational efficiency and privacy-utility trade-off. We explore whether further improvements along the two axes are possible and provide affirmative answers leveraging two instantiations of \emph{group-wise clipping}. To reduce the compute time overhead of private learning, we show that \emph{per-layer clipping}, where the gradient of each neural network layer is clipped separately, allows clipping to be performed in conjunction with backpropagation in differentially private optimization. This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest. While per-layer clipping with constant thresholds tends to underperform standard flat clipping, per-layer clipping with adaptive thresholds matches or outperforms flat clipping under given training epoch constraints, hence attaining similar or better task performance within less wall time. To explore the limits of scaling (pretrained) models in differentially private deep learning, we privately fine-tune the 175 billion-parameter GPT-3. We bypass scaling challenges associated with clipping gradients that are distributed across multiple devices with \emph{per-device clipping} that clips the gradient of each model piece separately on its host device. Privately fine-tuning GPT-3 with per-device clipping achieves a task performance at $ε=1$ better than what is attainable by non-privately fine-tuning the largest GPT-2 on a summarization task.
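The contrast between flat clipping and per-layer clipping described in this abstract can be illustrated with a minimal sketch. It assumes a single example's gradient stored as one list of floats per layer, uses hypothetical function names (`flat_clip`, `per_layer_clip`) and made-up thresholds, and leaves out the noise addition that differentially private training also requires. It is not the authors' implementation.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in vec))


def flat_clip(layer_grads, threshold):
    """Flat clipping: one scale factor for all layers, based on the global norm."""
    total = math.sqrt(sum(v * v for g in layer_grads for v in g))
    scale = min(1.0, threshold / total) if total > 0 else 1.0
    return [[v * scale for v in g] for g in layer_grads]


def per_layer_clip(layer_grads, thresholds):
    """Per-layer clipping: each layer is rescaled against its own threshold."""
    clipped = []
    for g, c in zip(layer_grads, thresholds):
        n = l2_norm(g)
        scale = min(1.0, c / n) if n > 0 else 1.0
        clipped.append([v * scale for v in g])
    return clipped


# Hypothetical two-layer gradient: layer norms are 5 and 1.
grads = [[3.0, 4.0], [0.6, 0.8]]
flat = flat_clip(grads, 1.0)                  # both layers shrink by the same factor
layered = per_layer_clip(grads, [1.0, 1.0])   # only the first layer shrinks
```

Because per-layer clipping needs only the norm of the layer currently being processed, it can be applied as soon as that layer's gradient is available, which is the property the abstract ties to running clipping alongside backpropagation.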

Submitted 3 December, 2022; originally announced December 2022.

Comments: 25 pages

arXiv:2210.00423 [pdf, other]

Improved Algorithms for Neural Active Learning

Authors: Yikun Ban, Yuheng Zhang, Hanghang Tong, Arindam Banerjee, Jingrui He

Abstract: We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting. In particular, we introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work. Then, the proposed algorithm leverages the powerful representation of NNs for both exploitation and exploration, has the query decision-maker tailored for $k$-class classification problems with the performance guarantee, utilizes the full feedback, and updates parameters in a more practical and efficient manner. These careful designs lead to an instance-dependent regret upper bound, roughly improving by a multiplicative factor $O(\log T)$ and removing the curse of input dimensionality. Furthermore, we show that the algorithm can achieve the same performance as the Bayes-optimal classifier in the long run under the hard-margin setting in classification problems. In the end, we use extensive experiments to evaluate the proposed algorithm and SOTA baselines, to show the improved empirical performance.

Submitted 16 January, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

Comments: Published on NeurIPS 2022

arXiv:2209.06998 [pdf, other]

Stochastic Tree Ensembles for Estimating Heterogeneous Effects

Authors: Nikolay Krantsevich, Jingyu He, P. Richard Hahn

Abstract: Determining subgroups that respond especially well (or poorly) to specific interventions (medical or policy) requires new supervised learning methods tailored specifically for causal inference. Bayesian Causal Forest (BCF) is a recent method that has been documented to perform well on data generating processes with strong confounding of the sort that is plausible in many applications. This paper develops a novel algorithm for fitting the BCF model, which is more efficient than the previously available Gibbs sampler. The new algorithm can be used to initialize independent chains of the existing Gibbs sampler leading to better posterior exploration and coverage of the associated interval estimates in simulation studies. The new algorithm is compared to related approaches via simulation studies as well as an empirical analysis.

Submitted 14 September, 2022; originally announced September 2022.

Comments: 12 pages, 1 figure

arXiv:2207.03106 [pdf, other]

A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits

Authors: Jiafan He, Tianhao Wang, Yifei Min, Quanquan Gu

Abstract: We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server. We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication. We propose a simple algorithm named \texttt{FedLinUCB} based on the principle of optimism. We prove that the regret of \texttt{FedLinUCB} is bounded by $\tilde{O}(d\sqrt{\sum_{m=1}^M T_m})$ and the communication complexity is $\tilde{O}(dM^2)$, where $d$ is the dimension of the contextual vector and $T_m$ is the total number of interactions with the environment by $m$-th agent. To the best of our knowledge, this is the first provably efficient algorithm that allows fully asynchronous communication for federated contextual linear bandits, while achieving the same regret guarantee as in the single-agent setting.

Submitted 7 July, 2022; originally announced July 2022.

Comments: 25 pages, 1 figure, 2 tables

arXiv:2206.03644 [pdf, other]

doi 10.1145/3534678.3539312

Neural Bandit with Arm Group Graph

Authors: Yunzhe Qi, Yikun Ban, Jingrui He

Abstract: Contextual bandits aim to identify among a set of arms the optimal one with the highest reward based on their contextual information. Motivated by the fact that the arms usually exhibit group behaviors and the mutual impacts exist among groups, we introduce a new model, Arm Group Graph (AGG), where the nodes represent the groups of arms and the weighted edges formulate the correlations among groups. To leverage the rich information in AGG, we propose a bandit algorithm, AGG-UCB, where the neural networks are designed to estimate rewards, and we propose to utilize graph neural networks (GNN) to learn the representations of arm groups with correlations. To solve the exploitation-exploration dilemma in bandits, we derive a new upper confidence bound (UCB) built on neural networks (exploitation) for exploration. Furthermore, we prove that AGG-UCB can achieve a near-optimal regret bound with over-parameterized neural networks, and provide the convergence analysis of GNN with fully-connected layers which may be of independent interest. In the end, we conduct extensive experiments against state-of-the-art baselines on multiple public data sets, showing the effectiveness of the proposed algorithm.

Submitted 9 June, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

Comments: Accepted to SIGKDD 2022

arXiv:2205.06811 [pdf, other]

Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions

Authors: Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

Abstract: We study the linear contextual bandit problem in the presence of adversarial corruption, where the reward at each round is corrupted by an adversary, and the corruption level (i.e., the sum of corruption magnitudes over the horizon) is $C\geq 0$. The best-known algorithms in this setting are limited in that they either are computationally inefficient or require a strong assumption on the corruption, or their regret is at least $C$ times worse than the regret without corruption. In this paper, to overcome these limitations, we propose a new algorithm based on the principle of optimism in the face of uncertainty. At the core of our algorithm is a weighted ridge regression where the weight of each chosen action depends on its confidence up to some threshold. We show that for both known $C$ and unknown $C$ cases, our algorithm with proper choice of hyperparameter achieves a regret that nearly matches the lower bounds. Thus, our algorithm is nearly optimal up to logarithmic factors for both cases. Notably, our algorithm achieves the near-optimal regret for both corrupted and uncorrupted cases ($C=0$) simultaneously.
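The weighted ridge regression mentioned in this abstract can be sketched as follows. The abstract does not state the weight formula, so the rule used here, $w = \min(1, α / \|x\|_{Σ^{-1}})$ with a hypothetical threshold `alpha`, is only one assumed reading of "depends on its confidence up to some threshold". The class name `WeightedRidge` and all parameter values are illustrative, and the optimistic action selection step is omitted.

```python
import math


def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class WeightedRidge:
    """Ridge regression whose samples are down-weighted when uncertain (assumed rule)."""

    def __init__(self, dim, lam=1.0, alpha=0.5):
        self.alpha = alpha  # hypothetical confidence threshold
        # Inverse of the regularized covariance, starting at (lam * I)^{-1}.
        self.inv_cov = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                        for i in range(dim)]
        self.b = [0.0] * dim  # weighted sum of reward * context

    def uncertainty(self, x):
        """Norm of x under the inverse covariance, sqrt(x^T Sigma^{-1} x)."""
        return math.sqrt(dot(x, mat_vec(self.inv_cov, x)))

    def weight(self, x):
        """Full weight when uncertainty is below alpha, otherwise alpha / uncertainty."""
        u = self.uncertainty(x)
        return 1.0 if u <= self.alpha else self.alpha / u

    def update(self, x, reward):
        """Add one weighted sample using a Sherman-Morrison rank-one update."""
        w = self.weight(x)
        sx = mat_vec(self.inv_cov, x)
        denom = 1.0 + w * dot(x, sx)
        dim = len(x)
        self.inv_cov = [[self.inv_cov[i][j] - w * sx[i] * sx[j] / denom
                         for j in range(dim)] for i in range(dim)]
        self.b = [b_i + w * reward * x_i for b_i, x_i in zip(self.b, x)]
        return w

    def estimate(self):
        """Current parameter estimate Sigma^{-1} b."""
        return mat_vec(self.inv_cov, self.b)


model = WeightedRidge(dim=2)
first_weight = model.update([1.0, 0.0], reward=1.0)
theta = model.estimate()
```

Under this assumed rule, a context in a poorly explored direction contributes less to the estimate, which limits how far a single corrupted reward can move it.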

Submitted 9 July, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

Comments: 25 pages, 1 table. This version simplifies the proof of the regret upper bound in Version 1, and provides a stronger result for the lower bound

arXiv:2204.10963 [pdf, other]

Local Gaussian process extrapolation for BART models with applications to causal inference

Authors: Meijiang Wang, Jingyu He, P. Richard Hahn

Abstract: Bayesian additive regression trees (BART) is a semi-parametric regression model offering state-of-the-art performance on out-of-sample prediction. Despite this success, standard implementations of BART typically provide inaccurate prediction and overly narrow prediction intervals at points outside the range of the training data. This paper proposes a novel extrapolation strategy that grafts Gaussian processes to the leaf nodes in BART for predicting points outside the range of the observed data. The new method is compared to standard BART implementations and recent frequentist resampling-based methods for predictive inference. We apply the new approach to a challenging problem from causal inference, wherein for some regions of predictor space, only treated or untreated units are observed (but not both). In simulation studies, the new approach boasts superior performance compared to popular alternatives, such as Jackknife+.

Submitted 24 February, 2023; v1 submitted 22 April, 2022; originally announced April 2022.

arXiv:2203.14140 [pdf]

doi 10.1016/j.atmosenv.2022.119244

Network of Low-cost Air Quality Sensor for Monitoring Indoor, Outdoor, and Personal PM2.5 Exposure in Seattle during the 2020 Wildfire Season

Authors: Jiayang He, Ching-Hsuan Huang, Nanhsun Yuan, Elena Austin, Edmund Seto, Igor Novosselov

Abstract: The increased frequency of wildfires in the Western United States has raised public concerns. Exposure to wildfire smoke has been linked to an increased risk of cancer and cardiorespiratory morbidity. Evidence-driven interventions can alleviate the adverse health impact of wildfire smoke. Public health guidance during wildfires is based on regional air quality data with limited spatiotemporal resolution. We demonstrate the use of a network of low-cost particulate matter (PM) sensors to gather indoor, outdoor, and personal PM2.5 exposure data from seven locations in the urban Seattle area, along with a personal exposure monitor worn by a resident living in one of these locations during the 2020 Washington wildfire event. The data were used to determine PM concentration indoor/outdoor (I/O) ratios, PM reduction, and personal exposure levels. The result shows that locations equipped with high-efficiency particulate air (HEPA) filters and HVAC filtration systems had significantly lower I/O ratios (median I/O = 0.43) than those without air filtration (median I/O = 0.82). The median PM2.5 reduction for the locations with HEPA is 58% compared to 20% for the locations without HEPA. The outdoor PM sensors showed a high correlation to the nearby regional air quality monitoring stations (R2 = 0.93). The personal monitor showed high variance in PM measurements as the user moved through different microenvironments and could not be fully characterized by the network of indoor or outdoor monitors. The findings imply evidence-based interventions can be developed for reducing pollution exposure based on the combination of indoor, outdoor sensors. Personal exposure monitoring in individuals' breathing zones provided the highest fidelity data capturing temporal spikes in PM exposure.
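The indoor/outdoor (I/O) ratio reported in this abstract can be computed from paired sensor readings as in the sketch below. The readings are invented for illustration and are not the study's data. Defining percent reduction as (1 - I/O) x 100 is an assumption, since the abstract does not say how PM reduction was calculated.

```python
from statistics import median


def io_ratios(indoor, outdoor):
    """Indoor/outdoor PM2.5 ratio for each paired reading, skipping zero outdoor values."""
    return [i / o for i, o in zip(indoor, outdoor) if o > 0]


def percent_reduction(ratio):
    """Assumed definition: share of outdoor PM2.5 that does not appear indoors."""
    return (1.0 - ratio) * 100.0


# Hypothetical paired readings in micrograms per cubic meter.
indoor = [8.6, 21.5, 43.0]
outdoor = [20.0, 50.0, 100.0]

ratios = io_ratios(indoor, outdoor)
median_ratio = median(ratios)
median_reduction = median(percent_reduction(r) for r in ratios)
```

Taking the median over per-reading ratios, instead of dividing the median indoor value by the median outdoor value, keeps each indoor reading matched to its simultaneous outdoor reading.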

Submitted 26 March, 2022; originally announced March 2022.

arXiv:2203.00614 [pdf, other]

Side Effects of Learning from Low-dimensional Data Embedded in a Euclidean Space

Authors: Juncai He, Richard Tsai, Rachel Ward

Abstract: The low-dimensional manifold hypothesis posits that the data found in many applications, such as those involving natural images, lie (approximately) on low-dimensional manifolds embedded in a high-dimensional Euclidean space. In this setting, a typical neural network defines a function that takes a finite number of vectors in the embedding space as input. However, one often needs to consider evaluating the optimized network at points outside the training distribution. This paper considers the case in which the training data is distributed in a linear subspace of $\mathbb R^d$. We derive estimates on the variation of the learning function, defined by a neural network, in the direction transversal to the subspace. We study the potential regularization effects associated with the network's depth and noise in the codimension of the data manifold. We also present additional side effects in training due to the presence of noise.

Submitted 4 February, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: 53 pages (11 pages for Appendix), 24 figures

arXiv:2202.13603 [pdf, other]

Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits

Authors: Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu

Abstract: We study the problem of online generalized linear regression in the stochastic setting, where the label is generated from a generalized linear model with possibly unbounded additive noise. We provide a sharp analysis of the classical follow-the-regularized-leader (FTRL) algorithm to cope with the label noise. More specifically, for $σ$-sub-Gaussian label noise, our analysis provides a regret upper bound of $O(σ^2 d \log T) + o(\log T)$, where $d$ is the dimension of the input vector, $T$ is the total number of rounds. We also prove a $Ω(σ^2d\log(T/d))$ lower bound for stochastic online linear regression, which indicates that our upper bound is nearly optimal. In addition, we extend our analysis to a more refined Bernstein noise condition. As an application, we study generalized linear bandits with heteroscedastic noise and propose an algorithm based on FTRL to achieve the first variance-aware regret bound.

Submitted 27 March, 2023; v1 submitted 28 February, 2022; originally announced February 2022.

Comments: 27 pages, 3 figures. In this updated version, we have changed the paper title, added new theoretical results on the FTRL algorithm and mainly focused on stochastic online regression. Refer to arXiv:2202.13603v1 for the previous version, which contains more results on heteroscedastic nonlinear bandits

arXiv:2201.05759 [pdf, other]

doi 10.1145/3616855.3635844

FairIF: Boosting Fairness in Deep Learning via Influence Functions with Validation Set Sensitive Attributes

Authors: Haonan Wang, Ziwei Wu, Jingrui He

Abstract: Most fair machine learning methods either highly rely on the sensitive information of the training samples or require a large modification on the target models, which hinders their practical application. To address this issue, we propose a two-stage training algorithm named FAIRIF. It minimizes the loss over the reweighted data set (second stage) where the sample weights are computed to balance the model performance across different demographic groups (first stage). FAIRIF can be applied on a wide range of models trained by stochastic gradient descent without changing the model, while only requiring group annotations on a small validation set to compute sample weights. Theoretically, we show that, in the classification setting, three notions of disparity among different groups can be mitigated by training with the weights. Experiments on synthetic data sets demonstrate that FAIRIF yields models with better fairness-utility trade-offs against various types of bias; and on real-world data sets, we show the effectiveness and scalability of FAIRIF. Moreover, as evidenced by the experiments with pretrained models, FAIRIF is able to alleviate the unfairness issue of pretrained models without hurting their performance.

Submitted 23 December, 2023; v1 submitted 15 January, 2022; originally announced January 2022.

arXiv:2201.01051 [pdf]

doi 10.1038/s41597-022-01836-y

Open Access Dataset for Electromyography based Multi-code Biometric Authentication

Authors: Ashirbad Pradhan, Jiayuan He, Ning Jiang

Abstract: Recently, surface electromyogram (EMG) has been proposed as a novel biometric trait for addressing some key limitations of current biometrics, such as spoofing and liveness. The EMG signals possess a unique characteristic: they are inherently different for individuals (biometrics), and they can be customized to realize multi-length codes or passwords (for example, by performing different gestures). However, current EMG-based biometric research has two critical limitations: 1) a small subject pool, compared to other more established biometric traits, and 2) limited to single-session or single-day data sets. In this study, forearm and wrist EMG data were collected from 43 participants over three different days with long separation while they performed static hand and wrist gestures. The multi-day biometric authentication resulted in a median EER of 0.017 for the forearm setup and 0.025 for the wrist setup, comparable to well-established biometric traits suggesting consistent performance over multiple days. The presented large-sample multi-day data set and findings could facilitate further research on EMG-based biometrics and other gesture recognition-based applications.

Submitted 5 January, 2022; v1 submitted 4 January, 2022; originally announced January 2022.

Comments: manuscript for open access dataset (paper and appendix)

Journal ref: Sci Data 9, 733 (2022)

arXiv:2112.15423 [pdf, other]

doi 10.1093/jrsssb/qkac011

Modelling matrix time series via a tensor CP-decomposition

Authors: Jinyuan Chang, Jing He, Lin Yang, Qiwei Yao

Abstract: We consider modelling matrix time series based on a tensor CP-decomposition. Instead of using an iterative algorithm, which is the standard practice for estimating CP-decompositions, we propose a new one-pass estimation procedure based on a generalized eigenanalysis constructed from the serial dependence structure of the underlying process. To overcome the intricacy of solving a rank-reduced generalized eigenequation, we propose a further refined approach which projects it into a lower-dimensional full-ranked eigenequation. This refined method significantly improves the finite-sample performance of the estimation. The asymptotic theory has been established under a general setting without stationarity. It shows, for example, that all the component coefficient vectors in the CP-decomposition are estimated consistently with certain convergence rates. The proposed model and the estimation method are also illustrated with both simulated and real data, showing effective dimension-reduction in modelling and forecasting matrix time series.

Submitted 25 July, 2022; v1 submitted 31 December, 2021; originally announced December 2021.

Journal ref: Journal of the Royal Statistical Society Series B 2023, Vol. 85, pp. 127-148

arXiv:2110.12727 [pdf, other]

Learning Stochastic Shortest Path with Linear Function Approximation

Authors: Yifei Min, Jiafan He, Tianhao Wang, Quanquan Gu

Abstract: We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models. We call this class of SSP problems linear mixture SSPs. We propose a novel algorithm with Hoeffding-type confidence sets for learning the linear mixture SSP, which can attain an $\tilde{\mathcal{O}}(d B_{\star}^{1.5}\sqrt{K/c_{\min}})$ regret. Here $K$ is the number of episodes, $d$ is the dimension of the feature mapping in the mixture model, $B_{\star}$ bounds the expected cumulative cost of the optimal policy, and $c_{\min}>0$ is the lower bound of the cost function. Our algorithm also applies to the case when $c_{\min} = 0$, and an $\tilde{\mathcal{O}}(K^{2/3})$ regret is guaranteed. To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSP. Moreover, we design a refined Bernstein-type confidence set and propose an improved algorithm, which provably achieves an $\tilde{\mathcal{O}}(d B_{\star}\sqrt{K/c_{\min}})$ regret. In complement to the regret upper bounds, we also prove a lower bound of $Ω(dB_{\star} \sqrt{K})$. Hence, our improved algorithm matches the lower bound up to a $1/\sqrt{c_{\min}}$ factor and poly-logarithmic factors, achieving a near-optimal regret guarantee.

Submitted 5 July, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: 46 pages, 1 figure. In ICML 2022

arXiv:2110.10133 [pdf, other]

Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes

Authors: Chonghua Liao, Jiafan He, Quanquan Gu

Abstract: Reinforcement learning (RL) algorithms can be used to provide personalized services, which rely on users' private and sensitive data. To protect the users' privacy, privacy-preserving RL algorithms are in demand. In this paper, we study RL with linear function approximation and local differential privacy (LDP) guarantees. We propose a novel $(\varepsilon, δ)$-LDP algorithm for learning a class of Markov decision processes (MDPs) dubbed linear mixture MDPs, and obtain an $\tilde{\mathcal{O}}( d^{5/4}H^{7/4}T^{3/4}\left(\log(1/δ)\right)^{1/4}\sqrt{1/\varepsilon})$ regret, where $d$ is the dimension of the feature mapping, $H$ is the length of the planning horizon, and $T$ is the number of interactions with the environment. We also prove a lower bound $Ω(dH\sqrt{T}/\left(e^{\varepsilon}(e^{\varepsilon}-1)\right))$ for learning linear mixture MDPs under the $\varepsilon$-LDP constraint. Experiments on synthetic datasets verify the effectiveness of our algorithm. To the best of our knowledge, this is the first provable privacy-preserving RL algorithm with linear function approximation.

Submitted 19 October, 2021; originally announced October 2021.

Comments: 25 pages, 2 figures

arXiv:2110.03833 [pdf, other]

A Maximum Weighted Logrank Test in Detecting Crossing Hazards

Authors: Huan Cheng, Jianghua He

Abstract: In practice, the logrank test is the most widely used method for testing the equality of survival distributions. It is the optimal method under the proportional hazard assumption. However, since non-proportional hazards are often encountered in oncology trials, alternative tests have been proposed. The maximum weighted logrank test was shown to be robust in general situations. In this manuscript, we propose a new maximum test that incorporates the weight for detecting crossing hazards. The new weight is a function of the crossing time-point. Extensive simulation studies are conducted to compare our methods with other methods proposed in the literature under scenarios with various hazard ratio patterns, sample sizes, censoring rates, and censoring patterns. For crossing hazards, the proposed test is shown to be the most powerful one with a known crossing time-point. It has a similar performance as the Maxcombo test in the misspecified crossing time-point scenario. Under other alternative situations, the new test remains comparatively powerful as the Maxcombo test. Finally, we illustrate the test in a real data example and discuss the procedures to extend the test to detect crossing hazards specifically.

Submitted 7 October, 2021; originally announced October 2021.

Comments: 22 pages, 6 figures

arXiv:2110.03177 [pdf, other]

EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits

Authors: Yikun Ban, Yuchen Yan, Arindam Banerjee, Jingrui He

Abstract: In this paper, we propose a novel neural exploration strategy in contextual bandits, EE-Net, distinct from the standard UCB-based and TS-based approaches. Contextual multi-armed bandits have been studied for decades with various applications. To solve the exploitation-exploration tradeoff in bandits, there are three main techniques: epsilon-greedy, Thompson Sampling (TS), and Upper Confidence Bound (UCB). In recent literature, linear contextual bandits have adopted ridge regression to estimate the reward function and combine it with TS or UCB strategies for exploration. However, this line of works explicitly assumes the reward is based on a linear function of arm vectors, which may not be true in real-world datasets. To overcome this challenge, a series of neural bandit algorithms have been proposed, where a neural network is used to learn the underlying reward function and TS or UCB are adapted for exploration. Instead of calculating a large-deviation based statistical bound for exploration like previous methods, we propose "EE-Net", a novel neural-based exploration strategy. In addition to using a neural network (Exploitation network) to learn the reward function, EE-Net uses another neural network (Exploration network) to adaptively learn potential gains compared to the currently estimated reward for exploration. Then, a decision-maker is constructed to combine the outputs from the Exploitation and Exploration networks. We prove that EE-Net can achieve $\mathcal{O}(\sqrt{T\log T})$ regret and show that EE-Net outperforms existing linear and neural contextual bandit baselines on real-world datasets.

Submitted 12 May, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: Published on ICLR 2022

arXiv:2109.00190 [pdf, ps, other]

doi 10.1007/s40687-022-00336-0

Approximation Properties of Deep ReLU CNNs

Authors: Juncai He, Lin Li, Jinchao Xu

Abstract: This paper focuses on establishing $L^2$ approximation properties for deep ReLU convolutional neural networks (CNNs) in two-dimensional space. The analysis is based on a decomposition theorem for convolutional kernels with a large spatial size and multi-channels. Given the decomposition result, the property of the ReLU activation function, and a specific structure for channels, a universal approximation theorem of deep ReLU CNNs with classic structure is obtained by showing its connection with one-hidden-layer ReLU neural networks (NNs). Furthermore, approximation properties are obtained for one version of neural networks with ResNet, pre-act ResNet, and MgNet architecture based on connections between these networks.

Submitted 26 June, 2022; v1 submitted 1 September, 2021; originally announced September 2021.

Comments: 30 pages

MSC Class: 41A30; 68T07; 65D40

Journal ref: Research in the Mathematical Sciences, 2022

arXiv:2106.11935 [pdf, other]

Provably Efficient Representation Selection in Low-rank Markov Decision Processes: From Online to Offline RL

Authors: Weitong Zhang, Jiafan He, Dongruo Zhou, Amy Zhang, Quanquan Gu

Abstract: The success of deep reinforcement learning (DRL) lies in its ability to learn a representation that is well-suited for the exploration and exploitation task. To understand how the choice of representation can improve the efficiency of reinforcement learning (RL), we study representation selection for a class of low-rank Markov Decision Processes (MDPs) where the transition kernel can be represented in a bilinear form. We propose an efficient algorithm, called ReLEX, for representation learning in both online and offline RL. Specifically, we show that the online version of ReLEX, called ReLEX-UCB, always performs no worse than the state-of-the-art algorithm without representation selection, and achieves a strictly better constant regret if the representation function class has a "coverage" property over the entire state-action space. For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class can cover the state-action space and achieves gap-dependent sample complexity. This is the first result with constant sample complexity for representation learning in offline RL.

Submitted 14 February, 2024; v1 submitted 22 June, 2021; originally announced June 2021.

Comments: 32 pages, 2 figures, 7 tables, In UAI 2023

arXiv:2106.11612 [pdf, ps, other]

Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation

Authors: Jiafan He, Dongruo Zhou, Quanquan Gu

Abstract: We study reinforcement learning (RL) with linear function approximation. Existing algorithms for this problem only have high-probability regret and/or Probably Approximately Correct (PAC) sample complexity guarantees, which cannot guarantee the convergence to the optimal policy. In this paper, in order to overcome the limitation of existing algorithms, we propose a new algorithm called FLUTE, which enjoys uniform-PAC convergence to the optimal policy with high probability. The uniform-PAC guarantee is the strongest possible guarantee for reinforcement learning in the literature, which can directly imply both PAC and high probability regret bounds, making our algorithm superior to all existing algorithms with linear function approximation. At the core of our algorithm is a novel minimax value function estimator and a multi-level partition scheme to select the training samples from historical observations. Both of these techniques are new and of independent interest.

Submitted 31 December, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

Comments: 27 pages. In NeurIPS 2021

arXiv:2106.01906

Bayesian Inference for Gamma Models

Authors: Jingyu He, Nicholas Polson, Jianeng Xu

Abstract: We use the theory of normal variance-mean mixtures to derive a data augmentation scheme for models that include gamma functions. Our methodology applies to many situations in statistics and machine learning, including Multinomial-Dirichlet distributions, Negative binomial regression, Poisson-Gamma hierarchical models, Extreme value models, to name but a few. All of those models include a gamma function which does not admit a natural conjugate prior distribution providing a significant challenge to inference and prediction. To provide a data augmentation strategy, we construct and develop the theory of the class of Exponential Reciprocal Gamma distributions. This allows scalable EM and MCMC algorithms to be developed. We illustrate our methodology on a number of examples, including gamma shape inference, negative binomial regression and Dirichlet allocation. Finally, we conclude with directions for future research.

Submitted 21 June, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: Duplicate submission of arXiv:1905.12141. Please check arXiv:1905.12141 for future updates.

arXiv:2102.12679 [pdf, other]

Variational Selective Autoencoder: Learning from Partially-Observed Heterogeneous Data

Authors: Yu Gong, Hossein Hajimirsadeghi, Jiawei He, Thibaut Durand, Greg Mori

Abstract: Learning from heterogeneous data poses challenges such as combining data from various sources and of different types. Meanwhile, heterogeneous data are often associated with missingness in real-world applications due to heterogeneity and noise of input sources. In this work, we propose the variational selective autoencoder (VSAE), a general framework to learn representations from partially-observed heterogeneous data. VSAE learns the latent dependencies in heterogeneous data by modeling the joint distribution of observed data, unobserved data, and the imputation mask which represents how the data are missing. It results in a unified model for various downstream tasks including data generation and imputation. Evaluation on both low-dimensional and high-dimensional heterogeneous datasets for these two tasks shows improvement over state-of-the-art models.

Submitted 24 February, 2021; originally announced February 2021.

Comments: International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

arXiv:2102.08940 [pdf, other]

Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs

Authors: Jiafan He, Dongruo Zhou, Quanquan Gu

Abstract: Learning Markov decision processes (MDPs) in the presence of the adversary is a challenging problem in reinforcement learning (RL). In this paper, we study RL in episodic MDPs with adversarial reward and full information feedback, where the unknown transition probability function is a linear function of a given feature mapping, and the reward function can change arbitrarily episode by episode. We propose an optimistic policy optimization algorithm POWERS and show that it can achieve $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of the episode, $T$ is the number of interactions with the MDP, and $d$ is the dimension of the feature mapping. Furthermore, we also prove a matching lower bound of $\tilde{Ω}(dH\sqrt{T})$ up to logarithmic factors. Our key technical contributions are two-fold: (1) a new value function estimator based on importance weighting; and (2) a tighter confidence set for the transition kernel. They together lead to the nearly minimax optimal regret.

Submitted 20 April, 2022; v1 submitted 17 February, 2021; originally announced February 2021.

Comments: 22 pages, 1 figure. In AISTATS 2022

arXiv:2102.06539 [pdf, other]

Jacobian Determinant of Normalizing Flows

Authors: Huadong Liao, Jiawei He

Abstract: Normalizing flows learn a diffeomorphic mapping between the target and base distribution, while the Jacobian determinant of that mapping forms another real-valued function. In this paper, we show that the Jacobian determinant mapping is unique for the given distributions, hence the likelihood objective of flows has a unique global optimum. In particular, the likelihood for a class of flows is explicitly expressed by the eigenvalues of the auto-correlation matrix of individual data points, and is independent of the parameterization of the neural network, which provides a theoretical optimal value of the likelihood objective and relates to probabilistic PCA. Additionally, the Jacobian determinant is a measure of local volume change and is maximized when MLE is used for optimization. To stabilize normalizing flows training, it is required to maintain a balance between the expansiveness and contraction of volume, meaning a Lipschitz constraint on the diffeomorphic mapping and its inverse. With these theoretical results, several principles of designing normalizing flows were proposed. Numerical experiments on high-dimensional datasets (such as CelebA-HQ 1024x1024) were conducted to show the improved stability of training.

Submitted 17 February, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: 14 pages

Showing 1–50 of 108 results for author: He, J