
Accelerating Distributed Optimization: A Primal-Dual Perspective on Local Steps

Authors: Junchi Yang, Murat Yildirim, Qiu Feng

Abstract: In distributed machine learning, efficient training across multiple agents with different data distributions poses significant challenges. Even with a centralized coordinator, current algorithms that achieve optimal communication complexity typically require either large minibatches or compromise on gradient complexity. In this work, we tackle both centralized and decentralized settings across strongly convex, convex, and nonconvex objectives. We first demonstrate that a basic primal-dual method, (Accelerated) Gradient Ascent Multiple Stochastic Gradient Descent (GA-MSGD), applied to the Lagrangian of distributed optimization inherently incorporates local updates, because the inner loops of running Stochastic Gradient Descent on the primal variable require no inter-agent communication. Notably, for strongly convex objectives, we show (Accelerated) GA-MSGD achieves linear convergence in communication rounds despite the Lagrangian being only linear in the dual variables. This is due to a unique structural property where the dual variable is confined to the span of the coupling matrix, rendering the dual problem strongly concave. When integrated with the Catalyst framework, our approach achieves nearly optimal communication complexity across various settings without the need for minibatches. Moreover, in stochastic decentralized problems, it attains communication complexities comparable to those in deterministic settings, improving over existing algorithms.
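The primal-dual idea in this abstract (local primal steps without communication, then one dual ascent step per communication round) can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the authors' GA-MSGD: it uses two agents, scalar quadratic objectives, a single consensus constraint x1 = x2, and deterministic gradients in place of stochastic ones. The function name `dual_ascent_local_steps` and all step sizes are hypothetical.

```python
def dual_ascent_local_steps(a1, a2, rounds=50, local_steps=20,
                            primal_lr=0.5, dual_lr=0.3):
    """Dual gradient ascent with several local primal gradient steps per round.

    Toy problem (an assumption, not from the paper):
        minimize 0.5*(x1 - a1)**2 + 0.5*(x2 - a2)**2  subject to  x1 == x2
    Lagrangian: L(x1, x2, lam) = f1(x1) + f2(x2) + lam * (x1 - x2)
    """
    x1, x2, lam = 0.0, 0.0, 0.0
    for _ in range(rounds):
        # Local phase: each agent descends on its own part of the Lagrangian.
        # No information about the other agent's variable is needed here.
        for _ in range(local_steps):
            x1 -= primal_lr * ((x1 - a1) + lam)
            x2 -= primal_lr * ((x2 - a2) - lam)
        # Communication round: one ascent step on the dual variable,
        # driven by the current consensus violation x1 - x2.
        lam += dual_lr * (x1 - x2)
    return x1, x2, lam


x1, x2, lam = dual_ascent_local_steps(1.0, 3.0)
print(x1, x2, lam)
```

For this toy problem the constrained minimizer is the average of a1 and a2, so both local variables should approach 2.0 when a1 = 1 and a2 = 3.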

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.12588 [pdf, other]

UIFV: Data Reconstruction Attack in Vertical Federated Learning

Authors: Jirui Yang, Peng Chen, Zhihui Lu, Qiang Duan, Yubing Bao

Abstract: Vertical Federated Learning (VFL) facilitates collaborative machine learning without the need for participants to share raw private data. However, recent studies have revealed privacy risks where adversaries might reconstruct sensitive features through data leakage during the learning process. Although data reconstruction methods based on gradient or model information are somewhat effective, they reveal limitations in VFL application scenarios. This is because these traditional methods heavily rely on specific model structures and/or have strict limitations on application scenarios. To address this, our study introduces the Unified InverNet Framework into VFL, which yields a novel and flexible approach (dubbed UIFV) that leverages intermediate feature data to reconstruct original data, instead of relying on gradients or model details. The intermediate feature data is the feature exchanged by different participants during the inference phase of VFL. Experiments on four datasets demonstrate that our methods significantly outperform state-of-the-art techniques in attack precision. Our work exposes severe privacy vulnerabilities within VFL systems that pose real threats to practical VFL applications and thus confirms the necessity of further enhancing privacy protection in the VFL architecture.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.10262 [pdf, other]

Fast solution to the fair ranking problem using the Sinkhorn algorithm

Authors: Yuki Uehara, Shunnosuke Ikeda, Naoki Nishimura, Koya Ohashi, Yilin Li, Jie Yang, Deddy Jobson, Xingxia Zha, Takeshi Matsumoto, Noriyoshi Sukegawa, Yuichi Takano

Abstract: In two-sided marketplaces such as online flea markets, recommender systems for providing consumers with personalized item rankings play a key role in promoting transactions between providers and consumers. Meanwhile, two-sided marketplaces face the problem of balancing consumer satisfaction and fairness among items to stimulate activity of item providers. Saito and Joachims (2022) devised an impact-based fair ranking method for maximizing the Nash social welfare based on fair division; however, this method, which requires solving a large-scale constrained nonlinear optimization problem, is very difficult to apply to practical-scale recommender systems. We thus propose a fast solution to the impact-based fair ranking problem. We first transform the fair ranking problem into an unconstrained optimization problem and then design a gradient ascent method that repeatedly executes the Sinkhorn algorithm. Experimental results demonstrate that our algorithm provides fair rankings of high quality and is about 1000 times faster than application of commercial optimization software.
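For readers unfamiliar with the Sinkhorn algorithm named in this abstract, the following is a minimal sketch of the standard Sinkhorn scaling iteration in plain Python. It is not the authors' fair-ranking solver: the cost matrix, the regularization strength `eps`, and the choice of unit row and column marginals (which yields an approximately doubly stochastic matrix for a square input) are all assumptions made for illustration.

```python
import math


def sinkhorn(cost, eps=0.5, iters=200):
    """Standard Sinkhorn scaling for a square cost matrix.

    Returns a matrix P[i][j] = u[i] * K[i][j] * v[j], with K = exp(-cost / eps),
    whose rows and columns each sum to (approximately) 1.
    """
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows so that each row of the plan sums to 1.
        u = [1.0 / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that each column of the plan sums to 1.
        v = [1.0 / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical 3x3 cost matrix, for illustration only.
plan = sinkhorn([[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]])
for row in plan:
    print([round(p, 4) for p in row])
```

Each iteration costs only two matrix-vector style passes, which is the usual reason Sinkhorn is attractive as an inner routine inside a gradient method.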

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.09253 [pdf, other]

Deep Sketched Output Kernel Regression for Structured Prediction

Authors: Tamim El Ahmad, Junjie Yang, Pierre Laforgue, Florence d'Alché-Buc

Abstract: By leveraging the kernel trick in the output space, kernel-induced losses provide a principled way to define structured output prediction tasks for a wide variety of output modalities. In particular, they have been successfully used in the context of surrogate non-parametric regression, where the kernel trick is typically exploited in the input space as well. However, when inputs are images or texts, more expressive models such as deep neural networks seem more suited than non-parametric methods. In this work, we tackle the question of how to train neural networks to solve structured output prediction tasks, while still benefiting from the versatility and relevance of kernel-induced losses. We design a novel family of deep neural architectures, whose last layer predicts in a data-dependent finite-dimensional subspace of the infinite-dimensional output feature space deriving from the kernel-induced loss. This subspace is chosen as the span of the eigenfunctions of a randomly-approximated version of the empirical kernel covariance operator. Interestingly, this approach unlocks the use of gradient descent algorithms (and consequently of any neural architecture) for structured prediction. Experiments on synthetic tasks as well as real-world supervised graph prediction problems show the relevance of our method.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2405.13785 [pdf, other]

Efficient Two-Stage Gaussian Process Regression Via Automatic Kernel Search and Subsampling

Authors: Shifan Zhao, Jiaying Lu, Ji Yang, Edmond Chow, Yuanzhe Xi

Abstract: Gaussian Process Regression (GPR) is widely used in statistics and machine learning for prediction tasks requiring uncertainty measures. Its efficacy depends on the appropriate specification of the mean function, covariance kernel function, and associated hyperparameters. Severe misspecifications can lead to inaccurate results and problematic consequences, especially in safety-critical applications. However, a systematic approach to handle these misspecifications is lacking in the literature. In this work, we propose a general framework to address these issues. Firstly, we introduce a flexible two-stage GPR framework that separates mean prediction and uncertainty quantification (UQ) to prevent mean misspecification, which can introduce bias into the model. Secondly, kernel function misspecification is addressed through a novel automatic kernel search algorithm, supported by theoretical analysis, that selects the optimal kernel from a candidate set. Additionally, we propose a subsampling-based warm-start strategy for hyperparameter initialization to improve efficiency and avoid hyperparameter misspecification. With much lower computational cost, our subsampling-based strategy can yield competitive or better performance than training exclusively on the full dataset. Combining all these components, we recommend two GPR methods, exact and scalable, designed to match available computational resources and specific UQ requirements. Extensive evaluation on real-world datasets, including UCI benchmarks and a safety-critical medical case study, demonstrates the robustness and precision of our methods.

Submitted 22 May, 2024; originally announced May 2024.

ACM Class: G.3; J.3

arXiv:2405.11046 [pdf, other]

Temporal and spatial downscaling for solar radiation

Authors: Maggie Bailey, Doug Nychka, Manajit Sengupta, Jaemo Yang, Soutir Bandyopadhyay

Abstract: Global and regional climate model projections are useful for gauging future patterns of climate variables, including solar radiation, but data from these models is often too coarse to assess local impacts. Within the context of solar radiation, the changing climate may have an effect on photovoltaic (PV) production, especially as the PV industry moves to extend plant lifetimes to 50 years. Predicting PV production while taking into account a changing climate requires data at a resolution that is useful for building PV plants. Although temporal and spatial downscaling of solar radiation data is widely studied, we present a novel method to downscale solar radiation data from daily averages to hourly profiles, while maintaining spatial correlation of parameters characterizing the diurnal profile of solar radiation. The method focuses on the use of a diurnal template which can be shifted and scaled according to the time or year and location and the use of thin plate splines for spatial downscaling. This analysis is applied to data from the National Solar Radiation Database housed at the National Renewable Energy Lab and a case study of the mentioned methods over several sub-regions of continental United States is presented.

Submitted 17 May, 2024; originally announced May 2024.

Comments: 35 pages, 14 figures

arXiv:2405.08631 [pdf, other]

A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate Descent

Authors: James Yang, Trevor Hastie

Abstract: We develop fast and scalable algorithms based on block-coordinate descent to solve the group lasso and the group elastic net for generalized linear models along a regularization path. Special attention is given when the loss is the usual least squares loss (Gaussian loss). We show that each block-coordinate update can be solved efficiently using Newton's method and further improved using an adaptive bisection method, solving these updates with a quadratic convergence rate. Our benchmarks show that our package adelie performs 3 to 10 times faster than the next fastest package on a wide array of both simulated and real datasets. Moreover, we demonstrate that our package is a competitive lasso solver as well, matching the performance of the popular lasso package glmnet.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.07791 [pdf, ps, other]

Decentralized Kernel Ridge Regression Based on Data-dependent Random Feature

Authors: Ruikai Yang, Fan He, Mingzhen He, Jie Yang, Xiaolin Huang

Abstract: Random feature (RF) has been widely used for node consistency in decentralized kernel ridge regression (KRR). Currently, the consistency is guaranteed by imposing constraints on coefficients of features, necessitating that the random features on different nodes are identical. However, in many applications, data on different nodes varies significantly on the number or distribution, which calls for adaptive and data-dependent methods that generate different RFs. To tackle the essential difficulty, we propose a new decentralized KRR algorithm that pursues consensus on decision functions, which allows great flexibility and well adapts data on nodes. The convergence is rigorously given and the effectiveness is numerically verified: by capturing the characteristics of the data on each node, while maintaining the same communication costs as other methods, we achieved an average regression accuracy improvement of 25.5% across six real-world data sets.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2404.18980 [pdf, other]

The Impact of COVID-19 on Co-authorship and Economics Scholars' Productivity

Authors: Hanqiao Zhang, Joy D. Xiuyao Yang

Abstract: The COVID-19 pandemic has disrupted traditional academic collaboration patterns, prompting a unique opportunity to analyze the influence of peer effects and coauthorship dynamics on research output. Using a novel dataset, this paper endeavors to make a first cut at investigating the role of peer effects on the productivity of economics scholars, measured by the number of publications, in both pre-pandemic and pandemic times. Results show that peer effect is significant for the pre-pandemic time but not for the pandemic time. The findings contribute to our understanding of how research collaboration influences knowledge production and may help guide policies aimed at fostering collaboration and enhancing research productivity in the academic community.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.07457 [pdf, ps, other]

From Poisson Observations to Fitted Negative Binomial Distribution

Authors: Yingying Yang, Niloufar Dousti Mousavi, Zhou Yu, Jie Yang

Abstract: The Kolmogorov-Smirnov (KS) test has been widely used for testing whether a random sample comes from a specific distribution, possibly with estimated parameters. If the data come from a Poisson distribution, however, one can hardly tell that they do not come from a negative binomial distribution by running a KS test, even with a large sample size. In this paper, we rigorously justify that the KS test statistic converges to zero almost surely, as the sample size goes to infinity. To prove this result, we demonstrate a notable finding that in this case the maximum likelihood estimates (MLE) for the parameters of the negative binomial distribution converge to infinity and one, respectively and almost surely. Our result highlights a potential limitation of the KS test, as well as other tests based on empirical distribution functions (EDF), in efficiently identifying the true underlying distribution. Our findings and justifications also underscore the importance of careful interpretation and further investigation when identifying the most appropriate distributions in practice.
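The limiting behavior described in this abstract rests on a well-known fact: a negative binomial distribution with size parameter r and success probability p approaches a Poisson distribution as r grows and p tends to one with the mean held fixed. The sketch below illustrates that fact numerically; it does not reproduce the paper's proof or its MLE analysis. The parameterization (pmf proportional to p^r (1-p)^k, mean r(1-p)/p) is an assumption and may differ from the one the authors use, and the helper names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    # Computed in log space for numerical stability.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def negbin_pmf(k, r, p):
    """P(K = k) = Gamma(k + r) / (k! * Gamma(r)) * p**r * (1 - p)**k."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def max_cdf_gap(lam, r, kmax=60):
    """Largest absolute CDF difference between Poisson(lam) and NB(r, p)."""
    p = r / (r + lam)  # keeps the negative binomial mean r*(1-p)/p equal to lam
    gap = cdf_pois = cdf_nb = 0.0
    for k in range(kmax + 1):
        cdf_pois += poisson_pmf(k, lam)
        cdf_nb += negbin_pmf(k, r, p)
        gap = max(gap, abs(cdf_pois - cdf_nb))
    return gap


for r in (1, 10, 100, 1000):
    print(r, max_cdf_gap(3.0, r))
```

The printed gap shrinks as r grows, which is the population-level analogue of a KS statistic that cannot separate the two families.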

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.01730 [pdf, other]

Asymptotics of Language Model Alignment

Authors: Joy Qiping Yang, Salman Salamatian, Ziteng Sun, Ananda Theertha Suresh, Ahmad Beirami

Abstract: Let $p$ denote a generative language model. Let $r$ denote a reward model that returns a scalar that captures the degree at which a draw from $p$ is preferred. The goal of language model alignment is to alter $p$ to a new distribution $φ$ that results in a higher expected reward while keeping $φ$ close to $p.$ A popular alignment method is the KL-constrained reinforcement learning (RL), which chooses a distribution $φ_Δ$ that maximizes $E_{φ_Δ} r(y)$ subject to a relative entropy constraint $KL(φ_Δ|| p) \leq Δ.$ Another simple alignment method is best-of-$N$, where $N$ samples are drawn from $p$ and one with highest reward is selected. In this paper, we offer a closed-form characterization of the optimal KL-constrained RL solution. We demonstrate that any alignment method that achieves a comparable trade-off between KL divergence and reward must approximate the optimal KL-constrained RL solution in terms of relative entropy. To further analyze the properties of alignment methods, we introduce two simplifying assumptions: we let the language model be memoryless, and the reward model be linear. Although these assumptions may not reflect complex real-world scenarios, they enable a precise characterization of the asymptotic behavior of both the best-of-$N$ alignment, and the KL-constrained RL method, in terms of information-theoretic quantities. We prove that the reward of the optimal KL-constrained RL solution satisfies a large deviation principle, and we fully characterize its rate function. We also show that the rate of growth of the scaled cumulants of the reward is characterized by a proper Renyi cross entropy. Finally, we show that best-of-$N$ is asymptotically equivalent to KL-constrained RL solution by proving that their expected rewards are asymptotically equal, and concluding that the two distributions must be close in KL divergence.
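As a small illustration of the trade-off in this abstract, the sketch below uses the standard result that, on a finite outcome space, maximizing expected reward under a KL penalty or constraint yields an exponentially tilted distribution, with φ(y) proportional to p(y) exp(r(y)/β). Whether the paper states its closed form in exactly this way is not confirmed by the abstract; the temperature name `beta`, the toy distribution, and the helper functions are assumptions for illustration.

```python
import math


def tilt(p, r, beta):
    """Return phi with phi[y] proportional to p[y] * exp(r[y] / beta)."""
    weights = [pi * math.exp(ri / beta) for pi, ri in zip(p, r)]
    z = sum(weights)
    return [w / z for w in weights]


def expected_reward(phi, r):
    return sum(f * ri for f, ri in zip(phi, r))


def kl(phi, p):
    """KL(phi || p) in nats, for distributions on the same finite support."""
    return sum(f * math.log(f / pi) for f, pi in zip(phi, p) if f > 0)


# Hypothetical base model over three outcomes and their rewards.
p = [0.5, 0.3, 0.2]
r = [0.0, 1.0, 2.0]
for beta in (4.0, 1.0, 0.25):
    phi = tilt(p, r, beta)
    print(beta, round(expected_reward(phi, r), 4), round(kl(phi, p), 4))
```

Smaller values of `beta` buy more expected reward at the price of a larger KL divergence from the base distribution, which is the trade-off the constraint Δ controls.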

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2402.15127 [pdf, other]

Multi-Armed Bandits with Abstention

Authors: Junwen Yang, Tianyuan Jin, Vincent Y. F. Tan

Abstract: We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention. In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the stochastic instantaneous reward before observing it. When opting for abstention, the agent either suffers a fixed regret or gains a guaranteed reward. Given this added layer of complexity, we ask whether we can develop efficient algorithms that are both asymptotically and minimax optimal. We answer this question affirmatively by designing and analyzing algorithms whose regrets meet their corresponding information-theoretic lower bounds. Our results offer valuable quantitative insights into the benefits of the abstention option, laying the groundwork for further exploration in other online decision-making problems with such an option. Numerical results further corroborate our theoretical findings.

Submitted 23 February, 2024; originally announced February 2024.

Comments: Preprint

arXiv:2402.09723 [pdf, other]

Efficient Prompt Optimization Through the Lens of Best Arm Identification

Authors: Chengshuai Shi, Kun Yang, Zihan Chen, Jundong Li, Jing Yang, Cong Shen

Abstract: The remarkable instruction-following capability of large language models (LLMs) has sparked a growing interest in automatically finding good prompts, i.e., prompt optimization. Most existing works follow the scheme of selecting from a pre-generated pool of candidate prompts. However, these designs mainly focus on the generation strategy, while limited attention has been paid to the selection method. Especially, the cost incurred during the selection (e.g., accessing LLM and evaluating the responses) is rarely explicitly considered. To overcome this limitation, this work provides a principled framework, TRIPLE, to efficiently perform prompt selection under an explicit budget constraint. TRIPLE is built on a novel connection established between prompt optimization and fixed-budget best arm identification (BAI-FB) in multi-armed bandits (MAB); thus, it is capable of leveraging the rich toolbox from BAI-FB systematically and also incorporating unique characteristics of prompt optimization. Extensive experiments on multiple well-adopted tasks using various LLMs demonstrate the remarkable performance improvement of TRIPLE over baselines while satisfying the limited budget constraints. As an extension, variants of TRIPLE are proposed to efficiently select examples for few-shot prompts, also achieving superior empirical performance.

Submitted 30 May, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.04691 [pdf, ps, other]

Learning Operators with Stochastic Gradient Descent in General Hilbert Spaces

Authors: Lei Shi, Jia-Qi Yang

Abstract: This study investigates leveraging stochastic gradient descent (SGD) to learn operators between general Hilbert spaces. We propose weak and strong regularity conditions for the target operator to depict its intrinsic structure and complexity. Under these conditions, we establish upper bounds for convergence rates of the SGD algorithm and conduct a minimax lower bound analysis, further illustrating that our convergence analysis and regularity conditions quantitatively characterize the tractability of solving operator learning problems using the SGD algorithm. It is crucial to highlight that our convergence analysis is still valid for nonlinear operator learning. We show that the SGD estimator will converge to the best linear approximation of the nonlinear target operator. Moreover, applying our analysis to operator learning problems based on vector-valued and real-valued reproducing kernel Hilbert spaces yields new convergence results, thereby refining the conclusions of existing literature.

Submitted 13 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: 56 pages

arXiv:2402.02949 [pdf, other]

Kernel PCA for Out-of-Distribution Detection

Authors: Kun Fang, Qinghua Tao, Kexin Lv, Mingzhen He, Xiaolin Huang, Jie Yang

Abstract: Out-of-Distribution (OoD) detection is vital for the reliability of Deep Neural Networks (DNNs). Existing works have shown the insufficiency of Principal Component Analysis (PCA) straightforwardly applied on the features of DNNs in detecting OoD data from In-Distribution (InD) data. The failure of PCA suggests that the network features residing in OoD and InD are not well separated by simply proceeding in a linear subspace, which instead can be resolved through proper nonlinear mappings. In this work, we leverage the framework of Kernel PCA (KPCA) for OoD detection, seeking subspaces where OoD and InD features are allocated with significantly different patterns. We devise two feature mappings that induce non-linear kernels in KPCA to advocate the separability between InD and OoD data in the subspace spanned by the principal components. Given any test sample, the reconstruction error in such subspace is then used to efficiently obtain the detection result with $\mathcal{O}(1)$ time complexity in inference. Extensive empirical results on multiple OoD data sets and network structures verify the superiority of our KPCA-based detector in efficiency and efficacy with state-of-the-art OoD detection performances.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.01460 [pdf, other]

Deep conditional distribution learning via conditional Föllmer flow

Authors: Jinyuan Chang, Zhao Ding, Yuling Jiao, Ruoxuan Li, Jerry Zhijian Yang

Abstract: We introduce an ordinary differential equation (ODE) based deep generative method for learning conditional distributions, named Conditional Föllmer Flow. Starting from a standard Gaussian distribution, the proposed flow could approximate the target conditional distribution very well when the time is close to 1. For effective implementation, we discretize the flow with Euler's method where we estimate the velocity field nonparametrically using a deep neural network. Furthermore, we also establish the convergence result for the Wasserstein-2 distance between the distribution of the learned samples and the target conditional distribution, providing the first comprehensive end-to-end error analysis for conditional distribution learning via ODE flow. Our numerical experiments showcase its effectiveness across a range of scenarios, from standard nonparametric conditional density estimation problems to more intricate challenges involving image data, illustrating its superiority over various existing conditional density estimation methods.
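The discretization step mentioned in this abstract, Euler's method applied to an ODE flow, can be sketched generically. The example below is not the Conditional Föllmer Flow: in the paper the velocity field is estimated by a deep neural network, whereas here `velocity` is a hypothetical toy field v(x, t) = x, chosen only because its exact solution at time 1 is known.

```python
def euler_flow(x0, velocity, steps=1000, t_end=1.0):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t_end with Euler's method."""
    x = x0
    dt = t_end / steps
    for k in range(steps):
        # One explicit Euler step, evaluated at the left end of the interval.
        x += dt * velocity(x, k * dt)
    return x


# Toy velocity field (an assumption): dx/dt = x, so x(1) = x0 * e.
approx = euler_flow(1.0, lambda x, t: x)
print(approx)
```

In a generative setting the same loop would be run on samples drawn from a standard Gaussian, with the learned network supplying the velocity at each step.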

Submitted 13 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: The original title of this paper is "Deep Conditional Generative Learning: Model and Error Analysis"

arXiv:2401.06403 [pdf, other]

Fourier analysis of spatial point processes

Authors: Junho Yang, Yongtao Guan

Abstract: In this article, we develop comprehensive frequency domain methods for estimating and inferring the second-order structure of spatial point processes. The main element here is on utilizing the discrete Fourier transform (DFT) of the point pattern and its tapered counterpart. Under second-order stationarity, we show that both the DFTs and the tapered DFTs are asymptotically jointly independent Gaussian even when the DFTs share the same limiting frequencies. Based on these results, we establish an $α$-mixing central limit theorem for a statistic formulated as a quadratic form of the tapered DFT. As applications, we derive the asymptotic distribution of the kernel spectral density estimator and establish a frequency domain inferential method for parametric stationary point processes. For the latter, the resulting model parameter estimator is computationally tractable and yields meaningful interpretations even in the case of model misspecification. We investigate the finite sample performance of our estimator through simulations, considering scenarios of both correctly specified and misspecified models. Furthermore, we extend our proposed DFT-based frequency domain methods to a class of non-stationary spatial point processes.
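To make the central object of this abstract concrete, the sketch below computes a DFT of a planar point pattern and the corresponding raw periodogram. The normalization by the square root of the window area is one common convention and is assumed here; the paper's exact definition, including any mean-centering and the data taper, is not given in the abstract and is omitted. Function names and the sample points are hypothetical.

```python
import cmath
import math


def point_pattern_dft(points, omega, area):
    """J(omega) = area**-0.5 * sum over points x of exp(-i * <omega, x>).

    Untapered and uncentered; the normalization is an assumed convention.
    """
    total = sum(cmath.exp(-1j * (omega[0] * x + omega[1] * y)) for x, y in points)
    return total / math.sqrt(area)


def periodogram(points, omega, area):
    """Squared modulus of the DFT at frequency omega."""
    return abs(point_pattern_dft(points, omega, area)) ** 2


# Hypothetical pattern observed on the unit square.
points = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.3)]
print(periodogram(points, (0.0, 0.0), 1.0))
print(periodogram(points, (2 * math.pi, 0.0), 1.0))
```

At the zero frequency the DFT reduces to the point count divided by the square root of the area, which gives a quick sanity check on any implementation.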

Submitted 12 January, 2024; originally announced January 2024.

arXiv:2401.04535 [pdf, other]

Semi-Supervised Deep Sobolev Regression: Estimation, Variable Selection and Beyond

Authors: Zhao Ding, Chenguang Duan, Yuling Jiao, Jerry Zhijian Yang

Abstract: We propose SDORE, a semi-supervised deep Sobolev regressor, for the nonparametric estimation of the underlying regression function and its gradient. SDORE employs deep neural networks to minimize empirical risk with gradient norm regularization, allowing computation of the gradient norm on unlabeled data. We conduct a comprehensive analysis of the convergence rates of SDORE and establish a minimax optimal rate for the regression function. Crucially, we also derive a convergence rate for the associated plug-in gradient estimator, even in the presence of significant domain shift. These theoretical findings offer valuable prior guidance for selecting regularization parameters and determining the size of the neural network, while showcasing the provable advantage of leveraging unlabeled data in semi-supervised learning. To the best of our knowledge, SDORE is the first provable neural network-based approach that simultaneously estimates the regression function and its gradient, with diverse applications including nonparametric variable selection and inverse problems. The effectiveness of SDORE is validated through an extensive range of numerical simulations and real data analysis.

Submitted 9 January, 2024; originally announced January 2024.

MSC Class: 62G05; 62G08; 65N21

arXiv:2401.02529 [pdf, other]

Simulation-based transition density approximation for the inference of SDE models

Authors: Xin Cai, Jingyu Yang, Zhibao Li, Hongqiao Wang, Miao Huang

Abstract: Stochastic Differential Equations (SDEs) serve as a powerful modeling tool in various scientific domains, including systems science, engineering, and ecological science. While the specific form of SDEs is typically known for a given problem, certain model parameters remain unknown. Efficiently inferring these unknown parameters based on observations of the state in discrete time series represents a vital practical subject. The challenge arises in nonlinear SDEs, where maximum likelihood estimation of parameters is generally unfeasible due to the absence of closed-form expressions for transition and stationary probability density functions of the states. In response to this limitation, we propose a novel two-step parameter inference mechanism. This approach involves a global-search phase followed by a local-refining procedure. The global-search phase is dedicated to identifying the domain of high-value likelihood functions, while the local-refining procedure is specifically designed to enhance the surrogate likelihood within this localized domain. Additionally, we present two simulation-based approximations for the transition density, aiming to efficiently or accurately approximate the likelihood function. Numerical examples illustrate the efficacy of our proposed methodology in achieving posterior parameter estimation.

Submitted 25 February, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

MSC Class: 62M20

arXiv:2312.16260 [pdf, other]

Multinomial Link Models

Authors: Tianmeng Wang, Liping Tong, Jie Yang

Abstract: We propose a unified multinomial link model for analyzing categorical responses. It not only covers the existing multinomial logistic models and their extensions as special cases, but also includes new models that can incorporate the observations with NA or Unknown responses in the data analysis. We provide explicit formulae and detailed algorithms for finding the maximum likelihood estimates of the model parameters and computing the Fisher information matrix. Our algorithms solve the infeasibility issue of existing statistical software on estimating parameters of cumulative link models. The applications to real datasets show that the new models can fit the data significantly better, and the corresponding data analysis may correct the misleading conclusions due to missing responses.

Submitted 18 June, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

Comments: 39 pages, 5 figures

arXiv:2312.15489 [pdf, other]

Browsing behavior exposes identities on the Web

Authors: Marcos Oliveira, Junran Yang, Daniel Griffiths, Denis Bonnay, Juhi Kulshrestha

Abstract: How easy is it to uniquely identify a person based solely on their web browsing behavior? Here we show that when people navigate the Web, their online traces produce fingerprints that identify them. Merely the four most visited web domains are enough to identify 95% of the individuals. These digital fingerprints are stable and render high re-identifiability. We demonstrate that we can re-identify 80% of the individuals in separate time slices of data. Such a privacy threat persists even with limited information about individuals' browsing behavior, reinforcing existing concerns around online privacy.

Submitted 14 June, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

Comments: 13 pages, 1 figure
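
To make the fingerprinting idea in the abstract above concrete, here is a minimal sketch of how one might measure the share of users who are uniquely identified by their most visited domains. Everything in it is an assumption for illustration: the function names, the toy histories, and the choice to treat a fingerprint as the unordered set of the top-k domains are hypothetical and are not taken from the paper.

```python
from collections import Counter

def top_k_fingerprint(visits, k=4):
    """Return the set of the k most visited domains (hypothetical fingerprint)."""
    counts = Counter(visits)
    # Sort by visit count (descending), then by name so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return frozenset(domain for domain, _ in ranked[:k])

def unique_fraction(histories, k=4):
    """Fraction of users whose top-k fingerprint is shared with no other user."""
    prints = {user: top_k_fingerprint(v, k) for user, v in histories.items()}
    freq = Counter(prints.values())
    unique = sum(1 for fp in prints.values() if freq[fp] == 1)
    return unique / len(prints)

# Toy data: u1 and u2 share the same top-2 set, u3 does not.
histories = {
    "u1": ["a", "a", "b", "c"],
    "u2": ["a", "b", "b", "c"],
    "u3": ["a", "d", "d"],
}
share_unique = unique_fraction(histories, k=2)
```

On real browsing logs the same function could be evaluated for increasing k to see how quickly the unique share grows; the paper's actual methodology may differ.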

arXiv:2312.15023 [pdf, other]

Federated Q-Learning: Linear Regret Speedup with Low Communication Cost

Authors: Zhong Zheng, Fengyu Gao, Lingzhou Xue, Jing Yang

Abstract: In this paper, we consider federated reinforcement learning for tabular episodic Markov Decision Processes (MDP) where, under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. While linear speedup in the number of agents has been achieved for some metrics, such as convergence rate and sample complexity, in similar settings, it is unclear whether it is possible to design a model-free algorithm to achieve linear regret speedup with low communication cost. We propose two federated Q-Learning algorithms termed as FedQ-Hoeffding and FedQ-Bernstein, respectively, and show that the corresponding total regrets achieve a linear speedup compared with their single-agent counterparts when the time horizon is sufficiently large, while the communication cost scales logarithmically in the total number of time steps $T$. Those results rely on an event-triggered synchronization mechanism between the agents and the server, a novel step size selection when the server aggregates the local estimates of the state-action values to form the global estimates, and a set of new concentration inequalities to bound the sum of non-martingale differences. This is the first work showing that linear regret speedup and logarithmic communication cost can be achieved by model-free algorithms in federated reinforcement learning.

Submitted 7 May, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: 51 pages

arXiv:2312.02447 [pdf, other]

Fast non-autoregressive inverse folding with discrete diffusion

Authors: John J. Yang, Jason Yim, Regina Barzilay, Tommi Jaakkola

Abstract: Generating protein sequences that fold into an intended 3D structure is a fundamental step in de novo protein design. De facto methods utilize autoregressive generation, but this eschews higher order interactions that could be exploited to improve inference speed. We describe a non-autoregressive alternative that performs inference using a constant number of calls, resulting in a 23 times speed up without a loss in performance on the CATH benchmark. Conditioned on the 3D structure, we fine-tune ProteinMPNN to perform discrete diffusion with a purity prior over the index sampling order. Our approach gives the flexibility in trading off inference speed and accuracy by modulating the diffusion speed. Code: https://github.com/johnyang101/pmpnndiff

Submitted 4 December, 2023; originally announced December 2023.

Comments: NeurIPS Machine Learning for Structural Biology workshop

arXiv:2312.01260 [pdf, other]

Rethinking PGD Attack: Is Sign Function Necessary?

Authors: Junjie Yang, Tianlong Chen, Xuxi Chen, Zhangyang Wang, Yingbin Liang

Abstract: Neural networks have demonstrated success in various domains, yet their performance can be significantly degraded by even a small input perturbation. Consequently, the construction of such perturbations, known as adversarial attacks, has gained significant attention, many of which fall within "white-box" scenarios where we have full access to the neural network. Existing attack algorithms, such as the projected gradient descent (PGD), commonly take the sign function on the raw gradient before updating adversarial inputs, thereby neglecting gradient magnitude information. In this paper, we present a theoretical analysis of how such a sign-based update algorithm influences step-wise attack performance, as well as its caveat. We also interpret why previous attempts of directly using raw gradients failed. Based on that, we further propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign. Specifically, we convert the constrained optimization problem into an unconstrained one, by introducing a new hidden variable of non-clipped perturbation that can move beyond the constraint. The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments, outperforming PGD and other competitors in various settings, without incurring any additional computational overhead. The code is available at https://github.com/JunjieYang97/RGD.

Submitted 20 May, 2024; v1 submitted 2 December, 2023; originally announced December 2023.
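
As a small illustration of the point the abstract above makes about the sign function discarding gradient magnitude, the sketch below contrasts one standard L-infinity PGD step with the same step taken on the raw gradient. The helper names, step size, and toy gradient are assumptions for illustration only; in particular, `raw_gradient_step` is a naive contrast and is not the paper's RGD algorithm, which additionally introduces a non-clipped hidden perturbation variable.

```python
import numpy as np

def pgd_sign_step(x_adv, grad, x_orig, step_size, eps):
    """One L-infinity PGD ascent step: move along sign(grad), then project."""
    x_new = x_adv + step_size * np.sign(grad)
    # Projection onto the eps-ball around the clean input.
    return np.clip(x_new, x_orig - eps, x_orig + eps)

def raw_gradient_step(x_adv, grad, x_orig, step_size, eps):
    """Naive raw-gradient step (hypothetical contrast, NOT the paper's RGD)."""
    x_new = x_adv + step_size * grad
    return np.clip(x_new, x_orig - eps, x_orig + eps)

x_orig = np.zeros(3)
grad = np.array([0.5, -0.01, 2.0])  # toy loss gradient with very uneven magnitudes

sign_update = pgd_sign_step(x_orig, grad, x_orig, step_size=0.1, eps=0.3)
raw_update = raw_gradient_step(x_orig, grad, x_orig, step_size=0.1, eps=0.3)
# sign_update moves every coordinate by 0.1 regardless of gradient size;
# raw_update moves each coordinate in proportion to its gradient entry.
```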

arXiv:2311.04408 [pdf, other]

Bayesian modelling of response to therapy and drug-sensitivity in acute lymphoblastic leukemia

Authors: Andrea Cremaschi, Wenjian Yang, Maria De Iorio, William E. Evans, Jun J. Yang, Gary L. Rosner

Abstract: Acute lymphoblastic leukemia (ALL) is a heterogeneous hematologic malignancy involving the abnormal proliferation of immature lymphocytes, accounting for most pediatric cancer cases. ALL management in children has seen great improvement in the last decades thanks to better understanding of the disease leading to improved treatment strategies evidenced through clinical trials. Commonly a first course of chemotherapy (induction phase) is administered, followed by treatment with a combination of anti-leukemia drugs. A measure of the efficacy early in the course of therapy is minimal residual disease (MRD). MRD quantifies residual tumor cells and indicates the effectiveness of the treatment over the course of therapy. MRD positivity is defined for values of MRD greater than 0.01%, yielding left-censored observations. We propose a Bayesian model to study the relationship between patient features and MRD observed at two time points during the induction phase. Specifically, we model the observed MRD values via an auto-regressive model, accounting for left-censoring of the data and for the fact that some patients are already in remission after the induction phase. Patient characteristics are included in the model via linear regression terms. In particular, patient-specific drug sensitivity based on ex-vivo assays of patient samples is exploited to identify groups of subjects with similar profiles. We include this information as a covariate in the model for MRD. We adopt horseshoe priors for the regression coefficients to perform variable selection to identify important covariates. We fit the proposed approach to data from three prospective pediatric ALL clinical trials carried out at the St. Jude Children's Research Hospital. Our results highlight that drug sensitivity profiles and leukemic subtypes play an important role in the response to induction therapy as measured by serial MRD measures.

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.03252 [pdf, other]

Parameter-Agnostic Optimization under Relaxed Smoothness

Authors: Florian Hübler, Junchi Yang, Xiang Li, Niao He

Abstract: Tuning hyperparameters, such as the stepsize, presents a major challenge of training machine learning models. To address this challenge, numerous adaptive optimization algorithms have been developed that achieve near-optimal complexities, even when stepsizes are independent of problem-specific parameters, provided that the loss function is $L$-smooth. However, as the assumption is relaxed to the more realistic $(L_0, L_1)$-smoothness, all existing convergence results still necessitate tuning of the stepsize. In this study, we demonstrate that Normalized Stochastic Gradient Descent with Momentum (NSGD-M) can achieve a (nearly) rate-optimal complexity without prior knowledge of any problem parameter, though this comes at the cost of introducing an exponential term dependent on $L_1$ in the complexity. We further establish that this exponential term is inevitable to such schemes by introducing a theoretical framework of lower bounds tailored explicitly for parameter-agnostic algorithms. Interestingly, in deterministic settings, the exponential factor can be neutralized by employing Gradient Descent with a Backtracking Line Search. To the best of our knowledge, these findings represent the first parameter-agnostic convergence results under the generalized smoothness condition. Our empirical experiments further confirm our theoretical insights.

Submitted 6 November, 2023; originally announced November 2023.
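
For readers unfamiliar with the method named in the abstract above, the following is a minimal sketch of a normalized gradient step with momentum. It assumes the common form in which an exponential moving average of gradients is normalized before the update; the function name, the constant step size `lr`, the momentum weight `beta`, and the deterministic toy gradient are illustrative assumptions, and the paper's exact parameter schedules are not reproduced here.

```python
import numpy as np

def nsgd_m(grad_fn, x0, steps, lr=0.1, beta=0.9):
    """Normalized gradient descent with momentum (sketch, constant step size)."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)                   # a stochastic gradient in practice
        m = beta * m + (1.0 - beta) * g  # exponential moving average of gradients
        norm = np.linalg.norm(m)
        if norm > 0.0:
            # The step length is always lr, whatever the gradient scale.
            x = x - lr * m / norm
    return x

# Toy objective f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x_start = np.array([3.0, 4.0])
x_one = nsgd_m(lambda x: x, x_start, steps=1, lr=0.1)
```

Because the update direction is normalized, the step length does not depend on a smoothness constant, which is the sense in which such a scheme can be run without knowing problem parameters.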

arXiv:2310.19360 [pdf, other]

Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective

Authors: Yifei Wang, Liangchen Li, Jiansheng Yang, Zhouchen Lin, Yisen Wang

Abstract: Adversarial Training (AT) has become arguably the state-of-the-art algorithm for extracting robust features. However, researchers have recently noticed that AT suffers from severe robust overfitting problems, particularly after learning rate (LR) decay. In this paper, we explain this phenomenon by viewing adversarial training as a dynamic minimax game between the model trainer and the attacker. Specifically, we analyze how LR decay breaks the balance of the minimax game by empowering the trainer with a stronger memorization ability, and show that such imbalance induces robust overfitting as a result of memorizing non-robust features. We validate this understanding with extensive experiments, and provide a holistic view of robust overfitting from the dynamics of both game players. This understanding further inspires us to alleviate robust overfitting by rebalancing the two players by either regularizing the trainer's capacity or improving the attack strength. Experiments show that the proposed ReBalanced Adversarial Training (ReBAT) can attain good robustness and does not suffer from robust overfitting even after very long training. Code is available at https://github.com/PKU-ML/ReBAT.

Submitted 30 October, 2023; originally announced October 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2310.17817 [pdf, other]

Bayesian imaging inverse problem with SA-Roundtrip prior via HMC-pCN sampler

Authors: Jiayu Qian, Yuanyuan Liu, Jingya Yang, Qingping Zhou

Abstract: Bayesian inference with deep generative prior has received considerable interest for solving imaging inverse problems in many scientific and engineering fields. The selection of the prior distribution is learned from, and therefore an important representation learning of, available prior measurements. The SA-Roundtrip, a novel deep generative prior, is introduced to enable controlled sampling generation and identify the data's intrinsic dimension. This prior incorporates a self-attention structure within a bidirectional generative adversarial network. Subsequently, Bayesian inference is applied to the posterior distribution in the low-dimensional latent space using the Hamiltonian Monte Carlo with preconditioned Crank-Nicolson (HMC-pCN) algorithm, which is proven to be ergodic under specific conditions. Experiments conducted on computed tomography (CT) reconstruction with the MNIST and TomoPhantom datasets reveal that the proposed method outperforms state-of-the-art comparisons, consistently yielding a robust and superior point estimator along with precise uncertainty quantification.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.17759 [pdf, other]

Optimal Guarantees for Algorithmic Reproducibility and Gradient Complexity in Convex Optimization

Authors: Liang Zhang, Junchi Yang, Amin Karbasi, Niao He

Abstract: Algorithmic reproducibility measures the deviation in outputs of machine learning algorithms upon minor changes in the training process. Previous work suggests that first-order methods would need to trade-off convergence rate (gradient complexity) for better reproducibility. In this work, we challenge this perception and demonstrate that both optimal reproducibility and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems under various error-prone oracle settings. Particularly, given the inexact initialization oracle, our regularization-based algorithms achieve the best of both worlds - optimal reproducibility and near-optimal gradient complexity - for minimization and minimax optimization. With the inexact gradient oracle, the near-optimal guarantees also hold for minimax optimization. Additionally, with the stochastic gradient oracle, we show that stochastic gradient descent ascent is optimal in terms of both reproducibility and gradient complexity. We believe our results contribute to an enhanced understanding of the reproducibility-convergence trade-off in the context of convex optimization.

Submitted 9 January, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023 Spotlight

arXiv:2310.16238 [pdf, other]

Efficient GPU-accelerated fitting of observational health-scaled stratified and time-varying Cox models

Authors: Jianxiao Yang, Martijn J. Schuemie, Marc A. Suchard

Abstract: The Cox proportional hazards model stands as a widely-used semi-parametric approach for survival analysis in medical research and many other fields. Numerous extensions of the Cox model have further expanded its versatility. Statistical computing challenges arise, however, when applying many of these extensions with the increasing complexity and volume of modern observational health datasets. To address these challenges, we demonstrate how to employ massive parallelization through graphics processing units (GPU) to enhance the scalability of the stratified Cox model, the Cox model with time-varying covariates, and the Cox model with time-varying coefficients. First we establish how the Cox model with time-varying coefficients can be transformed into the Cox model with time-varying covariates when using discrete time-to-event data. We then demonstrate how to recast both of these into a stratified Cox model and identify their shared computational bottleneck that results when evaluating the now segmented partial likelihood and its gradient with respect to regression coefficients at scale. These computations mirror a highly transformed segmented scan operation. While this bottleneck is not an immediately obvious target for multi-core parallelization, we convert it into an un-segmented operation to leverage the efficient many-core parallel scan algorithm. Our massively parallel implementation significantly accelerates model fitting on large-scale and high-dimensional Cox models with stratification or time-varying effect, delivering an order of magnitude speedup over traditional central processing unit-based implementations.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.13550 [pdf, other]

Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes

Authors: Ruiquan Huang, Yuan Cheng, Jing Yang, Vincent Tan, Yingbin Liang

Abstract: In multi-task reinforcement learning (RL) under Markov decision processes (MDPs), the presence of shared latent structures among multiple MDPs has been shown to yield significant benefits to the sample efficiency compared to single-task RL. In this paper, we investigate whether such a benefit can extend to more general sequential decision making problems, such as partially observable MDPs (POMDPs) and more general predictive state representations (PSRs). The main challenge here is that the large and complex model space makes it hard to identify what types of common latent structure of multi-task PSRs can reduce the model complexity and improve sample efficiency. To this end, we posit a joint model class for tasks and use the notion of $η$-bracketing number to quantify its complexity; this number also serves as a general metric to capture the similarity of tasks and thus determines the benefit of multi-task over single-task RL. We first study upstream multi-task learning over PSRs, in which all tasks share the same observation and action spaces. We propose a provably efficient algorithm UMT-PSR for finding near-optimal policies for all PSRs, and demonstrate that the advantage of multi-task learning manifests if the joint model class of PSRs has a smaller $η$-bracketing number compared to that of individual single-task learning. We also provide several example multi-task PSRs with small $η$-bracketing numbers, which reap the benefits of multi-task learning. We further investigate downstream learning, in which the agent needs to learn a new target task that shares some commonalities with the upstream tasks via a similarity constraint. By exploiting the learned PSRs from the upstream, we develop a sample-efficient algorithm that provably finds a near-optimal policy.

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2310.10393 [pdf, other]

Statistical and Causal Robustness for Causal Null Hypothesis Tests

Authors: Junhui Yang, Rohit Bhattacharya, Youjin Lee, Ted Westling

Abstract: Prior work applying semiparametric theory to causal inference has primarily focused on deriving estimators that exhibit statistical robustness under a prespecified causal model that permits identification of a desired causal parameter. However, a fundamental challenge is correct specification of such a model, which usually involves making untestable assumptions. Evidence factors is an approach to combining hypothesis tests of a common causal null hypothesis under two or more candidate causal models. Under certain conditions, this yields a test that is valid if at least one of the underlying models is correct, which is a form of causal robustness. We propose a method of combining semiparametric theory with evidence factors. We develop a causal null hypothesis test based on joint asymptotic normality of K asymptotically linear semiparametric estimators, where each estimator is based on a distinct identifying functional derived from each of K candidate causal models. We show that this test provides both statistical and causal robustness in the sense that it is valid if at least one of the K proposed causal models is correct, while also allowing for slower than parametric rates of convergence in estimating nuisance functions. We demonstrate the effectiveness of our method via simulations and applications to the Framingham Heart Study and Wisconsin Longitudinal Study.

Submitted 29 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.06713 [pdf, other]

Interpretable Traffic Event Analysis with Bayesian Networks

Authors: Tong Yuan, Jian Yang, Zeyi Wen

Abstract: Although existing machine learning-based methods for traffic accident analysis can provide good quality results to downstream tasks, they lack interpretability which is crucial for this critical problem. This paper proposes an interpretable framework based on Bayesian Networks for traffic accident prediction. To enable the ease of interpretability, we design a dataset construction pipeline to feed the traffic data into the framework while retaining the essential traffic data information. With a concrete case study, our framework can derive a Bayesian Network from a dataset based on the causal relationships between weather and traffic events across the United States. Consequently, our framework enables the prediction of traffic accidents with competitive accuracy while examining how the probability of these events changes under different conditions, thus illustrating transparent relationships between traffic and weather events. Additionally, the visualization of the network simplifies the analysis of relationships between different variables, revealing the primary causes of traffic accidents and ultimately providing a valuable reference for reducing traffic accidents.

Submitted 10 October, 2023; originally announced October 2023.

Comments: 11 pages, 7 figures

MSC Class: 62F15 ACM Class: G.3

arXiv:2310.06333 [pdf, ps, other]

Learning bounded-degree polytrees with known skeleton

Authors: Davin Choo, Joy Qiping Yang, Arnab Bhattacharyya, Clément L. Canonne

Abstract: We establish finite-sample guarantees for efficient proper learning of bounded-degree polytrees, a rich class of high-dimensional probability distributions and a subclass of Bayesian networks, a widely-studied type of graphical model. Recently, Bhattacharyya et al. (2021) obtained finite-sample guarantees for recovering tree-structured Bayesian networks, i.e., 1-polytrees. We extend their results by providing an efficient algorithm which learns $d$-polytrees in polynomial time and sample complexity for any bounded $d$ when the underlying undirected graph (skeleton) is known. We complement our algorithm with an information-theoretic sample complexity lower bound, showing that the dependence on the dimension and target accuracy parameters is nearly tight.

Submitted 21 January, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: Fixed some typos. Added some discussions. Accepted to ALT 2024

arXiv:2310.00551 [pdf, other]

Derivative based global sensitivity analysis and its entropic link

Authors: Jiannan Yang

Abstract: Variance-based Sobol' sensitivity is one of the most well known measures in global sensitivity analysis (GSA). However, uncertainties with certain distributions, such as highly skewed distributions or those with a heavy tail, cannot be adequately characterised using the second central moment only. Entropy-based GSA can consider the entire probability density function, but its application has been limited because it is difficult to estimate. Here we present a novel derivative-based upper bound for conditional entropies, to efficiently rank uncertain variables and to work as a proxy for entropy-based total effect indices. To overcome the non-desirable issue of negativity for differential entropies as sensitivity indices, we discuss an exponentiation of the total effect entropy and its proxy. We found that the proposed new entropy proxy is equivalent to the proxy for variance-based GSA for linear functions with Gaussian inputs, but outperforms the latter for a river flood physics model with 8 inputs of different distributions. We expect the new entropy proxy to increase the variable screening power of derivative-based GSA and to complement Sobol'-index proxy for a more diverse type of distributions.

Submitted 9 May, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

Comments: 17 pages, 4 figures, 8 tables

arXiv:2309.16604 [pdf, other]

Exploiting Edge Features in Graphs with Fused Network Gromov-Wasserstein Distance

Authors: Junjie Yang, Matthieu Labeau, Florence d'Alché-Buc

Abstract: Pairwise comparison of graphs is key to many applications in Machine learning ranging from clustering, kernel-based classification/regression and more recently supervised graph prediction. Distances between graphs usually rely on informative representations of these structured objects such as bag of substructures or other graph embeddings. A recently popular solution consists in representing graphs as metric measure spaces, allowing to successfully leverage Optimal Transport, which provides meaningful distances allowing to compare them: the Gromov-Wasserstein distances. However, this family of distances overlooks edge attributes, which are essential for many structured objects. In this work, we introduce an extension of Gromov-Wasserstein distance for comparing graphs whose both nodes and edges have features. We propose novel algorithms for distance and barycenter computation. We empirically show the effectiveness of the novel distance in learning tasks where graphs occur in either input space or output space, such as classification and graph prediction.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.12658 [pdf, other]

Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes

Authors: Jian Xu, Shian Du, Junmei Yang, Qianli Ma, Delu Zeng

Abstract: Deep Gaussian Process (DGP) models offer a powerful nonparametric approach for Bayesian inference, but exact inference is typically intractable, motivating the use of various approximations. However, existing approaches, such as mean-field Gaussian assumptions, limit the expressiveness and efficacy of DGP models, while stochastic approximation can be computationally expensive. To tackle these challenges, we introduce Neural Operator Variational Inference (NOVI) for Deep Gaussian Processes. NOVI uses a neural generator to obtain a sampler and minimizes the Regularized Stein Discrepancy in L2 space between the generated distribution and true posterior. We solve the minimax problem using Monte Carlo estimation and subsampling stochastic optimization techniques. We demonstrate that the bias introduced by our method can be controlled by multiplying the Fisher divergence with a constant, which leads to robust error control and ensures the stability and precision of the algorithm. Our experiments on datasets ranging from hundreds to tens of thousands demonstrate the effectiveness and the faster convergence rate of the proposed method. We achieve a classification accuracy of 93.56 on the CIFAR10 dataset, outperforming SOTA Gaussian process methods. Furthermore, our method guarantees theoretically controlled prediction error for DGP models and demonstrates remarkable performance on various datasets. We are optimistic that NOVI has the potential to enhance the performance of deep Bayesian nonparametric models and could have significant implications for various practical applications.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.09367 [pdf, other]

ForLion: A New Algorithm for D-optimal Designs under General Parametric Statistical Models with Mixed Factors

Authors: Yifei Huang, Keren Li, Abhyuday Mandal, Jie Yang

Abstract: In this paper, we address the problem of designing an experimental plan with both discrete and continuous factors under fairly general parametric statistical models. We propose a new algorithm, named ForLion, to search for locally optimal approximate designs under the D-criterion. The algorithm performs an exhaustive search in a design space with mixed factors while keeping high efficiency and reducing the number of distinct experimental settings. Its optimality is guaranteed by the general equivalence theorem. We present the relevant theoretical results for multinomial logit models (MLM) and generalized linear models (GLM), and demonstrate the superiority of our algorithm over state-of-the-art design algorithms using real-life experiments under MLM and GLM. Our simulation studies show that the ForLion algorithm could reduce the number of experimental settings by 25% or improve the relative efficiency of the designs by 17.5% on average. Our algorithm can help the experimenters reduce the time cost, the usage of experimental devices, and thus the total cost of their experiments while preserving high efficiencies of the designs.

Submitted 22 May, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

Comments: 36 pages, 7 tables, 5 figures

arXiv:2309.09222 [pdf, other]

Data-driven Modeling and Inference for Bayesian Gaussian Process ODEs via Double Normalizing Flows

Authors: Jian Xu, Shian Du, Junmei Yang, Xinghao Ding, John Paisley, Delu Zeng

Abstract: Recently, Gaussian processes have been used to model the vector field of continuous dynamical systems, referred to as GPODEs, which are characterized by a probabilistic ODE equation. Bayesian inference for these models has been extensively studied and applied in tasks such as time series prediction. However, the use of standard GPs with basic kernels like squared exponential kernels has been common in GPODE research, limiting the model's ability to represent complex scenarios. To address this limitation, we introduce normalizing flows to reparameterize the ODE vector field, resulting in a data-driven prior distribution, thereby increasing flexibility and expressive power. We develop a data-driven variational learning algorithm that utilizes analytically tractable probability density functions of normalizing flows, enabling simultaneous learning and inference of unknown continuous dynamics. Additionally, we also apply normalizing flows to the posterior inference of GP ODEs to resolve the issue of strong mean-field assumptions in posterior inference. By applying normalizing flows in both these ways, our model improves accuracy and uncertainty estimates for Bayesian Gaussian Process ODEs. We validate the effectiveness of our approach on simulated dynamical systems and real-world human motion data, including time series prediction and missing data recovery tasks. Experimental results show that our proposed method effectively captures model uncertainty while improving accuracy.

Submitted 2 January, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

arXiv:2308.16816 [pdf, other]

A General Equivalence Theorem for Crossover Designs under Generalized Linear Models

Authors: Jeevan Jankar, Jie Yang, Abhyuday Mandal

Abstract: With the help of Generalized Estimating Equations, we identify locally D-optimal crossover designs for generalized linear models. We adopt the variance of parameters of interest as the objective function, which is minimized using constrained optimization to obtain optimal crossover designs. In this case, the traditional general equivalence theorem could not be used directly to check the optimality of obtained designs. In this manuscript, we derive a corresponding general equivalence theorem for crossover designs under generalized linear models.

Submitted 7 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.08858 [pdf, ps, other]

Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games

Authors: Songtao Feng, Ming Yin, Yu-Xiang Wang, Jing Yang, Yingbin Liang

Abstract: The problem of two-player zero-sum Markov games has recently attracted increasing interests in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an $ε$-optimal Nash Equilibrium (NE) with the sample complexity of $O(H^3SAB/ε^2)$, which is optimal in the dependence of the horizon $H$ and the number of states $S$ (where $A$ and $B$ denote the number of actions of the two players, respectively). However, none of the existing model-free algorithms can achieve such an optimality. In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in the $H$ dependence as model-based algorithms. The main improvement of the dependency on $H$ arises by leveraging the popular variance reduction technique based on the reference-advantage decomposition previously used only for single-agent RL. However, such a technique relies on a critical monotonicity property of the value function, which does not hold in Markov games due to the update of the policy via the coarse correlated equilibrium (CCE) oracle. Thus, to extend such a technique to Markov games, our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions whose value difference is the smallest in the history in order to achieve the desired improvement in the sample efficiency.

Submitted 5 June, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

arXiv:2307.09295 [pdf, other]

Nested Elimination: A Simple Algorithm for Best-Item Identification from Choice-Based Feedback

Authors: Junwen Yang, Yifan Feng

Abstract: We study the problem of best-item identification from choice-based feedback. In this problem, a company sequentially and adaptively shows display sets to a population of customers and collects their choices. The objective is to identify the most preferred item with the least number of samples and at a high confidence level. We propose an elimination-based algorithm, namely Nested Elimination (NE), which is inspired by the nested structure implied by the information-theoretic lower bound. NE is simple in structure, easy to implement, and has a strong theoretical guarantee for sample complexity. Specifically, NE utilizes an innovative elimination criterion and circumvents the need to solve any complex combinatorial optimization problem. We provide an instance-specific and non-asymptotic bound on the expected sample complexity of NE. We also show NE achieves high-order worst-case asymptotic optimality. Finally, numerical experiments from both synthetic and real data corroborate our theoretical findings.

Submitted 13 July, 2023; originally announced July 2023.

Comments: Accepted to ICML 2023

arXiv:2307.00405 [pdf, ps, other]

Provably Efficient UCB-type Algorithms For Learning Predictive State Representations

Authors: Ruiquan Huang, Yingbin Liang, Jing Yang

Abstract: The general sequential decision-making problem, which includes Markov decision processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at maximizing a cumulative reward by making a sequence of decisions based on a history of observations and actions over time. Recent studies have shown that the sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs). Despite these advancements, existing approaches typically involve oracles or steps that are computationally intractable. On the other hand, the upper confidence bound (UCB) based approaches, which have served successfully as computationally efficient methods in bandits and MDPs, have not been investigated for more general PSRs, due to the difficulty of optimistic bonus design in these more challenging settings. This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models. We further characterize the sample complexity bounds for our designed UCB-type algorithms for both online and offline PSRs. In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.

Submitted 6 February, 2024; v1 submitted 1 July, 2023; originally announced July 2023.

Comments: Accepted by ICLR 2024

arXiv:2306.11058 [pdf, other]

Reciprocal hydrodynamic response estimation in a random spreading sea

Authors: Jiannan Yang, Robin Langley, Richard Lines

Abstract: Direct estimation of the hydrodynamic response of an offshore structure in a random spreading sea can lead to large computational costs. In this paper the actual spreading sea is replaced by an idealised diffuse wave field and the diffuse field reciprocity (DFR) relationship is derived analytically and verified against diffraction analysis for offshore application. The DFR approach provides an analytical expression for the estimation of the wave loading spectrum in a spreading sea. It is very efficient because only the added damping coefficients are required. Furthermore, if normalised to the peak amplitude of a spreading sea, an upper bound response can be obtained using the reciprocal approach. And this is demonstrated using a spar type floating wind turbine. Given that the hydrodynamic coefficients are routine outputs for offshore structural design, engineers would obtain the upper bound response without additional computational cost using this new approach.

Submitted 19 June, 2023; originally announced June 2023.

Comments: 5 figures; 1 table

arXiv:2306.08364 [pdf, other]

Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources

Authors: Chengshuai Shi, Wei Xiong, Cong Shen, Jing Yang

Abstract: Existing theoretical studies on offline reinforcement learning (RL) mostly consider a dataset sampled directly from the target task. In practice, however, data often come from several heterogeneous but related sources. Motivated by this gap, this work aims at rigorously understanding offline RL with multiple datasets that are collected from randomly perturbed versions of the target task instead of from itself. An information-theoretic lower bound is derived, which reveals a necessary requirement on the number of involved sources in addition to that on the number of data samples. Then, a novel HetPEVI algorithm is proposed, which simultaneously considers the sample uncertainties from a finite number of data samples per data source and the source uncertainties due to a finite number of available data sources. Theoretical analyses demonstrate that HetPEVI can solve the target task as long as the data sources collectively provide a good data coverage. Moreover, HetPEVI is demonstrated to be optimal up to a polynomial factor of the horizon length. Finally, the study is extended to offline Markov games and offline robust RL, which demonstrates the generality of the proposed designs and theoretical analyses.

Submitted 14 June, 2023; originally announced June 2023.

Comments: ICML 2023

arXiv:2306.08280 [pdf, other]

Differentially Private Wireless Federated Learning Using Orthogonal Sequences

Authors: Xizixiang Wei, Tianhao Wang, Ruiquan Huang, Cong Shen, Jing Yang, H. Vincent Poor

Abstract: We propose a privacy-preserving uplink over-the-air computation (AirComp) method, termed FLORAS, for single-input single-output (SISO) wireless federated learning (FL) systems. From the perspective of communication designs, FLORAS eliminates the requirement of channel state information at the transmitters (CSIT) by leveraging the properties of orthogonal sequences. From the privacy perspective, we prove that FLORAS offers both item-level and client-level differential privacy (DP) guarantees. Moreover, by properly adjusting the system parameters, FLORAS can flexibly achieve different DP levels at no additional cost. A new FL convergence bound is derived which, combined with the privacy guarantees, allows for a smooth tradeoff between the achieved convergence rate and differential privacy levels. Experimental results demonstrate the advantages of FLORAS compared with the baseline AirComp method, and validate that the analytical results can guide the design of privacy-preserving FL with different tradeoff requirements on the model convergence and privacy levels.

Submitted 21 November, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: 33 pages, 5 figures

arXiv:2306.07652 [pdf]

Inactivated COVID-19 Vaccination did not affect In vitro fertilization (IVF) / Intra-Cytoplasmic Sperm Injection (ICSI) cycle outcomes

Authors: Qi Wan, Ying Ling Yao, XingYu Lv, Li Hong Geng, Yue Wang, Enoch Appiah Adu-Gyamfi, Xue Jiao Wang, Yue Qian, Juan Yang, Ming Xing Chend, Zhao Hui Zhong, Yuan Li, Yu Bin Ding

Abstract: Background: The objective of this study is to evaluate the impact of COVID-19 inactivated vaccine administration on the outcomes of in vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI) cycles in infertile couples in China. Methods: We collected data from the CYART prospective cohort, which included couples undergoing IVF treatment from January 2021 to September 2022 at Sichuan Jinxin Xinan Women & Children's Hospital. Based on whether they received vaccination before ovarian stimulation, the couples were divided into the vaccination group and the non-vaccination group. We compared the laboratory parameters and pregnancy outcomes between the two groups. Findings: After performing propensity score matching (PSM), the analysis demonstrated similar clinical pregnancy rates, biochemical pregnancy and ongoing pregnancy rates between vaccinated and unvaccinated women. No significant disparities were found in terms of embryo development and laboratory parameters among the groups. Moreover, male vaccination had no impact on patient performance or pregnancy outcomes in assisted reproductive technology treatments. Additionally, there were no significant differences observed in the effects of vaccination on embryo development and pregnancy outcomes among couples undergoing ART. Interpretation: The findings suggest that COVID-19 vaccination did not have a significant effect on patients undergoing IVF/ICSI with fresh embryo transfer. Therefore, it is recommended that couples should receive COVID-19 vaccination as scheduled to help mitigate the COVID-19 pandemic.

Submitted 13 June, 2023; originally announced June 2023.

Comments: 26 pages, 4 figures and 5 tables

arXiv:2306.07464 [pdf, other]

Unlocking Sales Growth: Account Prioritization Engine with Explainable AI

Authors: Suvendu Jena, Jilei Yang, Fangfang Tan

Abstract: B2B sales requires effective prediction of customer growth, identification of upsell potential, and mitigation of churn risks. LinkedIn sales representatives traditionally relied on intuition and fragmented data signals to assess customer performance. This resulted in significant time investment in data understanding as well as strategy formulation and under-investment in active selling. To overcome this challenge, we developed a data product called Account Prioritizer, an intelligent sales account prioritization engine. It uses machine learning recommendation models and integrated account-level explanation algorithms within the sales CRM to automate the manual process of sales book prioritization. A successful A/B test demonstrated that the Account Prioritizer generated a substantial +8.08% increase in renewal bookings for the LinkedIn Business.

Submitted 12 June, 2023; originally announced June 2023.

Comments: 9 pages, 11 figures, 2 tables

arXiv:2306.06265 [pdf, other]

Near-optimal Conservative Exploration in Reinforcement Learning under Episode-wise Constraints

Authors: Donghao Li, Ruiquan Huang, Cong Shen, Jing Yang

Abstract: This paper investigates conservative exploration in reinforcement learning where the performance of the learning agent is guaranteed to be above a certain threshold throughout the learning process. It focuses on the tabular episodic Markov Decision Process (MDP) setting that has finite states and actions. With the knowledge of an existing safe baseline policy, an algorithm termed as StepMix is proposed to balance the exploitation and exploration while ensuring that the conservative constraint is never violated in each episode with high probability. StepMix features a unique design of a mixture policy that adaptively and smoothly interpolates between the baseline policy and the optimistic policy. Theoretical analysis shows that StepMix achieves near-optimal regret order as in the constraint-free setting, indicating that obeying the stringent episode-wise conservative constraint does not compromise the learning performance. Besides, a randomization-based EpsMix algorithm is also proposed and shown to achieve the same performance as StepMix. The algorithm design and theoretical analysis are further extended to the setting where the baseline policy is not given a priori but must be learned from an offline dataset, and it is proved that similar conservative guarantee and regret can be achieved if the offline dataset is sufficiently large. Experiment results corroborate the theoretical analysis and demonstrate the effectiveness of the proposed conservative exploration strategies.

Submitted 9 June, 2023; originally announced June 2023.

Comments: Accepted by ICML 2023

arXiv:2306.05275 [pdf, ps, other]

Federated Linear Contextual Bandits with User-level Differential Privacy

Authors: Ruiquan Huang, Huanyu Zhang, Luca Melis, Milan Shen, Meisam Hajzinia, Jing Yang

Abstract: This paper studies federated linear contextual bandits under the notion of user-level differential privacy (DP). We first introduce a unified federated bandits framework that can accommodate various definitions of DP in the sequential decision-making setting. We then formally introduce user-level central DP (CDP) and local DP (LDP) in the federated bandits framework, and investigate the fundamental trade-offs between the learning regrets and the corresponding DP guarantees in a federated linear contextual bandits model. For CDP, we propose a federated algorithm termed as $\texttt{ROBIN}$ and show that it is near-optimal in terms of the number of clients $M$ and the privacy budget $\varepsilon$ by deriving nearly-matching upper and lower regret bounds when user-level DP is satisfied. For LDP, we obtain several lower bounds, indicating that learning under user-level $(\varepsilon,δ)$-LDP must suffer a regret blow-up factor at least $\min\{1/\varepsilon,M\}$ or $\min\{1/\sqrt{\varepsilon},\sqrt{M}\}$ under different conditions.

Submitted 9 June, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: Accepted by ICML 2023

Showing 1–50 of 302 results for author: Yang, J