Search | arXiv e-print repository

arXiv:2406.19619 [pdf, other]

ScoreFusion: fusing score-based generative models via Kullback-Leibler barycenters

Authors: Hao Liu, Junze Ye, Jose Blanchet, Nian Si

Abstract: We study the problem of fusing pre-trained (auxiliary) generative models to enhance the training of a target generative model. We propose using KL-divergence weighted barycenters as an optimal fusion mechanism, in which the barycenter weights are optimally trained to minimize a suitable loss for the target population. While computing the optimal KL-barycenter weights can be challenging, we demonstrate that this process can be efficiently executed using diffusion score training when the auxiliary generative models are also trained based on diffusion score methods. Moreover, we show that our fusion method has a dimension-free sample complexity in total variation distance provided that the auxiliary models are well fitted for their own task and the auxiliary tasks combined capture the target well. The main takeaway of our method is that if the auxiliary models are well-trained and can borrow features from each other that are present in the target, our fusion method significantly improves the training of generative models. We provide a concise computational implementation of the fusion algorithm, and validate its efficiency in the low-data regime with numerical experiments involving mixture models and image datasets.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 40 pages, 6 figures

arXiv:2405.20435 [pdf, other]

Deep Learning for Computing Convergence Rates of Markov Chains

Authors: Yanlin Qu, Jose Blanchet, Peter Glynn

Abstract: Convergence rate analysis for general state-space Markov chains is fundamentally important in areas such as Markov chain Monte Carlo and algorithmic analysis (for computing explicit convergence bounds). This problem, however, is notoriously difficult because traditional analytical methods often do not generate practically useful convergence bounds for realistic Markov chains. We propose the Deep Contractive Drift Calculator (DCDC), the first general-purpose sample-based algorithm for bounding the convergence of Markov chains to stationarity in Wasserstein distance. The DCDC has two components. First, inspired by the new convergence analysis framework in (Qu et al., 2023), we introduce the Contractive Drift Equation (CDE), the solution of which leads to an explicit convergence bound. Second, we develop an efficient neural-network-based CDE solver. Equipped with these two components, DCDC solves the CDE and converts the solution into a convergence bound. We analyze the sample complexity of the algorithm and further demonstrate the effectiveness of the DCDC by generating convergence bounds for realistic Markov chains arising from stochastic processing networks as well as constant step-size stochastic optimization.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.03198 [pdf, other]

Stability Evaluation via Distributional Perturbation Analysis

Authors: Jose Blanchet, Peng Cui, Jiajin Li, Jiashuo Liu

Abstract: The performance of learning models often deteriorates when deployed in out-of-sample environments. To ensure reliable deployment, we propose a stability evaluation criterion based on distributional perturbations. Conceptually, our stability evaluation criterion is defined as the minimal perturbation required on our observed dataset to induce a prescribed deterioration in risk evaluation. In this paper, we utilize the optimal transport (OT) discrepancy with moment constraints on the \textit{(sample, density)} space to quantify this perturbation. Therefore, our stability evaluation criterion can address both \emph{data corruptions} and \emph{sub-population shifts} -- the two most common types of distribution shifts in real-world scenarios. To further realize practical benefits, we present a series of tractable convex formulations and computational methods tailored to different classes of loss functions. The key technical tool to achieve this is the strong duality theorem provided in this paper. Empirically, we validate the practical utility of our stability evaluation criterion across a host of real-world applications. These empirical studies showcase the criterion's ability not only to compare the stability of different learning models and features but also to provide valuable guidelines and strategies to further improve models.

Submitted 6 May, 2024; originally announced May 2024.

Comments: Accepted by ICML 2024

arXiv:2404.19145 [pdf, other]

Orthogonal Bootstrap: Efficient Simulation of Input Uncertainty

Authors: Kaizhao Liu, Jose Blanchet, Lexing Ying, Yiping Lu

Abstract: Bootstrap is a popular methodology for simulating input uncertainty. However, it can be computationally expensive when the number of samples is large. We propose a new approach called \textbf{Orthogonal Bootstrap} that reduces the number of required Monte Carlo replications. We decompose the target being simulated into two parts: the \textit{non-orthogonal part}, which has a closed-form result known as Infinitesimal Jackknife, and the \textit{orthogonal part}, which is easier to simulate. We theoretically and numerically show that Orthogonal Bootstrap significantly reduces the computational cost of Bootstrap while improving empirical accuracy and maintaining the same width of the constructed interval.

Submitted 30 April, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.09064 [pdf, ps, other]

On the First Passage Times of Branching Random Walks in $\mathbb R^d$

Authors: Jose Blanchet, Wei Cai, Shaswat Mohanty, Zhenyuan Zhang

Abstract: We study the first passage times of discrete-time branching random walks in ${\mathbb R}^d$ where $d\geq 1$. Here, the genealogy of the particles follows a supercritical Galton-Watson process. We provide asymptotics of the first passage times to a ball of radius one with a distance $x$ from the origin, conditioned upon survival. We provide explicitly the linear dominating term and the logarithmic correction term as a function of $x$. The asymptotics are precise up to an order of $o_{\mathbb P}(\log x)$ for general jump distributions and up to $O_{\mathbb P}(\log\log x)$ for spherically symmetric jumps. A crucial ingredient of both results is the tightness of first passage times. We also discuss an extension of the first passage time analysis to a modified branching random walk model that has been proven to successfully capture shortest path statistics in polymer networks.

Submitted 13 April, 2024; originally announced April 2024.

Comments: 40 pages, 8 figures

MSC Class: 60G70; 60J80; 60J85; 60G50

arXiv:2404.01431 [pdf, other]

When are Unbiased Monte Carlo Estimators More Preferable than Biased Ones?

Authors: Guanyang Wang, Jose Blanchet, Peter W. Glynn

Abstract: Due to the potential benefits of parallelization, designing unbiased Monte Carlo estimators, primarily in the setting of randomized multilevel Monte Carlo, has recently become very popular in operations research and computational statistics. However, existing work primarily substantiates the benefits of unbiased estimators at an intuitive level or using empirical evaluations. The intuition is that unbiased estimators can be replicated in parallel, enabling fast estimation in terms of wall-clock time. This intuition ignores that, typically, bias will be introduced due to impatience because most unbiased estimators necessitate random completion times. This paper provides a mathematical framework for comparing these methods under various metrics, such as completion time and overall computational cost. Under practical assumptions, our findings reveal that unbiased methods typically have superior completion times - the degree of superiority being quantifiable through the tail behavior of their running time distribution - but they may not automatically provide substantial savings in overall computational costs. We apply our findings to Markov Chain Monte Carlo and Multilevel Monte Carlo methods to identify the conditions and scenarios where unbiased methods have an advantage, thus assisting practitioners in making informed choices between unbiased and biased methods.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 35 pages

arXiv:2403.14067 [pdf, other]

Automatic Outlier Rectification via Optimal Transport

Authors: Jose Blanchet, Jiajin Li, Markus Pelger, Greg Zanotti

Abstract: In this paper, we propose a novel conceptual framework to detect outliers using optimal transport with a concave cost function. Conventional outlier detection approaches typically use a two-stage procedure: first, outliers are detected and removed, and then estimation is performed on the cleaned data. However, this approach does not inform outlier removal with the estimation task, leaving room for improvement. To address this limitation, we propose an automatic outlier rectification mechanism that integrates rectification and estimation within a joint optimization framework. We take the first step to utilize an optimal transport distance with a concave cost function to construct a rectification set in the space of probability distributions. Then, we select the best distribution within the rectification set to perform the estimation task. Notably, the concave cost function we introduce in this paper is the key to making our estimator effectively identify the outliers during the optimization process. We discuss the fundamental differences between our estimator and optimal transport-based distributionally robust optimization estimators. Finally, we demonstrate the effectiveness and superiority of our approach over conventional approaches in extensive simulation and empirical analyses for mean estimation, least absolute regression, and the fitting of option implied volatility surfaces.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2401.12197 [pdf, other]

Empirical martingale projections via the adapted Wasserstein distance

Authors: Jose Blanchet, Johannes Wiesel, Erica Zhang, Zhenyuan Zhang

Abstract: Given a collection of multidimensional pairs $\{(X_i,Y_i):1 \leq i\leq n\}$, we study the problem of projecting the associated suitably smoothed empirical measure onto the space of martingale couplings (i.e. distributions satisfying $\mathbb{E}[Y|X]=X$) using the adapted Wasserstein distance. We call the resulting distance the smoothed empirical martingale projection distance (SE-MPD), for which we obtain an explicit characterization. We also show that the space of martingale couplings remains invariant under the smoothing operation. We study the asymptotic limit of the SE-MPD, which converges at a parametric rate as the sample size increases if the pairs are either i.i.d. or satisfy appropriate mixing assumptions. Additional finite-sample results are also investigated. Using these results, we introduce a novel consistent martingale coupling hypothesis test, which we apply to test the existence of arbitrage opportunities in recently introduced neural network-based generative models for asset pricing calibration.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 55 pages, 7 figures

arXiv:2401.05016 [pdf, other]

Exploring first and second-order spatio-temporal structures of lightning strike impacts in the French Alps using subsampling

Authors: Jean-François Coeurjolly, J Blanchet, Alexis Pellerin

Abstract: We model cloud-to-ground lightning strike impacts in the French Alps over the period 2011-2021 (approximately 1.4 million events) using spatio-temporal point processes. We investigate first and higher-order structure for this point pattern and address the questions of homogeneity of the intensity function, first-order separability and dependence between events. The tuning of nonparametric methods and the different tests we consider in this study make the computational cost very high. We therefore suggest different subsampling strategies to achieve these tasks.

Submitted 11 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

arXiv:2312.09862 [pdf, other]

Wasserstein-based Minimax Estimation of Dependence in Multivariate Regularly Varying Extremes

Authors: Xuhui Zhang, Jose Blanchet, Youssef Marzouk, Viet Anh Nguyen, Sven Wang

Abstract: We study minimax risk bounds for estimators of the spectral measure in multivariate linear factor models, where observations are linear combinations of regularly varying latent factors. Non-asymptotic convergence rates are derived for the multivariate Peak-over-Threshold estimator in terms of the $p$-th order Wasserstein distance, and information-theoretic lower bounds for the minimax risks are established. The convergence rate of the estimator is shown to be minimax optimal under a class of Pareto-type models analogous to the standard class used in the setting of one-dimensional observations known as the Hall-Welsh class. When the estimator is minimax inefficient, a novel two-step estimator is introduced and demonstrated to attain the minimax lower bound. Our analysis bridges the gaps in understanding trade-offs between estimation bias and variance in multivariate extreme value theory.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2311.09018 [pdf, ps, other]

On the Foundation of Distributionally Robust Reinforcement Learning

Authors: Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou

Abstract: Motivated by the need for a robust policy in the face of environment shifts between training and deployment, we contribute to the theoretical foundation of distributionally robust reinforcement learning (DRRL). This is accomplished through a comprehensive modeling framework centered around distributionally robust Markov decision processes (DRMDPs). This framework obliges the decision maker to choose an optimal policy under the worst-case distributional shift orchestrated by an adversary. By unifying and extending existing formulations, we rigorously construct DRMDPs that embrace various modeling attributes for both the decision maker and the adversary. These attributes include adaptability granularity, exploring history-dependent, Markov, and Markov time-homogeneous decision maker and adversary dynamics. Additionally, we delve into the flexibility of shifts induced by the adversary, examining SA and S-rectangularity. Within this DRMDP framework, we investigate conditions for the existence or absence of the dynamic programming principle (DPP). From an algorithmic standpoint, the existence of DPP holds significant implications, as the vast majority of existing data- and computation-efficient RL algorithms are reliant on the DPP. To study its existence, we comprehensively examine combinations of controller and adversary attributes, providing streamlined proofs grounded in a unified methodology. We also offer counterexamples for settings in which a DPP with full generality is absent.

Submitted 19 January, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.02423 [pdf, other]

Payoff-based learning with matrix multiplicative weights in quantum games

Authors: Kyriakos Lotidis, Panayotis Mertikopoulos, Nicholas Bambos, Jose Blanchet

Abstract: In this paper, we study the problem of learning in quantum games - and other classes of semidefinite games - with scalar, payoff-based feedback. For concreteness, we focus on the widely used matrix multiplicative weights (MMW) algorithm and, instead of requiring players to have full knowledge of the game (and/or each other's chosen states), we introduce a suite of minimal-information matrix multiplicative weights (3MW) methods tailored to different information frameworks. The main difficulty in attaining convergence in this setting is that, in contrast to classical finite games, quantum games have an infinite continuum of pure states (the quantum equivalent of pure strategies), so standard importance-weighting techniques for estimating payoff vectors cannot be employed. Instead, we borrow ideas from bandit convex optimization and we design a zeroth-order gradient sampler adapted to the semidefinite geometry of the problem at hand. As a first result, we show that the 3MW method with deterministic payoff feedback retains the $\mathcal{O}(1/\sqrt{T})$ convergence rate of the vanilla, full information MMW algorithm in quantum min-max games, even though the players only observe a single scalar. Subsequently, we relax the algorithm's information requirements even further and we provide a 3MW method that only requires players to observe a random realization of their payoff observable, and converges to equilibrium at an $\mathcal{O}(T^{-1/4})$ rate. Finally, going beyond zero-sum games, we show that a regularized variant of the proposed 3MW method guarantees local convergence with high probability to all equilibria that satisfy a certain first-order stability condition.

Submitted 4 November, 2023; originally announced November 2023.

Comments: 39 pages, 21 figures, 2 tables

MSC Class: Primary 91A10; 91A26; 37N40; secondary 68Q32; 81Q93

arXiv:2310.18551 [pdf, ps, other]

Modeling Shortest Paths in Polymeric Networks using Spatial Branching Processes

Authors: Zhenyuan Zhang, Shaswat Mohanty, Jose Blanchet, Wei Cai

Abstract: Recent studies have established a connection between the macroscopic mechanical response of polymeric materials and the statistics of the shortest path (SP) length between distant nodes in the polymer network. Since these statistics can be costly to compute and difficult to study theoretically, we introduce a branching random walk (BRW) model to describe the SP statistics from the coarse-grained molecular dynamics (CGMD) simulations of polymer networks. We postulate that the first passage time (FPT) of the BRW to a given termination site can be used to approximate the statistics of the SP between distant nodes in the polymer network. We develop a theoretical framework for studying the FPT of spatial branching processes and obtain an analytical expression for estimating the FPT distribution as a function of the cross-link density. We demonstrate by extensive numerical calculations that the distribution of the FPT of the BRW model agrees well with the SP distribution from the CGMD simulations. The theoretical estimate and the corresponding numerical implementations of BRW provide an efficient way of approximating the SP distribution in a polymer network. Our results have the physical meaning that by accounting for the realistic topology of polymer networks, extensive bond-breaking is expected to occur at a much smaller stretch than that expected from idealized models assuming periodic network structures. Our work presents the first analysis of polymer networks as a BRW and sets the framework for developing a generalizable spatial branching model for studying the macroscopic evolution of polymeric systems.

Submitted 30 March, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

Comments: 37 pages, 17 figures

arXiv:2310.08833 [pdf, other]

Optimal Sample Complexity for Average Reward Markov Decision Processes

Authors: Shengbo Wang, Jose Blanchet, Peter Glynn

Abstract: We resolve the open question regarding the sample complexity of policy learning for maximizing the long-run average reward associated with a uniformly ergodic Markov decision process (MDP), assuming a generative model. In this context, the existing literature provides a sample complexity upper bound of $\widetilde O(|S||A|t_{\text{mix}}^2 ε^{-2})$ and a lower bound of $Ω(|S||A|t_{\text{mix}} ε^{-2})$. In these expressions, $|S|$ and $|A|$ denote the cardinalities of the state and action spaces respectively, $t_{\text{mix}}$ serves as a uniform upper limit for the total variation mixing times, and $ε$ signifies the error tolerance. Therefore, a notable gap of $t_{\text{mix}}$ still remains to be bridged. Our primary contribution is the development of an estimator for the optimal policy of average reward MDPs with a sample complexity of $\widetilde O(|S||A|t_{\text{mix}}ε^{-2})$. This marks the first algorithm and analysis to reach the literature's lower bound. Our new algorithm draws inspiration from ideas in Li et al. (2020), Jin and Sidford (2021), and Wang et al. (2023). Additionally, we conduct numerical experiments to validate our theoretical findings.

Submitted 12 February, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2308.10341 [pdf, ps, other]

Computable Bounds on Convergence of Markov Chains in Wasserstein Distance

Authors: Yanlin Qu, Jose Blanchet, Peter Glynn

Abstract: We introduce a unified framework to estimate the convergence of Markov chains to equilibrium using Wasserstein distance. The framework provides convergence bounds with various rates, ranging from polynomial to exponential, all derived from a single contractive drift condition. This approach removes the need for finding a specific set with drift outside and contraction inside. The convergence bounds are explicit, as they can be estimated based on one-step expectations and do not rely on equilibrium-related quantities. To enhance the applicability of the framework, we introduce the large M technique and the boundary removal technique. We illustrate these methods in queueing models and algorithms in stochastic optimization.

Submitted 20 August, 2023; originally announced August 2023.

MSC Class: 60J05

arXiv:2308.05414 [pdf, other]

Unifying Distributionally Robust Optimization via Optimal Transport Theory

Authors: Jose Blanchet, Daniel Kuhn, Jiajin Li, Bahar Taskesen

Abstract: In the past few years, there has been considerable interest in two prominent approaches for Distributionally Robust Optimization (DRO): Divergence-based and Wasserstein-based methods. The divergence approach models misspecification in terms of likelihood ratios, while the latter models it through a measure of distance or cost in actual outcomes. Building upon these advances, this paper introduces a novel approach that unifies these methods into a single framework based on optimal transport (OT) with conditional moment constraints. Our proposed approach, for example, makes it possible for optimal adversarial distributions to simultaneously perturb likelihood and outcomes, while producing an optimal (in an optimal transport sense) coupling between the baseline model and the adversarial model. Additionally, the paper investigates several duality results and presents tractable reformulations that enhance the practical applicability of this unified framework.

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2305.18420 [pdf, other]

Sample Complexity of Variance-reduced Distributionally Robust Q-learning

Authors: Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou

Abstract: Dynamic decision making under distributional shifts is of fundamental interest in theory and applications of reinforcement learning: The distribution of the environment on which the data is collected can differ from that of the environment on which the model is deployed. This paper presents two novel model-free algorithms, namely the distributionally robust Q-learning and its variance-reduced counterpart, that can effectively learn a robust policy despite distributional shifts. These algorithms are designed to efficiently approximate the $q$-function of an infinite-horizon $γ$-discounted robust Markov decision process with Kullback-Leibler uncertainty set to an entry-wise $ε$-degree of precision. Further, the variance-reduced distributionally robust Q-learning combines the synchronous Q-learning with variance-reduction techniques to enhance its performance. Consequently, we establish that it attains a minmax sample complexity upper bound of $\tilde O(|S||A|(1-γ)^{-4}ε^{-2})$, where $S$ and $A$ denote the state and action spaces. This is the first complexity result that is independent of the uncertainty size $δ$, thereby providing new complexity theoretic insights. Additionally, a series of numerical experiments confirm the theoretical findings and the efficiency of the algorithms in handling distributional shifts.

Submitted 28 May, 2023; originally announced May 2023.

arXiv:2305.16527 [pdf, other]

When can Regression-Adjusted Control Variates Help? Rare Events, Sobolev Embedding and Minimax Optimality

Authors: Jose Blanchet, Haoxuan Chen, Yiping Lu, Lexing Ying

Abstract: This paper studies the use of a machine learning-based estimator as a control variate for mitigating the variance of Monte Carlo sampling. Specifically, we seek to uncover the key factors that influence the efficiency of control variates in reducing variance. We examine a prototype estimation problem that involves simulating the moments of a Sobolev function based on observations obtained from (random) quadrature nodes. Firstly, we establish an information-theoretic lower bound for the problem. We then study a specific quadrature rule that employs a nonparametric regression-adjusted control variate to reduce the variance of the Monte Carlo simulation. We demonstrate that this kind of quadrature rule can improve the Monte Carlo rate and achieve the minimax optimal rate under a sufficient smoothness assumption. Due to the Sobolev Embedding Theorem, the sufficient smoothness assumption eliminates the existence of rare and extreme events. Finally, we show that, in the presence of rare and extreme events, a truncated version of the Monte Carlo algorithm can achieve the minimax optimal rate while the control variate cannot improve the convergence rate.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.09659 [pdf, ps, other]

Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage

Authors: Jose Blanchet, Miao Lu, Tong Zhang, Han Zhong

Abstract: In this paper, we study distributionally robust offline reinforcement learning (robust offline RL), which seeks to find an optimal policy purely from an offline dataset that can perform well in perturbed environments. In specific, we propose a generic algorithm framework called Doubly Pessimistic Model-based Policy Optimization ($P^2MPO$), which features a novel combination of a flexible model estimation subroutine and a doubly pessimistic policy optimization step. Notably, the double pessimism principle is crucial to overcome the distributional shifts incurred by (i) the mismatch between the behavior policy and the target policies; and (ii) the perturbation of the nominal model. Under certain accuracy conditions on the model estimation subroutine, we prove that $P^2MPO$ is sample-efficient with robust partial coverage data, which only requires the offline data to have good coverage of the distributions induced by the optimal robust policy and the perturbed models around the nominal model. By tailoring specific model estimation subroutines for concrete examples of RMDPs, including tabular RMDPs, factored RMDPs, kernel and neural RMDPs, we prove that $P^2MPO$ enjoys a $\tilde{\mathcal{O}}(n^{-1/2})$ convergence rate, where $n$ is the dataset size. We highlight that all these examples, except tabular RMDPs, are first identified and proven tractable by this work. Furthermore, we continue our study of robust offline RL in the robust Markov games (RMGs). By extending the double pessimism principle identified for single-agent RMDPs, we propose another algorithm framework that can efficiently find the robust Nash equilibria among players using only robust unilateral (partial) coverage data. To our best knowledge, this work proposes the first general learning principle -- double pessimism -- for robust offline RL and shows that it is provably efficient with general function approximation.

Submitted 22 August, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: V2 adds results on robust offline Markov games

arXiv:2303.14867 [pdf, ps, other]

Statistical Limit Theorems in Distributionally Robust Optimization

Authors: Jose Blanchet, Alexander Shapiro

Abstract: The goal of this paper is to develop methodology for the systematic analysis of asymptotic statistical properties of data driven DRO formulations based on their corresponding non-DRO counterparts. We illustrate our approach in various settings, including both phi-divergence and Wasserstein uncertainty sets. Different types of asymptotic behaviors are obtained depending on the rate at which the uncertainty radius decreases to zero as a function of the sample size and the geometry of the uncertainty sets.

Submitted 26 March, 2023; originally announced March 2023.

MSC Class: 90C15

arXiv:2303.06595 [pdf, other]

A Convergent Single-Loop Algorithm for Relaxation of Gromov-Wasserstein in Graph Data

Authors: Jiajin Li, Jianheng Tang, Lemin Kong, Huikang Liu, Jia Li, Anthony Man-Cho So, Jose Blanchet

Abstract: In this work, we present the Bregman Alternating Projected Gradient (BAPG) method, a single-loop algorithm that offers an approximate solution to the Gromov-Wasserstein (GW) distance. We introduce a novel relaxation technique that balances accuracy and computational efficiency, albeit with some compromises in the feasibility of the coupling map. Our analysis is based on the observation that the GW problem satisfies the Luo-Tseng error bound condition, which relates to estimating the distance of a point to the critical point set of the GW problem based on the optimality residual. This observation allows us to provide an approximation bound for the distance between the fixed-point set of BAPG and the critical point set of GW. Moreover, under a mild technical assumption, we can show that BAPG converges to its fixed point set. The effectiveness of BAPG has been validated through comprehensive numerical experiments in graph alignment and partition tasks, where it outperforms existing methods in terms of both solution quality and wall-clock time.

Submitted 12 March, 2023; originally announced March 2023.

Comments: Accepted by ICLR 2023

arXiv:2302.07477 [pdf, ps, other]

Optimal Sample Complexity of Reinforcement Learning for Mixing Discounted Markov Decision Processes

Authors: Shengbo Wang, Jose Blanchet, Peter Glynn

Abstract: We consider the optimal sample complexity theory of tabular reinforcement learning (RL) for maximizing the infinite horizon discounted reward in a Markov decision process (MDP). Optimal worst-case complexity results have been developed for tabular RL problems in this setting, leading to a sample complexity dependence on $γ$ and $ε$ of the form $\tilde Θ((1-γ)^{-3}ε^{-2})$, where $γ$ denotes the discount factor and $ε$ is the solution error tolerance. However, in many applications of interest, the optimal policy (or all policies) induces mixing. We establish that in such settings, the optimal sample complexity dependence is $\tilde Θ(t_{\text{mix}}(1-γ)^{-2}ε^{-2})$, where $t_{\text{mix}}$ is the total variation mixing time. Our analysis is grounded in regeneration-type ideas, which we believe are of independent interest, as they can be used to study RL problems for general state space MDPs.

Submitted 30 September, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

arXiv:2212.12978 [pdf, other]

Universal Gradient Descent Ascent Method for Nonconvex-Nonconcave Minimax Optimization

Authors: Taoli Zheng, Linglingzhi Zhu, Anthony Man-Cho So, Jose Blanchet, Jiajin Li

Abstract: Nonconvex-nonconcave minimax optimization has received intense attention over the last decade due to its broad applications in machine learning. Most existing algorithms rely on one-sided information, such as the convexity (resp. concavity) of the primal (resp. dual) functions, or other specific structures, such as the Polyak-Łojasiewicz (PŁ) and Kurdyka-Łojasiewicz (KŁ) conditions. However, verifying these regularity conditions is challenging in practice. To meet this challenge, we propose a novel universally applicable single-loop algorithm, the doubly smoothed gradient descent ascent method (DS-GDA), which naturally balances the primal and dual updates. That is, DS-GDA with the same hyperparameters is able to uniformly solve nonconvex-concave, convex-nonconcave, and nonconvex-nonconcave problems with one-sided KŁ properties, achieving convergence with $\mathcal{O}(ε^{-4})$ complexity. Sharper (even optimal) iteration complexity can be obtained when the KŁ exponent is known. Specifically, under the one-sided KŁ condition with exponent $θ\in(0,1)$, DS-GDA converges with an iteration complexity of $\mathcal{O}(ε^{-2\max\{2θ,1\}})$. They all match the corresponding best results in the literature. Moreover, we show that DS-GDA is practically applicable to general nonconvex-nonconcave problems even without any regularity conditions, such as the PŁ condition, KŁ condition, or weak Minty variational inequalities condition. For various challenging nonconvex-nonconcave examples in the literature, including "Forsaken", "Bilinearly-coupled minimax", "Sixth-order polynomial", and "PolarGame", the proposed DS-GDA can all get rid of limit cycles. To the best of our knowledge, this is the first first-order algorithm to achieve convergence on all of these formidable problems.

Submitted 30 October, 2023; v1 submitted 25 December, 2022; originally announced December 2022.

arXiv:2211.15241 [pdf, other]

Synthetic Principal Component Design: Fast Covariate Balancing with Synthetic Controls

Authors: Yiping Lu, Jiajin Li, Lexing Ying, Jose Blanchet

Abstract: The optimal design of experiments typically involves solving an NP-hard combinatorial optimization problem. In this paper, we aim to develop a globally convergent and practically efficient optimization algorithm. Specifically, we consider a setting where the pre-treatment outcome data is available and the synthetic control estimator is invoked. The average treatment effect is estimated via the difference between the weighted average outcomes of the treated and control units, where the weights are learned from the observed data. Under this setting, we surprisingly observed that the optimal experimental design problem could be reduced to a so-called phase synchronization problem. We solve this problem via a normalized variant of the generalized power method with spectral initialization. On the theoretical side, we establish the first global optimality guarantee for experiment design when pre-treatment data is sampled from certain data-generating processes. Empirically, we conduct extensive experiments to demonstrate the effectiveness of our method on both the US Bureau of Labor Statistics and the Abadie-Diemond-Hainmueller California Smoking Data. In terms of the root mean square error, our algorithm surpasses the random design by a large margin.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2210.01413 [pdf, other]

Tikhonov Regularization is Optimal Transport Robust under Martingale Constraints

Authors: Jiajin Li, Sirui Lin, Jose Blanchet, Viet Anh Nguyen

Abstract: Distributionally robust optimization has been shown to offer a principled way to regularize learning models. In this paper, we find that Tikhonov regularization is distributionally robust in an optimal transport sense (i.e., if an adversary chooses distributions in a suitable optimal transport neighborhood of the empirical measure), provided that suitable martingale constraints are also imposed. Further, we introduce a relaxation of the martingale constraints which not only provides a unified viewpoint to a class of existing robust methods but also leads to new regularization tools. To realize these novel tools, tractable computational algorithms are proposed. As a byproduct, the strong duality theorem proved in this paper can be potentially applied to other problems of independent interest.

Submitted 4 October, 2022; originally announced October 2022.

Comments: Accepted by NeurIPS 2022

arXiv:2209.14430 [pdf, other]

Minimax Optimal Kernel Operator Learning via Multilevel Training

Authors: Jikai Jin, Yiping Lu, Jose Blanchet, Lexing Ying

Abstract: Learning mappings between infinite-dimensional function spaces has achieved empirical success in many disciplines of machine learning, including generative modeling, functional data analysis, causal inference, and multi-agent reinforcement learning. In this paper, we study the statistical limit of learning a Hilbert-Schmidt operator between two infinite-dimensional Sobolev reproducing kernel Hilbert spaces. We establish the information-theoretic lower bound in terms of the Sobolev Hilbert-Schmidt norm and show that a regularization that learns the spectral components below the bias contour and ignores the ones that are above the variance contour can achieve the optimal learning rate. At the same time, the spectral components between the bias and variance contours give us flexibility in designing computationally feasible machine learning algorithms. Based on this observation, we develop a multilevel kernel operator learning algorithm that is optimal when learning linear operators between infinite-dimensional function spaces.

Submitted 24 July, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

Comments: ICLR 2023 spotlight

arXiv:2205.13111 [pdf, other]

Distributionally Robust Gaussian Process Regression and Bayesian Inverse Problems

Authors: Xuhui Zhang, Jose Blanchet, Youssef Marzouk, Viet Anh Nguyen, Sven Wang

Abstract: We study a distributionally robust optimization formulation (i.e., a min-max game) for two representative problems in Bayesian nonparametric estimation: Gaussian process regression and, more generally, linear inverse problems. Our formulation seeks the best mean-squared error predictor, in an infinite-dimensional space, against an adversary who chooses the worst-case model in a Wasserstein ball around a nominal infinite-dimensional Bayesian model. The transport cost is chosen to control features such as the degree of roughness of the sample paths that the adversary is allowed to inject. We show that the game has a well-defined value (i.e., strong duality holds in the sense that max-min equals min-max) and that there exists a unique Nash equilibrium which can be computed by a sequence of finite-dimensional approximations. Crucially, the worst-case distribution is itself Gaussian. We explore properties of the Nash equilibrium and the effects of hyperparameters through a set of numerical experiments, demonstrating the versatility of our modeling framework.

Submitted 20 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

arXiv:2205.07331 [pdf, other]

Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent

Authors: Yiping Lu, Jose Blanchet, Lexing Ying

Abstract: In this paper, we study the statistical limits in terms of Sobolev norms of gradient descent for solving inverse problem from randomly sampled noisy observations using a general class of objective functions. Our class of objective functions includes Sobolev training for kernel regression, Deep Ritz Methods (DRM), and Physics Informed Neural Networks (PINN) for solving elliptic partial differential equations (PDEs) as special cases. We consider a potentially infinite-dimensional parameterization of our model using a suitable Reproducing Kernel Hilbert Space and a continuous parameterization of problem hardness through the definition of kernel integral operators. We prove that gradient descent over this objective function can also achieve statistical optimality and the optimal number of passes over the data increases with sample size. Based on our theory, we explain an implicit acceleration of using a Sobolev norm as the objective function for training, inferring that the optimal number of epochs of DRM becomes larger than the number of PINN when both the data size and the hardness of tasks increase, although both DRM and PINN can achieve statistical optimality.

Submitted 19 September, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

arXiv:2202.10799 [pdf, other]

Large deviations asymptotics for unbounded additive functionals of diffusion processes

Authors: Mihail Bazhba, Jose Blanchet, Roger J. A. Laeven, Bert Zwart

Abstract: We study large deviations asymptotics for a class of unbounded additive functionals, interpreted as normalized accumulated areas, of one-dimensional Langevin diffusions with sub-linear gradient drifts. Our results provide parametric insights on the speed and the rate functions in terms of the growth rate of the drift and the growth rate of the additive functional. We find a critical value in terms of these growth parameters that dictates regions of sub-linear speed for our large deviations asymptotics. Our approach is based upon various constructions of independent interest, including a decomposition of the diffusion process in terms of alternating renewal cycles and a detailed analysis of the paths during a cycle using suitable time and spatial scales. The key to the sub-linear behavior is a heavy-tailed large deviations phenomenon arising from the principle of a single big jump coupled with the result that at each regeneration cycle the upper-tail asymptotic behavior of the accumulated area of the diffusion process is proven to be semi-exponential (i.e., of heavy-tailed Weibull type).

Submitted 20 October, 2023; v1 submitted 22 February, 2022; originally announced February 2022.

Comments: In this revision, we have: fixed a mistake in the proof of Lemma 4.3; suppressed some elementary technical details; fixed some typos; and further improved the presentation

MSC Class: 60F10 (Primary); 60J60 (Secondary)

arXiv:2110.06897 [pdf, other]

Machine Learning For Elliptic PDEs: Fast Rate Generalization Bound, Neural Scaling Law and Minimax Optimality

Authors: Yiping Lu, Haoxuan Chen, Jianfeng Lu, Lexing Ying, Jose Blanchet

Abstract: In this paper, we study the statistical limits of deep learning techniques for solving elliptic partial differential equations (PDEs) from random samples using the Deep Ritz Method (DRM) and Physics-Informed Neural Networks (PINNs). To simplify the problem, we focus on a prototype elliptic PDE: the Schrödinger equation on a hypercube with zero Dirichlet boundary condition, which has wide application in the quantum-mechanical systems. We establish upper and lower bounds for both methods, which improves upon concurrently developed upper bounds for this problem via a fast rate generalization bound. We discover that the current Deep Ritz Methods is sub-optimal and propose a modified version of it. We also prove that PINN and the modified version of DRM can achieve minimax optimal bounds over Sobolev spaces. Empirically, following recent work which has shown that the deep model accuracy will improve with growing training sets according to a power law, we supply computational experiments to show a similar behavior of dimension dependent power law for deep PDE solvers.

Submitted 12 November, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: add a proof Proof Sketch in section 4.1

arXiv:2109.14875 [pdf, other]

Adversarial Regression with Doubly Non-negative Weighting Matrices

Authors: Tam Le, Truyen Nguyen, Makoto Yamada, Jose Blanchet, Viet Anh Nguyen

Abstract: Many machine learning tasks that involve predicting an output response can be solved by training a weighted regression model. Unfortunately, the predictive power of this type of models may severely deteriorate under low sample sizes or under covariate perturbations. Reweighting the training samples has aroused as an effective mitigation strategy to these problems. In this paper, we propose a novel and coherent scheme for kernel-reweighted regression by reparametrizing the sample weights using a doubly non-negative matrix. When the weighting matrix is confined in an uncertainty set using either the log-determinant divergence or the Bures-Wasserstein distance, we show that the adversarially reweighted estimate can be solved efficiently using first-order methods. Numerical experiments show that our reweighting strategy delivers promising results on numerous datasets.

Submitted 30 September, 2021; originally announced September 2021.

Comments: Accepted to the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS2021)

arXiv:2108.02120 [pdf, other]

Statistical Analysis of Wasserstein Distributionally Robust Estimators

Authors: Jose Blanchet, Karthyek Murthy, Viet Anh Nguyen

Abstract: We consider statistical methods which invoke a min-max distributionally robust formulation to extract good out-of-sample performance in data-driven optimization and learning problems. Acknowledging the distributional uncertainty in learning from limited samples, the min-max formulations introduce an adversarial inner player to explore unseen covariate data. The resulting Distributionally Robust Optimization (DRO) formulations, which include Wasserstein DRO formulations (our main focus), are specified using optimal transportation phenomena. Upon describing how these infinite-dimensional min-max problems can be approached via a finite-dimensional dual reformulation, the tutorial moves into its main component, namely, explaining a generic recipe for optimally selecting the size of the adversary's budget. This is achieved by studying the limit behavior of an optimal transport projection formulation arising from an inquiry on the smallest confidence region that includes the unknown population risk minimizer. Incidentally, this systematic prescription coincides with those in specific examples in high-dimensional statistics and results in error bounds that are free from the curse of dimensions. Equipped with this prescription, we present a central limit theorem for the DRO estimator and provide a recipe for constructing compatible confidence regions that are useful for uncertainty quantification. The rest of the tutorial is devoted to insights into the nature of the optimizers selected by the min-max formulations and additional applications of optimal transport projections.

Submitted 4 August, 2021; originally announced August 2021.

arXiv:2106.07191 [pdf, ps, other]

Distributionally Robust Martingale Optimal Transport

Authors: Zhengqing Zhou, Jose Blanchet, Peter W. Glynn

Abstract: We study the problem of bounding path-dependent expectations (within any finite time horizon $d$) over the class of discrete-time martingales whose marginal distributions lie within a prescribed tolerance of a given collection of benchmark marginal distributions. This problem is a relaxation of the martingale optimal transport (MOT) problem and is motivated by applications to super-hedging in financial markets. We show that the empirical version of our relaxed MOT problem can be approximated within $O\left( n^{-1/2}\right)$ error where $n$ is the number of samples of each of the individual marginal distributions (generated independently) and using a suitably constructed finite-dimensional linear programming problem.

Submitted 29 November, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

arXiv:2106.02263 [pdf, other]

doi 10.1016/j.spa.2022.12.007

Unbiased Optimal Stopping via the MUSE

Authors: Zhengqing Zhou, Guanyang Wang, Jose Blanchet, Peter W. Glynn

Abstract: We propose a new unbiased estimator for estimating the utility of the optimal stopping problem. The MUSE, short for Multilevel Unbiased Stopping Estimator, constructs the unbiased Multilevel Monte Carlo (MLMC) estimator at every stage of the optimal stopping problem in a backward recursive way. In contrast to traditional sequential methods, the MUSE can be implemented in parallel. We prove the MUSE has finite variance, finite computational complexity, and achieves $ε$-accuracy with $O(1/ε^2)$ computational cost under mild conditions. We demonstrate MUSE empirically in an option pricing problem involving a high-dimensional input and the use of many parallel processors.

Submitted 26 December, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: 39 pages, add several numerical experiments and technical results, accepted by Stochastic Processes and their Applications

MSC Class: 62C05; 60G40; 62L15

arXiv:2106.01070 [pdf, ps, other]

Testing Group Fairness via Optimal Transport Projections

Authors: Nian Si, Karthyek Murthy, Jose Blanchet, Viet Anh Nguyen

Abstract: We present a statistical testing framework to detect if a given machine learning classifier fails to satisfy a wide range of group fairness notions. The proposed test is a flexible, interpretable, and statistically rigorous tool for auditing whether exhibited biases are intrinsic to the algorithm or due to the randomness in the data. The statistical challenges, which may arise from multiple impact criteria that define group fairness and which are discontinuous on model parameters, are conveniently tackled by projecting the empirical measure onto the set of group-fair probability models using optimal transport. This statistic is efficiently computed using linear programming and its asymptotic distribution is explicitly obtained. The proposed framework can also be used to test for testing composite fairness hypotheses and fairness with multiple sensitive attributes. The optimal transport testing formulation improves interpretability by characterizing the minimal covariate perturbations that eliminate the bias observed in the audit.

Submitted 2 June, 2021; originally announced June 2021.

Journal ref: International Conference on Machine Learning 2021

arXiv:2106.00322 [pdf, other]

Sequential Domain Adaptation by Synthesizing Distributionally Robust Experts

Authors: Bahar Taskesen, Man-Chung Yue, Jose Blanchet, Daniel Kuhn, Viet Anh Nguyen

Abstract: Least squares estimators, when trained on a few target domain samples, may predict poorly. Supervised domain adaptation aims to improve the predictive accuracy by exploiting additional labeled training samples from a source distribution that is close to the target distribution. Given available data, we investigate novel strategies to synthesize a family of least squares estimator experts that are robust with regard to moment conditions. When these moment conditions are specified using Kullback-Leibler or Wasserstein-type divergences, we can find the robust estimators efficiently using convex optimization. We use the Bernstein online aggregation algorithm on the proposed family of robust experts to generate predictions for the sequential stream of target test samples. Numerical experiments on real data show that the robust strategies may outperform non-robust interpolations of the empirical least squares estimators.

Submitted 1 June, 2021; originally announced June 2021.

arXiv:2103.16451 [pdf, other]

Robustifying Conditional Portfolio Decisions via Optimal Transport

Authors: Viet Anh Nguyen, Fan Zhang, Shanshan Wang, Jose Blanchet, Erick Delage, Yinyu Ye

Abstract: We propose a data-driven portfolio selection model that integrates side information, conditional estimation and robustness using the framework of distributionally robust optimization. Conditioning on the observed side information, the portfolio manager solves an allocation problem that minimizes the worst-case conditional risk-return trade-off, subject to all possible perturbations of the covariate-return probability distribution in an optimal transport ambiguity set. Despite the non-linearity of the objective function in the probability measure, we show that the distributionally robust portfolio allocation with side information problem can be reformulated as a finite-dimensional optimization problem. If portfolio decisions are made based on either the mean-variance or the mean-Conditional Value-at-Risk criterion, the resulting reformulation can be further simplified to second-order or semi-definite cone programs. Empirical studies in the US equity market demonstrate the advantage of our integrative framework against other benchmarks.

Submitted 9 April, 2024; v1 submitted 30 March, 2021; originally announced March 2021.

Comments: 1 figure

arXiv:2010.05373 [pdf, other]

Distributionally Robust Local Non-parametric Conditional Estimation

Authors: Viet Anh Nguyen, Fan Zhang, Jose Blanchet, Erick Delage, Yinyu Ye

Abstract: Conditional estimation given specific covariate values (i.e., local conditional estimation or functional estimation) is ubiquitously useful with applications in engineering, social and natural sciences. Existing data-driven non-parametric estimators mostly focus on structured homogeneous data (e.g., weakly independent and stationary data), thus they are sensitive to adversarial noise and may perform poorly under a low sample size. To alleviate these issues, we propose a new distributionally robust estimator that generates non-parametric local estimates by minimizing the worst-case conditional expected loss over all adversarial distributions in a Wasserstein ambiguity set. We show that despite being generally intractable, the local estimator can be efficiently found via convex optimization under broadly applicable settings, and it is robust to the corruption and heterogeneity of the data. Experiments with synthetic and MNIST datasets show the competitive performance of this new class of estimators.

Submitted 11 October, 2020; originally announced October 2020.

arXiv:2010.05321 [pdf, ps, other]

Distributionally Robust Parametric Maximum Likelihood Estimation

Authors: Viet Anh Nguyen, Xuhui Zhang, Jose Blanchet, Angelos Georghiou

Abstract: We consider the parameter estimation problem of a probabilistic generative model prescribed using a natural exponential family of distributions. For this problem, the typical maximum likelihood estimator usually overfits under limited training sample size, is sensitive to noise and may perform poorly on downstream predictive tasks. To mitigate these issues, we propose a distributionally robust maximum likelihood estimator that minimizes the worst-case expected log-loss uniformly over a parametric Kullback-Leibler ball around a parametric nominal distribution. Leveraging the analytical expression of the Kullback-Leibler divergence between two distributions in the same natural exponential family, we show that the min-max estimation problem is tractable in a broad setting, including the robust training of generalized linear models. Our novel robust estimator also enjoys statistical consistency and delivers promising empirical results in both regression and classification tasks.

Submitted 11 October, 2020; originally announced October 2020.

arXiv:2007.09320 [pdf, ps, other]

Convolution Bounds on Quantile Aggregation

Authors: Jose Blanchet, Henry Lam, Yang Liu, Ruodu Wang

Abstract: Quantile aggregation with dependence uncertainty has a long history in probability theory with wide applications in finance, risk management, statistics, and operations research. Using a recent result on inf-convolution of quantile-based risk measures, we establish new analytical bounds for quantile aggregation which we call convolution bounds. Convolution bounds both unify every analytical result available in quantile aggregation and enlighten our understanding of these methods. These bounds are the best available in general. Moreover, convolution bounds are easy to compute, and we show that they are sharp in many relevant cases. They also allow for interpretability on the extremal dependence structure. The results directly lead to bounds on the distribution of the sum of random variables with arbitrary dependence. We discuss relevant applications in risk management and economics.

Submitted 24 April, 2024; v1 submitted 17 July, 2020; originally announced July 2020.

arXiv:2006.05630 [pdf, other]

Distributionally Robust Batch Contextual Bandits

Authors: Nian Si, Fan Zhang, Zhengyuan Zhou, Jose Blanchet

Abstract: Policy learning using historical observational data is an important problem that has found widespread applications. Examples include selecting offers, prices, advertisements to send to customers, as well as selecting which medication to prescribe to a patient. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that has generated the data -- an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well the policy does under the worst-case environment shift. We then establish a central limit theorem type guarantee for this proposed policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that is able to learn a policy that is robust to adversarial perturbations and unknown covariate shifts with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of our proposed algorithm in synthetic datasets and demonstrate that it provides the robustness that is missing using standard policy learning algorithms. We conclude the paper by providing a comprehensive application of our methods in the context of a real-world voting dataset.

Submitted 11 September, 2023; v1 submitted 9 June, 2020; originally announced June 2020.

Comments: The short version has been accepted in ICML 2020

arXiv:2003.14381 [pdf, other]

Sample-path large deviations for unbounded additive functionals of the reflected random walk

Authors: Mihail Bazhba, Jose Blanchet, Chang-Han Rhee, Bert Zwart

Abstract: We prove a sample path large deviation principle (LDP) with sub-linear speed for unbounded functionals of certain Markov chains induced by the Lindley recursion. The LDP holds in the Skorokhod space $\mathbb{D}[0,T]$ equipped with the $M_1'$ topology. Our technique hinges on a suitable decomposition of the Markov chain in terms of regeneration cycles. Each regeneration cycle denotes the area accumulated during the busy period of the reflected random walk. We prove a large deviation principle for the area under the busy period of the MRW, and we show that it exhibits a heavy-tailed behavior.

Submitted 30 September, 2023; v1 submitted 31 March, 2020; originally announced March 2020.

MSC Class: 60F10 (Primary); 60G17 (Secondary)

arXiv:2002.03205 [pdf, other]

Asymptotically Optimal Control of a Centralized Dynamic Matching Market with General Utilities

Authors: Jose H. Blanchet, Martin I. Reiman, Viragh Shah, Lawrence M. Wein, Linjia Wu

Abstract: We consider a matching market where buyers and sellers arrive according to independent Poisson processes at the same rate and independently abandon the market if not matched after an exponential amount of time with the same mean. In this centralized market, the utility for the system manager from matching any buyer and any seller is a general random variable. We consider a sequence of systems indexed by $n$ where the arrivals in the $n^{\mathrm{th}}$ system are sped up by a factor of $n$. We analyze two families of one-parameter policies: the population threshold policy immediately matches an arriving agent to its best available mate only if the number of mates in the system is above a threshold, and the utility threshold policy matches an arriving agent to its best available mate only if the corresponding utility is above a threshold. Using a fluid analysis of the two-dimensional Markov process of buyers and sellers, we show that when the matching utility distribution is light-tailed, the population threshold policy with threshold $\frac{n}{\ln n}$ is asymptotically optimal among all policies that make matches only at agent arrival epochs. In the heavy-tailed case, we characterize the optimal threshold level for both policies. We also study the utility threshold policy in an unbalanced matching market with heavy-tailed matching utilities and find that the buyers and sellers have the same asymptotically optimal utility threshold. We derive optimal thresholds when the matching utility distribution is exponential, uniform, Pareto, and correlated Pareto. We find that as the right tail of the matching utility distribution gets heavier, the threshold level of each policy (and hence market thickness) increases, as does the magnitude by which the utility threshold policy outperforms the population threshold policy.

Submitted 10 June, 2021; v1 submitted 8 February, 2020; originally announced February 2020.

Comments: 81 pages

MSC Class: 90B50 (primary); 90B22 (secondary) ACM Class: G.3

arXiv:2002.02149 [pdf, other]

Efficient Scenario Generation for Heavy-tailed Chance Constrained Optimization

Authors: Jose Blanchet, Fan Zhang, Bert Zwart

Abstract: We consider a generic class of chance-constrained optimization problems with heavy-tailed (i.e., power-law type) risk factors. In this setting, we use the scenario approach to obtain a constant approximation to the optimal solution with a computational complexity that is uniform in the risk tolerance parameter. We additionally illustrate the efficiency of our algorithm in the context of solvency in insurance networks.

Submitted 7 May, 2023; v1 submitted 6 February, 2020; originally announced February 2020.

Comments: 31 pages, 7 figures

arXiv:2001.08384 [pdf, other]

Efficient Steady-state Simulation of High-dimensional Stochastic Networks

Authors: Jose Blanchet, Xinyun Chen, Peter Glynn, Nian Si

Abstract: We propose and study an asymptotically optimal Monte Carlo estimator for steady-state expectations of a d-dimensional reflected Brownian motion. Our estimator is asymptotically optimal in the sense that it requires $\tilde{O}(d)$ (up to logarithmic factors in $d$) i.i.d. Gaussian random variables in order to output an estimate with a controlled error. Our construction is based on the analysis of a suitable multi-level Monte Carlo strategy which, we believe, can be applied widely. This is the first algorithm with linear complexity (under suitable regularity conditions) for steady-state estimation of RBM as the dimension increases.

Submitted 27 January, 2020; v1 submitted 23 January, 2020; originally announced January 2020.

arXiv:1906.03317 [pdf, ps, other]

Optimal Transport Relaxations with Application to Wasserstein GANs

Authors: Saied Mahdian, Jose Blanchet, Peter Glynn

Abstract: We propose a family of relaxations of the optimal transport problem which regularize the problem by introducing an additional minimization step over a small region around one of the underlying transporting measures. The type of regularization that we obtain is related to smoothing techniques studied in the optimization literature. When using our approach to estimate optimal transport costs based on empirical measures, we obtain statistical learning bounds which are useful to guide the amount of regularization, while maintaining good generalization properties. To illustrate the computational advantages of our regularization approach, we apply our method to training Wasserstein GANs. We obtain running time improvements, relative to current benchmarks, with no deterioration in testing performance (via FID). The running time improvement occurs because our new optimality-based threshold criterion reduces the number of expensive iterates of the generating networks, while increasing the number of actor-critic iterations.

Submitted 7 June, 2019; originally announced June 2019.

arXiv:1906.01614 [pdf, ps, other]

Confidence Regions in Wasserstein Distributionally Robust Estimation

Authors: Jose Blanchet, Karthyek Murthy, Nian Si

Abstract: Wasserstein distributionally robust optimization estimators are obtained as solutions of min-max problems in which the statistician selects a parameter minimizing the worst-case loss among all probability models within a certain distance (in a Wasserstein sense) from the underlying empirical measure. While motivated by the need to identify optimal model parameters or decision choices that are robust to model misspecification, these distributionally robust estimators recover a wide range of regularized estimators, including square-root lasso and support vector machines, among others, as particular cases. This paper studies the asymptotic normality of these distributionally robust estimators as well as the properties of an optimal (in a suitable sense) confidence region induced by the Wasserstein distributionally robust optimization formulation. In addition, key properties of min-max distributionally robust optimization problems are also studied. For example, we show that distributionally robust estimators regularize the loss based on its derivative, and we derive general sufficient conditions which show the equivalence between the min-max distributionally robust optimization problem and the corresponding max-min formulation.

Submitted 3 March, 2021; v1 submitted 4 June, 2019; originally announced June 2019.

arXiv:1905.12231 [pdf, other]

Multivariate Distributionally Robust Convex Regression under Absolute Error Loss

Authors: Jose Blanchet, Peter W. Glynn, Jun Yan, Zhengqing Zhou

Abstract: This paper proposes a novel non-parametric multidimensional convex regression estimator which is designed to be robust to adversarial perturbations in the empirical measure. We minimize over convex functions the maximum (over Wasserstein perturbations of the empirical measure) of the absolute regression errors. The inner maximization is solved in closed form, resulting in a regularization penalty that involves the norm of the gradient. We show consistency of our estimator and a rate of convergence of order $\widetilde{O}\left( n^{-1/d}\right)$, matching the bounds of alternative estimators based on square-loss minimization. Contrary to all of the existing results, our convergence rates hold without imposing compactness on the underlying domain and with no a priori bounds on the underlying convex function or its gradient norm.

Submitted 25 July, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

Comments: v3. 17 pages, 2 figures

MSC Class: 62H12; 62G20; 62G05

arXiv:1905.07845 [pdf, other]

doi 10.1109/WSC40007.2019.9004804

A Distributionally Robust Boosting Algorithm

Authors: Jose Blanchet, Yang Kang, Fan Zhang, Zhangyi Hu

Abstract: Distributionally Robust Optimization (DRO) has been shown to provide a flexible framework for decision making under uncertainty and statistical estimation. For example, recent works in DRO have shown that popular statistical estimators can be interpreted as the solutions of suitably formulated data-driven DRO problems. In turn, this connection is used to optimally select tuning parameters in terms of a principled approach informed by robustness considerations. This paper contributes to this growing literature, connecting DRO and statistics, by showing how boosting algorithms can be studied via DRO. We propose a boosting type algorithm, named DRO-Boosting, as a procedure to solve our DRO formulation. Our DRO-Boosting algorithm recovers Adaptive Boosting (AdaBoost) in particular, thus showing that AdaBoost is effectively solving a DRO problem. We apply our algorithm to a financial dataset on credit card default payment prediction. We find that our approach compares favorably to alternative boosting methods which are widely used in practice.

Submitted 19 May, 2019; originally announced May 2019.

Comments: 13 pages, 1 figure

arXiv:1904.09929 [pdf, other]

Unbiased Multilevel Monte Carlo: Stochastic Optimization, Steady-state Simulation, Quantiles, and Other Applications

Authors: Jose H. Blanchet, Peter W. Glynn, Yanan Pei

Abstract: We present general principles for the design and analysis of unbiased Monte Carlo estimators in a wide range of settings. Our estimators possess finite work-normalized variance under mild regularity conditions. We apply our estimators to various settings of interest, including unbiased optimization in Sample Average Approximations, unbiased steady-state simulation of regenerative processes, quantile estimation and nested simulation problems.

Submitted 22 April, 2019; originally announced April 2019.

Comments: 20 pages, 2 figures

Showing 1–50 of 98 results for author: Blanchet, J