-
Accelerating Diffusion Models with Parallel Sampling: Inference at Sub-Linear Time Complexity
Authors:
Haoxuan Chen,
Yinuo Ren,
Lexing Ying,
Grant M. Rotskoff
Abstract:
Diffusion models have become a leading method for generative modeling of both image and scientific data. As these models are costly to train and evaluate, reducing the inference cost for diffusion models remains a major goal. Inspired by the recent empirical success in accelerating diffusion models via the parallel sampling technique~\cite{shih2024parallel}, we propose to divide the sampling process into $\mathcal{O}(1)$ blocks with parallelizable Picard iterations within each block. Rigorous theoretical analysis reveals that our algorithm achieves $\widetilde{\mathcal{O}}(\mathrm{poly} \log d)$ overall time complexity, marking the first implementation with provable sub-linear complexity w.r.t. the data dimension $d$. Our analysis is based on a generalized version of Girsanov's theorem and is compatible with both the SDE and probability flow ODE implementations. Our results shed light on the potential of fast and efficient sampling of high-dimensional data on fast-evolving modern large-memory GPU clusters.
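As a rough illustration of why Picard iterations parallelize, the sketch below applies them to a toy scalar ODE on a fixed time grid. It is not the algorithm analyzed in the paper, which works with the SDE and probability flow ODE of a diffusion model; the function names (`picard_sweep`, `picard_solve`, `euler_solve`, `decay`), the drift, and the grid are all assumptions chosen for illustration. The point is that each sweep evaluates the drift at every grid point using only the previous iterate, so those evaluations are independent of one another.

```python
def picard_sweep(f, x0, ts, xs):
    """One Picard sweep over the whole time grid.

    Every drift evaluation f(xs[j], ts[j]) uses only the previous iterate,
    so all of them could be computed in parallel.
    """
    drifts = [f(x, t) for x, t in zip(xs[:-1], ts[:-1])]  # independent evaluations
    new_xs = [x0]
    acc = 0.0
    for j, d in enumerate(drifts):
        acc += (ts[j + 1] - ts[j]) * d  # left-endpoint sum approximating the integral
        new_xs.append(x0 + acc)
    return new_xs


def picard_solve(f, x0, ts, num_sweeps):
    """Repeat Picard sweeps, starting from a constant initial path."""
    xs = [x0] * len(ts)
    for _ in range(num_sweeps):
        xs = picard_sweep(f, x0, ts, xs)
    return xs


def euler_solve(f, x0, ts):
    """Sequential forward Euler on the same grid, for comparison."""
    xs = [x0]
    for j in range(len(ts) - 1):
        xs.append(xs[-1] + (ts[j + 1] - ts[j]) * f(xs[-1], ts[j]))
    return xs


def decay(x, t):
    """Toy drift (hypothetical): dx/dt = -x."""
    return -x


ts = [0.1 * i for i in range(11)]
picard_path = picard_solve(decay, 1.0, ts, num_sweeps=len(ts) - 1)
euler_path = euler_solve(decay, 1.0, ts)
```

On this grid the Picard iterate matches the sequential Euler path after at most as many sweeps as there are time steps; any speed-up comes from needing far fewer sweeps than steps in practice, which this toy example does not attempt to demonstrate.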
Submitted 24 May, 2024;
originally announced May 2024.
-
Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence
Authors:
Minheng Xiao,
Xian Yu,
Lei Ying
Abstract:
Risk-sensitive reinforcement learning (RL) is crucial for maintaining reliable performance in many high-stakes applications. While most RL methods aim to learn a point estimate of the random cumulative cost, distributional RL (DRL) seeks to estimate its entire distribution. The distribution provides all necessary information about the cost and leads to a unified framework for handling various risk measures in a risk-sensitive setting. However, developing policy gradient methods for risk-sensitive DRL is inherently more complex as it pertains to finding the gradient of a probability measure. This paper introduces a policy gradient method for risk-sensitive DRL with general coherent risk measures, where we provide an analytical form of the probability measure's gradient. We further prove the local convergence of the proposed algorithm under mild smoothness assumptions. For practical use, we also design a categorical distributional policy gradient algorithm (CDPG) based on categorical distributional policy evaluation and trajectory-based gradient estimation. Through experiments on a stochastic cliff-walking environment, we illustrate the benefits of considering a risk-sensitive setting in DRL.
Submitted 23 May, 2024;
originally announced May 2024.
-
A note on continuous-time online learning
Authors:
Lexing Ying
Abstract:
In online learning, the data is provided in a sequential order, and the goal of the learner is to make online decisions to minimize overall regret. This note is concerned with continuous-time models and algorithms for several online learning problems: online linear optimization, adversarial bandit, and adversarial linear bandit. For each problem, we extend the discrete-time algorithm to the continuous-time setting and provide a concise proof of the optimal regret bound.
Submitted 16 May, 2024;
originally announced May 2024.
-
Quantum wave packet transforms with compact frequency support
Authors:
Hongkang Ni,
Lexing Ying
Abstract:
Different kinds of wave packet transforms are widely used for extracting multi-scale structures in signal processing tasks. This paper introduces the quantum circuit implementation of a broad class of wave packets, including Gabor atoms and wavelets, with compact frequency support. Our approach operates in the frequency space, involving reallocation and reshuffling of signals tailored for manipulation on quantum computers. The resulting implementation is different from the existing quantum algorithms for spatially compactly supported wavelets and can be readily extended to quantum transforms of other wave packets with compact frequency support.
Submitted 3 May, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
-
A perturbative analysis for noisy spectral estimation
Authors:
Lexing Ying
Abstract:
Spectral estimation is a fundamental task in signal processing. Recent algorithms in quantum phase estimation are concerned with the large noise, large frequency regime of the spectral estimation problem. The recent work in Ding-Epperly-Lin-Zhang shows that the ESPRIT algorithm exhibits superconvergence behavior for the spike locations in terms of the maximum frequency. This note provides a perturbative analysis to explain this behavior. It also extends the discussion to the case where the noise grows with the sampling frequency.
Submitted 1 May, 2024;
originally announced May 2024.
-
Orthogonal Bootstrap: Efficient Simulation of Input Uncertainty
Authors:
Kaizhao Liu,
Jose Blanchet,
Lexing Ying,
Yiping Lu
Abstract:
Bootstrap is a popular methodology for simulating input uncertainty. However, it can be computationally expensive when the number of samples is large. We propose a new approach called \textbf{Orthogonal Bootstrap} that reduces the number of required Monte Carlo replications. We decompose the target being simulated into two parts: the \textit{non-orthogonal part}, which has a closed-form result known as Infinitesimal Jackknife, and the \textit{orthogonal part}, which is easier to simulate. We theoretically and numerically show that Orthogonal Bootstrap significantly reduces the computational cost of Bootstrap while improving empirical accuracy and maintaining the same width of the constructed interval.
Submitted 30 April, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Solving high-dimensional Kolmogorov backward equations with functional hierarchical tensor operators
Authors:
Xun Tang,
Leah Collis,
Lexing Ying
Abstract:
Solving high-dimensional partial differential equations necessitates methods free of exponential scaling in the dimension of the problem. This work introduces a tensor network approach for the Kolmogorov backward equation via approximating directly the Markov operator. We show that the high-dimensional Markov operator can be obtained under a functional hierarchical tensor (FHT) ansatz with a hierarchical sketching algorithm. When the terminal condition admits an FHT ansatz, the proposed operator outputs an FHT ansatz for the PDE solution through an efficient functional tensor network contraction procedure. In addition, the proposed operator-based approach also provides an efficient way to solve the Kolmogorov forward equation when the initial distribution is in an FHT ansatz. We apply the proposed approach successfully to two challenging time-dependent Ginzburg-Landau models with hundreds of variables.
Submitted 22 April, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
Learning-Based Pricing and Matching for Two-Sided Queues
Authors:
Zixian Yang,
Lei Ying
Abstract:
We consider a dynamic system with multiple types of customers and servers. Each type of waiting customer or server joins a separate queue, forming a bipartite graph with customer-side queues and server-side queues. The platform can match the servers and customers if their types are compatible. The matched pairs then leave the system. The platform will charge a customer a price according to their type when they arrive and will pay a server a price according to their type. The arrival rate of each queue is determined by the price according to some unknown demand or supply functions. Our goal is to design pricing and matching algorithms to maximize the profit of the platform with unknown demand and supply functions, while keeping queue lengths of both customers and servers below a predetermined threshold. This system can be used to model two-sided markets such as ride-sharing markets with passengers and drivers. The difficulties of the problem include simultaneous learning and decision making, and the tradeoff between maximizing profit and minimizing queue length. We use a longest-queue-first matching algorithm and propose a learning-based pricing algorithm, which combines gradient-free stochastic projected gradient ascent with bisection search. We prove that our proposed algorithm yields a sublinear regret $\tilde{O}(T^{5/6})$ and queue-length bound $\tilde{O}(T^{2/3})$, where $T$ is the time horizon. We further establish a tradeoff between the regret bound and the queue-length bound: $\tilde{O}(T^{1-γ/4})$ versus $\tilde{O}(T^γ)$ for $γ\in (0, 2/3].$
Submitted 17 March, 2024;
originally announced March 2024.
-
A Sinkhorn-type Algorithm for Constrained Optimal Transport
Authors:
Xun Tang,
Holakou Rahmanian,
Michael Shavlovsky,
Kiran Koshy Thekumparampil,
Tesi Xiao,
Lexing Ying
Abstract:
Entropic optimal transport (OT) and the Sinkhorn algorithm have made it practical for machine learning practitioners to perform the fundamental task of calculating transport distance between statistical distributions. In this work, we focus on a general class of OT problems under a combination of equality and inequality constraints. We derive the corresponding entropy regularization formulation and introduce a Sinkhorn-type algorithm for such constrained OT problems supported by theoretical guarantees. We first bound the approximation error when solving the problem through entropic regularization, which reduces exponentially with the increase of the regularization parameter. Furthermore, we prove a sublinear first-order convergence rate of the proposed Sinkhorn-type algorithm in the dual space by characterizing the optimization procedure with a Lyapunov function. To achieve fast and higher-order convergence under weak entropy regularization, we augment the Sinkhorn-type algorithm with dynamic regularization scheduling and second-order acceleration. Overall, this work systematically combines recent theoretical and numerical advances in entropic optimal transport with the constrained case, allowing practitioners to derive approximate transport plans in complex scenarios.
Submitted 8 March, 2024;
originally announced March 2024.
-
Multidimensional unstructured sparse recovery via eigenmatrix
Authors:
Lexing Ying
Abstract:
This note considers the multidimensional unstructured sparse recovery problems. Examples include Fourier inversion and sparse deconvolution. The eigenmatrix is a data-driven construction with desired approximate eigenvalues and eigenvectors proposed for the one-dimensional problems. This note extends the eigenmatrix approach to multidimensional problems. Numerical results are provided to demonstrate the performance of the proposed method.
Submitted 27 February, 2024;
originally announced February 2024.
-
A sublinear-time randomized algorithm for column and row subset selection based on strong rank-revealing QR factorizations
Authors:
Alice Cortinovis,
Lexing Ying
Abstract:
In this work, we analyze a sublinear-time algorithm for selecting a few rows and columns of a matrix for low-rank approximation purposes. The algorithm is based on an initial uniformly random selection of rows and columns, followed by a refinement of this choice using a strong rank-revealing QR factorization. We prove bounds on the error of the corresponding low-rank approximation (more precisely, the CUR approximation error) when the matrix is a perturbation of a low-rank matrix that can be factorized into the product of matrices with suitable incoherence and/or sparsity assumptions.
Submitted 21 February, 2024;
originally announced February 2024.
-
Ensemble-Based Annealed Importance Sampling
Authors:
Haoxuan Chen,
Lexing Ying
Abstract:
Sampling from a multimodal distribution is a fundamental and challenging problem in computational science and statistics. Among various approaches proposed for this task, one popular method is Annealed Importance Sampling (AIS). In this paper, we propose an ensemble-based version of AIS by combining it with population-based Monte Carlo methods to improve its efficiency. By keeping track of an ensemble instead of a single particle along some continuation path between the starting distribution and the target distribution, we take advantage of the interaction within the ensemble to encourage the exploration of undiscovered modes. Specifically, our main idea is to utilize either the snooker algorithm or the genetic algorithm used in Evolutionary Monte Carlo. We discuss how the proposed algorithm can be implemented and derive a partial differential equation governing the evolution of the ensemble under the continuous time and mean-field limit. We also test the efficiency of the proposed algorithm on various continuous and discrete distributions.
Submitted 28 January, 2024;
originally announced January 2024.
-
Accelerating Sinkhorn Algorithm with Sparse Newton Iterations
Authors:
Xun Tang,
Michael Shavlovsky,
Holakou Rahmanian,
Elisa Tardini,
Kiran Koshy Thekumparampil,
Tesi Xiao,
Lexing Ying
Abstract:
Computing the optimal transport distance between statistical distributions is a fundamental task in machine learning. One remarkable recent advancement is entropic regularization and the Sinkhorn algorithm, which utilizes only matrix scaling and guarantees an approximated solution with near-linear runtime. Despite the success of the Sinkhorn algorithm, its runtime may still be slow due to the potentially large number of iterations needed for convergence. To achieve possibly super-exponential convergence, we present Sinkhorn-Newton-Sparse (SNS), an extension to the Sinkhorn algorithm, by introducing early stopping for the matrix scaling steps and a second stage featuring a Newton-type subroutine. Adopting the variational viewpoint that the Sinkhorn algorithm maximizes a concave Lyapunov potential, we offer the insight that the Hessian matrix of the potential function is approximately sparse. Sparsification of the Hessian results in a fast $O(n^2)$ per-iteration complexity, the same as the Sinkhorn algorithm. In terms of total iteration count, we observe that the SNS algorithm converges orders of magnitude faster across a wide range of practical cases, including optimal transportation between empirical distributions and calculating the Wasserstein $W_1, W_2$ distance of discretized densities. The empirical performance is corroborated by a rigorous bound on the approximate sparsity of the Hessian matrix.
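For readers unfamiliar with the matrix scaling step mentioned above, here is a minimal sketch of the standard Sinkhorn iteration only; it does not implement the Newton stage or the Hessian sparsification of SNS. The function name `sinkhorn`, the 3-by-3 cost matrix, the marginals, and the regularization strength `eps` are all assumptions made for illustration. Each iteration touches every entry of the kernel once, which is where the $O(n^2)$ per-iteration cost comes from.

```python
import math


def sinkhorn(cost, r, c, eps, num_iters):
    """Entropic OT via alternating row/column scaling (plain Sinkhorn).

    cost: n-by-m cost matrix, r: row marginal, c: column marginal,
    eps: entropic regularization strength (illustrative choice).
    Returns the approximate transport plan diag(u) K diag(v).
    """
    n, m = len(r), len(c)
    # Gibbs kernel K_ij = exp(-cost_ij / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(num_iters):
        # Rescale rows to match r, then columns to match c.
        u = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical toy problem: three points on a line, cost = distance.
cost = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
r = [0.5, 0.3, 0.2]
c = [0.2, 0.3, 0.5]
plan = sinkhorn(cost, r, c, eps=1.0, num_iters=200)
```

The returned `plan` has row sums close to `r` and column sums close to `c`; smaller values of `eps` bring the plan closer to the unregularized optimum but generally require more iterations, which is the regime the abstract's acceleration targets.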
Submitted 20 January, 2024;
originally announced January 2024.
-
Quantum Hamiltonian Learning for the Fermi-Hubbard Model
Authors:
Hongkang Ni,
Haoya Li,
Lexing Ying
Abstract:
This work proposes a protocol for Fermionic Hamiltonian learning. For the Hubbard model defined on a bounded-degree graph, the Heisenberg-limited scaling is achieved while allowing for state preparation and measurement errors. To achieve $ε$-accurate estimation for all parameters, only $\tilde{\mathcal{O}}(ε^{-1})$ total evolution time is needed, and the constant factor is independent of the system size. Moreover, our method only involves simple one or two-site Fermionic manipulations, which is desirable for experiment implementation.
Submitted 1 May, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Solving high-dimensional Fokker-Planck equation with functional hierarchical tensor
Authors:
Xun Tang,
Lexing Ying
Abstract:
This work is concerned with solving high-dimensional Fokker-Planck equations with the novel perspective that solving the PDE can be reduced to independent instances of density estimation tasks based on the trajectories sampled from its associated particle dynamics. With this approach, one sidesteps error accumulation occurring from integrating the PDE dynamics on a parameterized function class. This approach significantly simplifies deployment, as one is free of the challenges of implementing loss terms based on the differential equation. In particular, we introduce a novel class of high-dimensional functions called the functional hierarchical tensor (FHT). The FHT ansatz leverages a hierarchical low-rank structure, offering the advantage of linearly scalable runtime and memory complexity relative to the dimension count. We introduce a sketching-based technique that performs density estimation over particles simulated from the particle dynamics associated with the equation, thereby obtaining a representation of the Fokker-Planck solution in terms of our ansatz. We apply the proposed approach successfully to three challenging time-dependent Ginzburg-Landau models with hundreds of variables.
Submitted 12 December, 2023;
originally announced December 2023.
-
Statistical Spatially Inhomogeneous Diffusion Inference
Authors:
Yinuo Ren,
Yiping Lu,
Lexing Ying,
Grant M. Rotskoff
Abstract:
Inferring a diffusion equation from discretely-observed measurements is a statistical challenge of significant importance in a variety of fields, from single-molecule tracking in biophysical systems to modeling financial instruments. Assuming that the underlying dynamical process obeys a $d$-dimensional stochastic differential equation of the form $$\mathrm{d}\boldsymbol{x}_t=\boldsymbol{b}(\boldsymbol{x}_t)\mathrm{d} t+Σ(\boldsymbol{x}_t)\mathrm{d}\boldsymbol{w}_t,$$ we propose neural network-based estimators of both the drift $\boldsymbol{b}$ and the spatially-inhomogeneous diffusion tensor $D = ΣΣ^{T}$ and provide statistical convergence guarantees when $\boldsymbol{b}$ and $D$ are $s$-Hölder continuous. Notably, our bound aligns with the minimax optimal rate $N^{-\frac{2s}{2s+d}}$ for nonparametric function estimation even in the presence of correlation within observational data, which necessitates careful handling when establishing fast-rate generalization bounds. Our theoretical results are bolstered by numerical experiments demonstrating accurate inference of spatially-inhomogeneous diffusion tensors.
Submitted 10 December, 2023;
originally announced December 2023.
-
Eigenmatrix for unstructured sparse recovery
Authors:
Lexing Ying
Abstract:
This note considers the unstructured sparse recovery problems in a general form. Examples include rational approximation, spectral function estimation, Fourier inversion, Laplace inversion, and sparse deconvolution. The main challenges are the noise in the sample values and the unstructured nature of the sample locations. This note proposes the eigenmatrix, a data-driven construction with desired approximate eigenvalues and eigenvectors. The eigenmatrix offers a new approach to these sparse recovery problems. Numerical results are provided to demonstrate the efficiency of the proposed method.
Submitted 7 March, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Multi-Objective Optimization via Wasserstein-Fisher-Rao Gradient Flow
Authors:
Yinuo Ren,
Tesi Xiao,
Tanmay Gangwani,
Anshuka Rangi,
Holakou Rahmanian,
Lexing Ying,
Subhajit Sanyal
Abstract:
Multi-objective optimization (MOO) aims to optimize multiple, possibly conflicting objectives with widespread applications. We introduce a novel interacting particle method for MOO inspired by molecular dynamics simulations. Our approach combines overdamped Langevin and birth-death dynamics, incorporating a "dominance potential" to steer particles toward global Pareto optimality. In contrast to previous methods, our method is able to relocate dominated particles, making it particularly adept at managing Pareto fronts of complicated geometries. Our method is also theoretically grounded as a Wasserstein-Fisher-Rao gradient flow with convergence guarantees. Extensive experiments confirm that our approach outperforms state-of-the-art methods on challenging synthetic and real-world datasets.
Submitted 21 November, 2023;
originally announced November 2023.
-
Multimodal Sampling via Approximate Symmetries
Authors:
Lexing Ying
Abstract:
Sampling from multimodal distributions is a challenging task in scientific computing. When a distribution has an exact symmetry between the modes, direct jumps among them can accelerate sampling significantly. However, the distributions from most applications do not have exact symmetries. This paper considers distributions with approximate symmetries. We first construct an exactly symmetric reference distribution from the target one by averaging over the group orbit associated with the approximate symmetry. Next, we apply multilevel Monte Carlo methods by constructing a continuation path between the reference and target distributions. We discuss how to implement these steps with annealed importance sampling and tempered transitions. Compared with traditional multilevel methods, the proposed approach can be more effective since the reference and target distributions are much closer. Numerical results for Ising models are presented to illustrate the efficiency of the proposed method.
Submitted 4 January, 2024; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Heisenberg-limited Hamiltonian learning for interacting bosons
Authors:
Haoya Li,
Yu Tong,
Hongkang Ni,
Tuvia Gefen,
Lexing Ying
Abstract:
We develop a protocol for learning a class of interacting bosonic Hamiltonians from dynamics with Heisenberg-limited scaling. For Hamiltonians with an underlying bounded-degree graph structure, we can learn all parameters with root mean squared error $ε$ using $\mathcal{O}(1/ε)$ total evolution time, which is independent of the system size, in a way that is robust against state-preparation and measurement error. In the protocol, we only use bosonic coherent states, beam splitters, phase shifters, and homodyne measurements, which are easy to implement on many experimental platforms. A key technique we develop is to apply random unitaries to enforce symmetry in the effective Hamiltonian, which may be of independent interest.
Submitted 10 July, 2023;
originally announced July 2023.
-
When can Regression-Adjusted Control Variates Help? Rare Events, Sobolev Embedding and Minimax Optimality
Authors:
Jose Blanchet,
Haoxuan Chen,
Yiping Lu,
Lexing Ying
Abstract:
This paper studies the use of a machine learning-based estimator as a control variate for mitigating the variance of Monte Carlo sampling. Specifically, we seek to uncover the key factors that influence the efficiency of control variates in reducing variance. We examine a prototype estimation problem that involves simulating the moments of a Sobolev function based on observations obtained from (random) quadrature nodes. Firstly, we establish an information-theoretic lower bound for the problem. We then study a specific quadrature rule that employs a nonparametric regression-adjusted control variate to reduce the variance of the Monte Carlo simulation. We demonstrate that this kind of quadrature rule can improve the Monte Carlo rate and achieve the minimax optimal rate under a sufficient smoothness assumption. Due to the Sobolev Embedding Theorem, the sufficient smoothness assumption eliminates the existence of rare and extreme events. Finally, we show that, in the presence of rare and extreme events, a truncated version of the Monte Carlo algorithm can achieve the minimax optimal rate while the control variate cannot improve the convergence rate.
Submitted 25 May, 2023;
originally announced May 2023.
-
Computing Free Convolutions via Contour Integrals
Authors:
Alice Cortinovis,
Lexing Ying
Abstract:
This work proposes algorithms for computing additive and multiplicative free convolutions of two given measures. We consider measures with compact support whose free convolution results in a measure with a density function that exhibits a square-root decay at the boundary (for example, the semicircle distribution or the Marchenko-Pastur distribution). A key ingredient of our method is rewriting the intermediate quantities of the free convolution using the Cauchy integral formula and then discretizing these integrals using the trapezoidal quadrature rule, which converges exponentially fast under suitable analyticity properties of the functions to be integrated.
Submitted 2 May, 2023;
originally announced May 2023.
-
A note on spike localization for line spectrum estimation
Authors:
Haoya Li,
Hongkang Ni,
Lexing Ying
Abstract:
This note considers the problem of approximating the locations of dominant spikes for a probability measure from noisy spectrum measurements under the condition of residue signal, significant noise level, and no minimum spectrum separation. We show that the simple procedure of thresholding the smoothed inverse Fourier transform allows for approximating the spike locations rather accurately.
Submitted 13 March, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
On the Global Convergence of Risk-Averse Policy Gradient Methods with Expected Conditional Risk Measures
Authors:
Xian Yu,
Lei Ying
Abstract:
Risk-sensitive reinforcement learning (RL) has become a popular tool to control the risk of uncertain outcomes and ensure reliable performance in various sequential decision-making problems. While policy gradient methods have been developed for risk-sensitive RL, it remains unclear if these methods enjoy the same global convergence guarantees as in the risk-neutral case. In this paper, we consider a class of dynamic time-consistent risk measures, called Expected Conditional Risk Measures (ECRMs), and derive policy gradient updates for ECRM-based objective functions. Under both constrained direct parameterization and unconstrained softmax parameterization, we provide global convergence and iteration complexities of the corresponding risk-averse policy gradient algorithms. We further test risk-averse variants of REINFORCE and actor-critic algorithms to demonstrate the efficacy of our method and the importance of risk control.
Submitted 29 May, 2023; v1 submitted 25 January, 2023;
originally announced January 2023.
-
On efficient quantum block encoding of pseudo-differential operators
Authors:
Haoya Li,
Hongkang Ni,
Lexing Ying
Abstract:
Block encoding lies at the core of many existing quantum algorithms. Meanwhile, efficient and explicit block encodings of dense operators are commonly acknowledged as a challenging problem. This paper presents a comprehensive study of the block encoding of a rich family of dense operators: the pseudo-differential operators (PDOs). First, a block encoding scheme for generic PDOs is developed. Then we propose a more efficient scheme for PDOs with a separable structure. Finally, we demonstrate an explicit and efficient block encoding algorithm for PDOs with a dimension-wise fully separable structure. Complexity analysis is provided for all block encoding algorithms presented. The application of theoretical results is illustrated with worked examples, including the representation of variable coefficient elliptic operators and the computation of the inverse of elliptic operators without invoking quantum linear system algorithms (QLSAs).
Submitted 31 May, 2023; v1 submitted 21 January, 2023;
originally announced January 2023.
-
Network Utility Maximization with Unknown Utility Functions: A Distributed, Data-Driven Bilevel Optimization Approach
Authors:
Kaiyi Ji,
Lei Ying
Abstract:
Fair resource allocation is one of the most important topics in communication networks. Existing solutions almost exclusively assume that each user's utility function is known and concave. This paper seeks to answer the following question: how to allocate resources when utility functions are unknown, even to the users? This question has become increasingly important in next-generation AI-aware communication networks, where the user utilities are complex and their closed forms are hard to obtain. In this paper, we provide a new solution using a distributed and data-driven bilevel optimization approach, where the lower level is a distributed network utility maximization (NUM) algorithm with concave surrogate utility functions, and the upper level is a data-driven learning algorithm to find the best surrogate utility functions that maximize the sum of true network utility. The proposed algorithm learns from data samples (utility values or gradient values) to autotune the surrogate utility functions to maximize the true network utility, so it works for unknown utility functions. For the general network, we establish the nonasymptotic convergence rate of the proposed algorithm with nonconcave utility functions. The simulations validate our theoretical results and demonstrate the great effectiveness of the proposed method in a real-world network.
Submitted 5 January, 2023; v1 submitted 4 January, 2023;
originally announced January 2023.
-
High-dimensional density estimation with tensorizing flow
Authors:
Yinuo Ren,
Hongli Zhao,
Yuehaw Khoo,
Lexing Ying
Abstract:
We propose the tensorizing flow method for estimating high-dimensional probability density functions from the observed data. The method is based on tensor-train and flow-based generative modeling. Our method first efficiently constructs an approximate density in the tensor-train form via solving the tensor cores from a linear system based on the kernel density estimators of low-dimensional marginals. We then train a continuous-time flow model from this tensor-train density to the observed empirical distribution by performing a maximum likelihood estimation. The proposed method combines the optimization-less feature of the tensor-train with the flexibility of the flow-based generative models. Numerical results are included to demonstrate the performance of the proposed method.
Submitted 1 December, 2022;
originally announced December 2022.
-
Synthetic Principal Component Design: Fast Covariate Balancing with Synthetic Controls
Authors:
Yiping Lu,
Jiajin Li,
Lexing Ying,
Jose Blanchet
Abstract:
The optimal design of experiments typically involves solving an NP-hard combinatorial optimization problem. In this paper, we aim to develop a globally convergent and practically efficient optimization algorithm. Specifically, we consider a setting where the pre-treatment outcome data is available and the synthetic control estimator is invoked. The average treatment effect is estimated via the difference between the weighted average outcomes of the treated and control units, where the weights are learned from the observed data. Under this setting, we surprisingly observed that the optimal experimental design problem could be reduced to a so-called phase synchronization problem. We solve this problem via a normalized variant of the generalized power method with spectral initialization. On the theoretical side, we establish the first global optimality guarantee for experiment design when pre-treatment data is sampled from certain data-generating processes. Empirically, we conduct extensive experiments to demonstrate the effectiveness of our method on both the US Bureau of Labor Statistics and the Abadie-Diamond-Hainmueller California Smoking Data. In terms of the root mean square error, our algorithm surpasses the random design by a large margin.
Submitted 28 November, 2022;
originally announced November 2022.
-
Continuous-in-time Limit for Bayesian Bandits
Authors:
Yuhua Zhu,
Zachary Izzo,
Lexing Ying
Abstract:
This paper revisits the bandit problem in the Bayesian setting. The Bayesian approach formulates the bandit problem as an optimization problem, and the goal is to find the optimal policy which minimizes the Bayesian regret. One of the main challenges facing the Bayesian approach is that computation of the optimal policy is often intractable, especially when the length of the problem horizon or the number of arms is large. In this paper, we first show that under a suitable rescaling, the Bayesian bandit problem converges toward a continuous Hamilton-Jacobi-Bellman (HJB) equation. The optimal policy for the limiting HJB equation can be explicitly obtained for several common bandit problems, and we give numerical methods to solve the HJB equation when an explicit solution is not available. Based on these results, we propose an approximate Bayes-optimal policy for solving Bayesian bandit problems with large horizons. Our method has the added benefit that its computational cost does not increase as the horizon increases.
Submitted 29 September, 2023; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Minimax Optimal Kernel Operator Learning via Multilevel Training
Authors:
Jikai Jin,
Yiping Lu,
Jose Blanchet,
Lexing Ying
Abstract:
Learning mappings between infinite-dimensional function spaces has achieved empirical success in many disciplines of machine learning, including generative modeling, functional data analysis, causal inference, and multi-agent reinforcement learning. In this paper, we study the statistical limit of learning a Hilbert-Schmidt operator between two infinite-dimensional Sobolev reproducing kernel Hilbert spaces. We establish the information-theoretic lower bound in terms of the Sobolev Hilbert-Schmidt norm and show that a regularization that learns the spectral components below the bias contour and ignores the ones that are above the variance contour can achieve the optimal learning rate. At the same time, the spectral components between the bias and variance contours give us flexibility in designing computationally feasible machine learning algorithms. Based on this observation, we develop a multilevel kernel operator learning algorithm that is optimal when learning linear operators between infinite-dimensional function spaces.
Submitted 24 July, 2023; v1 submitted 28 September, 2022;
originally announced September 2022.
-
Importance Tempering: Group Robustness for Overparameterized Models
Authors:
Yiping Lu,
Wenlong Ji,
Zachary Izzo,
Lexing Ying
Abstract:
Although overparameterized models have shown their success on many machine learning tasks, their accuracy can drop when the testing distribution differs from the training one. This accuracy drop still limits the application of machine learning in the wild. At the same time, importance weighting, a traditional technique for handling distribution shifts, has been demonstrated to have little or even no effect on overparameterized models, both empirically and theoretically. In this paper, we propose importance tempering to improve the decision boundary and achieve consistently better results for overparameterized models. Theoretically, we justify that the selection of group temperature can be different under the label shift and spurious correlation settings. At the same time, we also prove that properly selected temperatures can extricate the minority collapse for imbalanced classification. Empirically, we achieve state-of-the-art results on worst-group classification tasks using importance tempering.
Submitted 27 September, 2022; v1 submitted 18 September, 2022;
originally announced September 2022.
-
Correcting Convexity Bias in Function and Functional Estimate
Authors:
Chao Ma,
Lexing Ying
Abstract:
A general framework with a series of different methods is proposed to improve the estimate of convex function (or functional) values when only noisy observations of the true input are available. Technically, our methods catch the bias introduced by the convexity and remove this bias from a baseline estimate. Theoretical analyses are conducted to show that the proposed methods can strictly reduce the expected estimate error under mild conditions. When applied, the methods require no specific knowledge about the problem except the convexity and the evaluation of the function. Therefore, they can serve as off-the-shelf tools to obtain good estimates for a wide range of problems, including optimization problems with random objective functions or constraints, and functionals of probability distributions such as the entropy and the Wasserstein distance. Numerical experiments on a wide variety of problems show that our methods can significantly improve the quality of the estimate compared with the baseline method.
Submitted 14 September, 2022; v1 submitted 16 August, 2022;
originally announced August 2022.
-
On Low-Complexity Quickest Intervention of Mutated Diffusion Processes Through Local Approximation
Authors:
Qining Zhang,
Honghao Wei,
Weina Wang,
Lei Ying
Abstract:
We consider the problem of controlling a mutated diffusion process with an unknown mutation time. The problem is formulated as the quickest intervention problem with the mutation modeled by a change-point, which is a generalization of the quickest change-point detection (QCD). Our goal is to intervene in the mutated process as soon as possible while maintaining a low intervention cost with optimally chosen intervention actions. This model and the proposed algorithms can be applied to pandemic prevention (such as COVID-19) or misinformation containment. We formulate the problem as a partially observed Markov decision process (POMDP) and convert it to an MDP through the belief state of the change-point. We first propose a grid approximation approach to calculate the optimal intervention policy, whose computational complexity could be very high when the number of grids is large. In order to reduce the computational complexity, we further propose a low-complexity threshold-based policy through the analysis of the first-order approximation of the value functions in the "local intervention" regime. Simulation results show that the low-complexity algorithm performs similarly to the grid approximation and that both perform much better than the QCD-based algorithms.
Submitted 9 June, 2022;
originally announced June 2022.
-
Will Bilevel Optimizers Benefit from Loops
Authors:
Kaiyi Ji,
Mingrui Liu,
Yingbin Liang,
Lei Ying
Abstract:
Bilevel optimization has arisen as a powerful tool for solving a variety of machine learning problems. Two currently popular bilevel optimizers, AID-BiO and ITD-BiO, naturally involve solving one or two sub-problems, and consequently, whether we solve these problems with loops (that take many iterations) or without loops (that take only a few iterations) can significantly affect the overall computational efficiency. Existing studies in the literature cover only some of those implementation choices, and the complexity bounds available are not refined enough to enable rigorous comparison among different implementations. In this paper, we first establish a unified convergence analysis for both AID-BiO and ITD-BiO that is applicable to all implementation choices of loops. We then specialize our results to characterize the computational complexity for all implementations, which enables an explicit comparison among them. Our result indicates that for AID-BiO, the loop for estimating the optimal point of the inner function is beneficial for overall efficiency, although it causes higher complexity for each update step, and the loop for approximating the outer-level Hessian-inverse-vector product reduces the gradient complexity. For ITD-BiO, the two loops always coexist, and our convergence upper and lower bounds show that such loops are necessary to guarantee a vanishing convergence error, whereas the no-loop scheme suffers from an unavoidable non-vanishing convergence error. Our numerical experiments further corroborate our theoretical results.
Submitted 31 May, 2022; v1 submitted 27 May, 2022;
originally announced May 2022.
-
Annealed importance sampling for Ising models with mixed boundary conditions
Authors:
Lexing Ying
Abstract:
This note introduces a method for sampling Ising models with mixed boundary conditions. As an application of annealed importance sampling and the Swendsen-Wang algorithm, the method adopts a sequence of intermediate distributions that keeps the temperature fixed but turns on the boundary condition gradually. The numerical results show that the variance of the sample weights is relatively small.
Submitted 17 May, 2022;
originally announced May 2022.
-
Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent
Authors:
Yiping Lu,
Jose Blanchet,
Lexing Ying
Abstract:
In this paper, we study the statistical limits in terms of Sobolev norms of gradient descent for solving inverse problems from randomly sampled noisy observations using a general class of objective functions. Our class of objective functions includes Sobolev training for kernel regression, Deep Ritz Methods (DRM), and Physics Informed Neural Networks (PINN) for solving elliptic partial differential equations (PDEs) as special cases. We consider a potentially infinite-dimensional parameterization of our model using a suitable Reproducing Kernel Hilbert Space and a continuous parameterization of problem hardness through the definition of kernel integral operators. We prove that gradient descent over this objective function can also achieve statistical optimality and that the optimal number of passes over the data increases with sample size. Based on our theory, we explain an implicit acceleration of using a Sobolev norm as the objective function for training, inferring that the optimal number of epochs of DRM becomes larger than that of PINN when both the data size and the hardness of tasks increase, although both DRM and PINN can achieve statistical optimality.
Submitted 19 September, 2022; v1 submitted 15 May, 2022;
originally announced May 2022.
-
Double Flip Move for Ising Models with Mixed Boundary Conditions
Authors:
Lexing Ying
Abstract:
This note introduces the double flip move for accelerating the Swendsen-Wang algorithm for Ising models with mixed boundary conditions below the critical temperature. The double flip move consists of a geometric flip of the spin lattice followed by a spin value flip. Both the symmetric and approximately symmetric models are considered. We prove the detailed balance of the double flip move and demonstrate its empirical efficiency in mixing.
Submitted 15 May, 2022;
originally announced May 2022.
-
Accelerating Primal-dual Methods for Regularized Markov Decision Processes
Authors:
Haoya Li,
Hsiang-fu Yu,
Lexing Ying,
Inderjit Dhillon
Abstract:
Entropy regularized Markov decision processes have been widely used in reinforcement learning. This paper is concerned with the primal-dual formulation of the entropy regularized problems. Standard first-order methods suffer from slow convergence due to the lack of strict convexity and concavity. To address this issue, we first introduce a new quadratically convexified primal-dual formulation. The natural gradient ascent descent of the new formulation enjoys a global convergence guarantee and an exponential convergence rate. We also propose a new interpolating metric that further accelerates the convergence significantly. Numerical results are provided to demonstrate the performance of the proposed methods under multiple settings.
Submitted 12 June, 2023; v1 submitted 21 February, 2022;
originally announced February 2022.
-
Analytic continuation from limited noisy Matsubara data
Authors:
Lexing Ying
Abstract:
This note proposes a new algorithm for estimating the spectral function from limited noisy Matsubara data. We consider both the molecule and condensed matter cases. In each case, the algorithm constructs an interpolant of the Matsubara data and uses conformal mapping and Prony's method to estimate the spectral function. Numerical results are provided to demonstrate the performance of the algorithm.
Submitted 12 March, 2022; v1 submitted 19 February, 2022;
originally announced February 2022.
-
Large-System Insensitivity of Zero-Waiting Load Balancing Algorithms
Authors:
Xin Liu,
Kang Gong,
Lei Ying
Abstract:
This paper studies the sensitivity (or insensitivity) of a class of load balancing algorithms that achieve asymptotic zero-waiting in the sub-Halfin-Whitt regime, named LB-zero. Most existing results on zero-waiting load balancing algorithms assume the service time distribution is exponential. This paper establishes the large-system insensitivity of LB-zero for jobs whose service time follows a Coxian distribution with a finite number of phases. This result suggests that LB-zero achieves asymptotic zero-waiting for a large class of service time distributions, which is confirmed in our simulations. To prove this result, this paper develops a new technique, called "Iterative State-Space Peeling" (or ISSP for short). ISSP first identifies an iterative relation between the upper and lower bounds on the queue states and then proves that the system lives near the fixed point of the iterative bounds with high probability. Based on ISSP, the steady-state distribution of the system is further analyzed by applying Stein's method in the neighborhood of the fixed point. ISSP, like state-space collapse in heavy-traffic analysis, is a general approach that may be used to study other complex stochastic systems.
Submitted 16 February, 2022;
originally announced February 2022.
-
Stable factorization for phase factors of quantum signal processing
Authors:
Lexing Ying
Abstract:
This paper proposes a new factorization algorithm for computing the phase factors of quantum signal processing. The proposed algorithm avoids root finding of high-degree polynomials by using a key step of Prony's method and is numerically stable in double-precision arithmetic. Experimental results are reported for Hamiltonian simulation, eigenstate filtering, matrix inversion, and the Fermi-Dirac operator.
Submitted 17 October, 2022; v1 submitted 5 February, 2022;
originally announced February 2022.
-
Pole recovery from noisy data on imaginary axis
Authors:
Lexing Ying
Abstract:
This note proposes an algorithm for identifying the poles and residues of a meromorphic function from its noisy values on the imaginary axis. The algorithm uses Möbius transform and Prony's method in the frequency domain. Numerical results are provided to demonstrate the performance of the algorithm.
Submitted 12 February, 2022; v1 submitted 5 February, 2022;
originally announced February 2022.
-
Operator Shifting for Model-based Policy Evaluation
Authors:
Xun Tang,
Lexing Ying,
Yuhua Zhu
Abstract:
In model-based reinforcement learning, the transition matrix and reward vector are often estimated from random samples subject to noise. Even if the estimated model is an unbiased estimate of the true underlying model, the value function computed from the estimated model is biased. We introduce an operator shifting method for reducing the error introduced by the estimated model. When the error is in the residual norm, we prove that the shifting factor is always positive and upper bounded by $1+O\left(1/n\right)$, where $n$ is the number of samples used in learning each row of the transition matrix. We also propose a practical numerical algorithm for implementing the operator shifting.
Submitted 7 February, 2023; v1 submitted 25 October, 2021;
originally announced October 2021.
-
Machine Learning For Elliptic PDEs: Fast Rate Generalization Bound, Neural Scaling Law and Minimax Optimality
Authors:
Yiping Lu,
Haoxuan Chen,
Jianfeng Lu,
Lexing Ying,
Jose Blanchet
Abstract:
In this paper, we study the statistical limits of deep learning techniques for solving elliptic partial differential equations (PDEs) from random samples using the Deep Ritz Method (DRM) and Physics-Informed Neural Networks (PINNs). To simplify the problem, we focus on a prototype elliptic PDE: the Schrödinger equation on a hypercube with zero Dirichlet boundary condition, which has wide applications in quantum-mechanical systems. We establish upper and lower bounds for both methods, which improve upon concurrently developed upper bounds for this problem via a fast rate generalization bound. We discover that the current Deep Ritz Method is sub-optimal and propose a modified version of it. We also prove that PINN and the modified version of DRM can achieve minimax optimal bounds over Sobolev spaces. Empirically, following recent work which has shown that deep model accuracy improves with growing training sets according to a power law, we supply computational experiments that show a similar dimension-dependent power law for deep PDE solvers.
Submitted 12 November, 2021; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Approximate Newton policy gradient algorithms
Authors:
Haoya Li,
Samarth Gupta,
Hsiangfu Yu,
Lexing Ying,
Inderjit Dhillon
Abstract:
Policy gradient algorithms have been widely applied to Markov decision processes and reinforcement learning problems in recent years. Regularization with various entropy functions is often used to encourage exploration and improve stability. This paper proposes an approximate Newton method for the policy gradient algorithm with entropy regularization. In the case of Shannon entropy, the resulting algorithm reproduces the natural policy gradient algorithm. For other entropy functions, this method results in brand-new policy gradient algorithms. We prove that all these algorithms enjoy Newton-type quadratic convergence and that the corresponding gradient flow converges globally to the optimal solution. We use synthetic and industrial-scale examples to demonstrate that the proposed approximate Newton method typically converges in single-digit iterations, often orders of magnitude faster than other state-of-the-art algorithms.
Submitted 8 June, 2023; v1 submitted 5 October, 2021;
originally announced October 2021.
-
Variational Actor-Critic Algorithms
Authors:
Yuhua Zhu,
Lexing Ying
Abstract:
We introduce a class of variational actor-critic algorithms based on a variational formulation over both the value function and the policy. The objective function of the variational formulation consists of two parts: one for maximizing the value function and the other for minimizing the Bellman residual. Besides the vanilla gradient descent with both the value function and the policy updates, we propose two variants, the clipping method and the flipping method, in order to speed up the convergence. We also prove that, when the prefactor of the Bellman residual is sufficiently large, the fixed point of the algorithm is close to the optimal policy.
Submitted 13 January, 2023; v1 submitted 2 August, 2021;
originally announced August 2021.
-
Shrinkage Estimation of Functions of Large Noisy Symmetric Matrices
Authors:
Panagiotis Lolas,
Lexing Ying
Abstract:
We study the problem of estimating functions of a large symmetric matrix $A_n$ when we only have access to a noisy estimate $\hat{A}_n=A_n+\sigma Z_n/\sqrt{n}$. We are interested in the case that $Z_n$ is a Wigner ensemble and suggest an algorithm based on nonlinear shrinkage of the eigenvalues of $\hat{A}_n$. As an intermediate step we explain how recovery of the spectrum of $A_n$ is possible using only the spectrum of $\hat{A}_n$. Our algorithm has important applications, for example, in solving high-dimensional noisy systems of equations or symmetric matrix denoising. Throughout our analysis we rely on tools from random matrix theory.
Submitted 9 June, 2021;
originally announced June 2021.
-
Combining resampling and reweighting for faithful stochastic optimization
Authors:
Jing An,
Lexing Ying
Abstract:
Many machine learning and data science tasks require solving non-convex optimization problems. When the loss function is a sum of multiple terms, a popular method is the stochastic gradient descent. Viewed as a process for sampling the loss function landscape, the stochastic gradient descent is known to prefer flat minima. Though this is desired for certain optimization problems such as in deep learning, it causes issues when the goal is to find the global minimum, especially if the global minimum resides in a sharp valley.
Illustrated with a simple motivating example, we show that the fundamental reason is that the difference in the Lipschitz constants of multiple terms in the loss function causes stochastic gradient descent to experience different variances at different minima. In order to mitigate this effect and perform faithful optimization, we propose a combined resampling-reweighting scheme to balance the variance at local minima and extend to general loss functions. We explain from the numerical stability perspective how the proposed scheme is more likely to select the true global minimum, and from the local convergence analysis perspective how it converges to a minimum faster than the vanilla stochastic gradient descent. Experiments from robust statistics and computational chemistry are provided to demonstrate the theoretical findings.
Submitted 9 September, 2021; v1 submitted 31 May, 2021;
originally announced May 2021.
-
A semigroup method for high dimensional elliptic PDEs and eigenvalue problems based on neural networks
Authors:
Haoya Li,
Lexing Ying
Abstract:
In this paper, we propose a semigroup method for solving high-dimensional elliptic partial differential equations (PDEs) and the associated eigenvalue problems based on neural networks. For the PDE problems, we reformulate the original equations as variational problems with the help of semigroup operators and then solve the variational problems with neural network (NN) parameterization. The main advantages are that no mixed second-order derivative computation is needed during the stochastic gradient descent training and that the boundary conditions are taken into account automatically by the semigroup operator. Unlike popular methods like PINN \cite{raissi2019physics} and Deep Ritz \cite{weinan2018deep} where the Dirichlet boundary condition is enforced solely through penalty functions and thus changes the true solution, the proposed method is able to address the boundary conditions without penalty functions and it gives the correct true solution even when penalty functions are added, thanks to the semigroup operator. For eigenvalue problems, a primal-dual method is proposed, efficiently resolving the constraint with a simple scalar dual variable and resulting in a faster algorithm compared with the BSDE solver \cite{han2020solving} in certain problems such as the eigenvalue problem associated with the linear Schrödinger operator. Numerical results are provided to demonstrate the performance of the proposed methods.
Submitted 9 January, 2022; v1 submitted 7 May, 2021;
originally announced May 2021.
-
Approximate inversion of discrete Fourier integral operators
Authors:
Jordi Feliu-Fabà,
Lexing Ying
Abstract:
This paper introduces a factorization for the inverse of discrete Fourier integral operators that can be applied in quasi-linear time. The factorization starts by approximating the operator with the butterfly factorization. Next, a hierarchical matrix representation is constructed for the Hermitian matrix arising from composing the Fourier integral operator with its adjoint. This representation is inverted efficiently with a new algorithm based on the hierarchical interpolative factorization. By combining these two factorizations, an approximate inverse factorization for the Fourier integral operator is obtained as a product of $O(\log N)$ sparse matrices of size $N\times N$. The resulting approximate inverse factorization can be used as a direct solver or as a preconditioner. Numerical examples on 1D and 2D Fourier integral operators, including a generalized Radon transform, demonstrate the performance of this new approach.
Submitted 6 May, 2021;
originally announced May 2021.