Search | arXiv e-print repository

SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic

Authors: Liulu He, Yufei Zhao, Rui Gao, Yuan Du, Li Du

Abstract: Fast convolution algorithms, including Winograd and FFT, can efficiently accelerate convolution operations in deep models. However, these algorithms depend on high-precision arithmetic to maintain inference accuracy, which conflicts with model quantization. To resolve this conflict and further improve the efficiency of quantized convolution, we propose SFC, a new algebra transform for fast convolution by extending the Discrete Fourier Transform (DFT) with symbolic computing, in which only additions are required to perform the transformation at specific transform points, avoiding the calculation of irrational numbers and reducing the requirement for precision. Additionally, we enhance convolution efficiency by introducing correction terms to convert invalid circular convolution outputs of the Fourier method into effective ones. The numerical error analysis is presented for the first time in this type of work and proves that our algorithms can provide a 3.68x multiplication reduction for 3x3 convolution, while the Winograd algorithm only achieves a 2.25x reduction with similarly low numerical errors. Experiments carried out on benchmarks and FPGA show that our new algorithms can further improve the computation efficiency of quantized models while maintaining accuracy, surpassing both the quantization-alone method and existing works on fast convolution quantization.

Submitted 3 July, 2024; originally announced July 2024.

Comments: ICML 2024

arXiv:2406.16240 [pdf, ps, other]

Quasi-étale covers of Du Val del Pezzo surfaces and Zariski dense exceptional sets in Manin's conjecture

Authors: Runxuan Gao

Abstract: We construct first examples of singular del Pezzo surfaces with Zariski dense exceptional sets in Manin's conjecture, varying in degrees $1, 2$ and $3$. To systematically study these examples, we classify all quasi-étale covers of Du Val del Pezzo surfaces up to singularity types and study their equivariant geometry.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 32 pages

arXiv:2405.18083 [pdf, ps, other]

On ergodic optimization for unimodal maps

Authors: Bing Gao, Rui Gao

Abstract: In this article, we show that for a typical non-uniformly expanding unimodal map, the unique maximizing measure of a generic Lipschitz function is supported on a periodic orbit.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 18 pages

arXiv:2403.14822 [pdf, other]

Non-Convex Robust Hypothesis Testing using Sinkhorn Uncertainty Sets

Authors: Jie Wang, Rui Gao, Yao Xie

Abstract: We present a new framework to address the non-convex robust hypothesis testing problem, wherein the goal is to seek the optimal detector that minimizes the maximum of worst-case type-I and type-II risk functions. The distributional uncertainty sets are constructed to center around the empirical distribution derived from samples based on Sinkhorn discrepancy. Given that the objective involves non-convex, non-smooth probabilistic functions that are often intractable to optimize, existing methods resort to approximations rather than exact solutions. To tackle the challenge, we introduce an exact mixed-integer exponential conic reformulation of the problem, which can be solved into a global optimum with a moderate amount of input data. Subsequently, we propose a convex approximation, demonstrating its superiority over current state-of-the-art methodologies in literature. Furthermore, we establish connections between robust hypothesis testing and regularized formulations of non-robust risk functions, offering insightful interpretations. Our numerical study highlights the satisfactory testing performance and computational efficiency of the proposed framework.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 26 pages, 2 figures

arXiv:2311.04138 [pdf, other]

The geometric exceptional set in Manin's conjecture for Batyrev and Tschinkel's example

Authors: Runxuan Gao

Abstract: Batyrev and Tschinkel's example is a Fermat cubic surface bundle $X$ which is a Fano $5$-fold. It is the first example for which Manin's conjecture can never hold for a proper closed exceptional set. Lehmann, Sengupta, and Tanimoto proposed a conjectural geometric description of the exceptional set in Manin's conjecture and showed that it is always contained in a thin set. Over a field of characteristic $0$, we construct finitely many thin maps such that any thin map $f:Y\rightarrow X$ with equal or larger $a$- and $b$-values in lexicographical order factors rationally through one of them. In particular, these define a thin set which coincides with the conjectural exceptional set.

Submitted 31 January, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: 22 pages, Theorem 1.8 is generalized to any field of characteristic 0

arXiv:2311.01708 [pdf, ps, other]

Physics-Informed Generator-Encoder Adversarial Networks with Latent Space Matching for Stochastic Differential Equations

Authors: Ruisong Gao, Min Yang, Jin Zhang

Abstract: We propose a new class of physics-informed neural networks, called Physics-Informed Generator-Encoder Adversarial Networks, to effectively address the challenges posed by forward, inverse, and mixed problems in stochastic differential equations. In these scenarios, while the governing equations are known, the available data consist of only a limited set of snapshots for system parameters. Our model consists of two key components: the generator and the encoder, both updated alternately by gradient descent. In contrast to previous approaches of directly matching the approximated solutions with real snapshots, we employ an indirect matching that operates within the lower-dimensional latent feature space. This method circumvents challenges associated with high-dimensional inputs and complex data distributions, while yielding more accurate solutions compared to existing neural network solvers. In addition, the approach also mitigates the training instability issues encountered in previous adversarial frameworks in an efficient manner. Numerical results provide compelling evidence of the effectiveness of the proposed method in solving different types of stochastic differential equations.

Submitted 3 November, 2023; originally announced November 2023.

Comments: 24 pages

arXiv:2311.01261 [pdf, ps, other]

Overlap Times in Tandem Queues: Identically Distributed Station Case

Authors: Ruici Gao, Jamol Pender

Abstract: In this paper, we investigate overlap times in a two-dimensional infinite server tandem queue. Specifically, we analyze the amount of time that a pair of customers spend overlapping in any station of the two-dimensional tandem network. We assume that both stations have independent and identically distributed exponential service times with the same rate parameter $μ$. Our main contribution is the derivation of the joint tail distribution, the two marginal tail probabilities, the moments of the overlap times and the tail distribution of the sum of the overlap times in both stations. Our results shed light on how customers overlap downstream in serial queueing systems.

Submitted 12 March, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

arXiv:2207.04348 [pdf, ps, other]

A Zariski dense exceptional set in Manin's conjecture: dimension 2

Authors: Runxuan Gao

Abstract: Recently, Lehmann, Sengupta, and Tanimoto proposed a conjectural construction of the exceptional set in Manin's Conjecture, which we call the geometric exceptional set. We construct a del Pezzo surface of degree $1$ whose geometric exceptional set is Zariski dense. In particular, this provides the first counterexample to the original version of Manin's Conjecture in dimension $2$ in characteristic $0$. Assuming the finiteness of Tate-Shafarevich groups of elliptic curves over $\mathbb{Q}$ with $j$-invariant $0$, we show that there are infinitely many such counterexamples.

Submitted 17 May, 2023; v1 submitted 9 July, 2022; originally announced July 2022.

Comments: 14 pages, final version to appear in Research in Number Theory

arXiv:2206.05467 [pdf, ps, other]

Low complexity of optimizing measures over an expanding circle map

Authors: Rui Gao, Weixiao Shen

Abstract: In this paper, we prove that for real analytic expanding circle maps, all optimizing measures of a real analytic potential function have zero entropy, unless the potential is cohomologous to constant. We use the group structure of the symbolic space to solve a transversality problem involved. We also discuss applications to optimizing measures for generic smooth potentials and to Lyapunov optimizing measures.

Submitted 29 July, 2022; v1 submitted 11 June, 2022; originally announced June 2022.

Comments: 15 pages, minor corrections

arXiv:2205.00362 [pdf, ps, other]

A Short and General Duality Proof for Wasserstein Distributionally Robust Optimization

Authors: Luhao Zhang, Jincheng Yang, Rui Gao

Abstract: We present a general duality result for Wasserstein distributionally robust optimization that holds for any Kantorovich transport cost, measurable loss function, and nominal probability distribution. Assuming an interchangeability principle inherent in existing duality results, our proof only uses one-dimensional convex analysis. Furthermore, we demonstrate that the interchangeability principle holds if and only if certain measurable projection and weak measurable selection conditions are satisfied. To illustrate the broader applicability of our approach, we provide a rigorous treatment of duality results in distributionally robust Markov decision processes and distributionally robust multistage stochastic programming. Additionally, we extend our analysis to other problems such as infinity-Wasserstein distributionally robust optimization, risk-averse optimization, and globalized distributionally robust counterpart.

Submitted 4 June, 2024; v1 submitted 30 April, 2022; originally announced May 2022.

MSC Class: 49N15

arXiv:2202.06012

Cloud-based computational model predictive control using a parallel multi-block ADMM approach

Authors: Yaling Ma, Runze Gao, Li Dai, Jinxian Wu, Yuanqing Xia

Abstract: Heavy computational load for solving nonconvex problems for large-scale systems or systems with real-time demands at each sample step has been recognized as one of the reasons for preventing a wider application of nonlinear model predictive control (NMPC). To improve the real-time feasibility of NMPC with input nonlinearity, we devise an innovative scheme called cloud-based computational model predictive control (MPC) by using an elaborately designed parallel multi-block alternating direction method of multipliers (ADMM) algorithm. This novel parallel multi-block ADMM algorithm is tailored to tackle the computational issue of solving a nonconvex problem with nonlinear constraints.

Submitted 15 April, 2022; v1 submitted 12 February, 2022; originally announced February 2022.

Comments: Statements and experiments are flawed

arXiv:2111.02523 [pdf, other]

Adding Safety Rules to Surgeon-Authored VR Training

Authors: Ruiliang Gao, Sergei Kurenov, Erik W. Black, Jorg Peters

Abstract: Introduction: Safety criteria in surgical VR training are typically hard-coded and informally summarized. The Virtual Reality (VR) content creation interface, TIPS-author, for the Toolkit for Illustration of Procedures in Surgery (TIPS) allows surgeon-educators (SEs) to create laparoscopic VR-training modules with force feedback. TIPS-author initializes anatomy shape and physical properties selected by the SE accessing a cloud database of physics-enabled pieces of anatomy. Methods: A new addition to TIPS-author are safety rules that are set by the SE and are automatically monitored during simulation. Errors are recorded as visual snapshots for feedback to the trainee. This paper reports on the implementation and opportunistic evaluation of the snapshot mechanism as a trainee feedback mechanism. TIPS was field tested at two surgical conferences, one before and one after adding the snapshot feature. Results: While other ratings of TIPS remained unchanged for an overall Likert scale score of 5.24 out of 7 (7 equals very useful), the rating of the statement `The TIPS interface helps learners understand the force necessary to explore the anatomy' improved from 5.04 to 5.35 out of 7 after the snapshot mechanism was added. Conclusions: The ratings indicate the viability of the TIPS open-source SE-authored surgical training units. Presenting SE-determined procedural missteps via the snapshot mechanism at the end of the training increases acceptance.

Submitted 3 November, 2021; originally announced November 2021.

Comments: How do I migrate this to cs.HC ? I need the identifier for a deadline and

arXiv:2110.02629 [pdf, other]

doi 10.1109/TCYB.2021.3111082

Deep Reinforcement Learning for Solving the Heterogeneous Capacitated Vehicle Routing Problem

Authors: Jingwen Li, Yining Ma, Ruize Gao, Zhiguang Cao, Andrew Lim, Wen Song, Jie Zhang

Abstract: Existing deep reinforcement learning (DRL) based methods for solving the capacitated vehicle routing problem (CVRP) intrinsically cope with homogeneous vehicle fleet, in which the fleet is assumed as repetitions of a single vehicle. Hence, their key to construct a solution solely lies in the selection of the next node (customer) to visit excluding the selection of vehicle. However, vehicles in real-world scenarios are likely to be heterogeneous with different characteristics that affect their capacity (or travel speed), rendering existing DRL methods less effective. In this paper, we tackle heterogeneous CVRP (HCVRP), where vehicles are mainly characterized by different capacities. We consider both min-max and min-sum objectives for HCVRP, which aim to minimize the longest or total travel time of the vehicle(s) in the fleet. To solve those problems, we propose a DRL method based on the attention mechanism with a vehicle selection decoder accounting for the heterogeneous fleet constraint and a node selection decoder accounting for the route construction, which learns to construct a solution by automatically selecting both a vehicle and a node for this vehicle at each step. Experimental results based on randomly generated instances show that, with desirable generalization to various problem sizes, our method outperforms the state-of-the-art DRL method and most of the conventional heuristics, and also delivers competitive performance against the state-of-the-art heuristic method, i.e., SISR. Additionally, the results of extended experiments demonstrate that our method is also able to solve CVRPLib instances with satisfactory performance.

Submitted 7 March, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: This paper has been accepted at IEEE Transactions on Cybernetics

arXiv:2109.11926 [pdf, other]

Sinkhorn Distributionally Robust Optimization

Authors: Jie Wang, Rui Gao, Yao Xie

Abstract: We study distributionally robust optimization (DRO) with Sinkhorn distance -- a variant of Wasserstein distance based on entropic regularization. We derive convex programming dual reformulation for general nominal distributions, transport costs, and loss functions. Compared with Wasserstein DRO, our proposed approach offers enhanced computational tractability for a broader class of loss functions, and the worst-case distribution exhibits greater plausibility in practical scenarios. To solve the dual reformulation, we develop a stochastic mirror descent algorithm with biased gradient oracles. Remarkably, this algorithm achieves near-optimal sample complexity for both smooth and nonsmooth loss functions, nearly matching the sample complexity of the Empirical Risk Minimization counterpart. Finally, we provide numerical examples using synthetic and real data to demonstrate its superior performance.

Submitted 3 June, 2023; v1 submitted 24 September, 2021; originally announced September 2021.

Comments: 57 pages, 9 figures

arXiv:2105.14348 [pdf, other]

Robust Hypothesis Testing with Wasserstein Uncertainty Sets

Authors: Liyan Xie, Rui Gao, Yao Xie

Abstract: We consider a data-driven robust hypothesis test where the optimal test will minimize the worst-case performance regarding distributions that are close to the empirical distributions with respect to the Wasserstein distance. This leads to a new non-parametric hypothesis testing framework based on distributionally robust optimization, which is more robust when there are limited samples for one or both hypotheses. Such a scenario often arises from applications such as health care, online change-point detection, and anomaly detection. We study the computational and statistical properties of the proposed test by presenting a tractable convex reformulation of the original infinite-dimensional variational problem exploiting Wasserstein's properties and characterizing the radii selection for the uncertainty sets. We also demonstrate the good performance of our method on synthetic and real data.

Submitted 29 May, 2021; originally announced May 2021.

arXiv:2105.10767 [pdf, ps, other]

Regularity of calibrated sub-actions for circle expanding maps and Sturmian optimization

Authors: Rui Gao

Abstract: In this short and elementary note, we study some ergodic optimization problems for circle expanding maps. We first make an observation that if a function is not far from being convex, then its calibrated sub-actions are closer to convex functions in certain effective way. As an application of this simple observation, for circle doubling map, we generalize a result of Bousch saying that translations of the cosine function are uniquely optimized by Sturmian measures. Our argument follows the mainline of Bousch's original proof, while the technical part is simplified by the observation mentioned above, and no numerical calculation is needed.

Submitted 1 April, 2022; v1 submitted 22 May, 2021; originally announced May 2021.

Comments: 11 pages

arXiv:2102.06449 [pdf, other]

Two-sample Test with Kernel Projected Wasserstein Distance

Authors: Jie Wang, Rui Gao, Yao Xie

Abstract: We develop a kernel projected Wasserstein distance for the two-sample test, an essential building block in statistics and machine learning: given two sets of samples, to determine whether they are from the same distribution. This method operates by finding the nonlinear mapping in the data space which maximizes the distance between projected distributions. In contrast to existing works about projected Wasserstein distance, the proposed method circumvents the curse of dimensionality more efficiently. We present practical algorithms for computing this distance function together with the non-asymptotic uncertainty quantification of empirical estimates. Numerical examples validate our theoretical results and demonstrate good performance of the proposed method.

Submitted 23 February, 2022; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: 50 pages, 11 figures, 3 tables, accepted as oral presentation in AISTATS-22

Journal ref: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (2022) 151:8022-8055

arXiv:2009.04382 [pdf]

Finite-Sample Guarantees for Wasserstein Distributionally Robust Optimization: Breaking the Curse of Dimensionality

Authors: Rui Gao

Abstract: Wasserstein distributionally robust optimization (DRO) aims to find robust and generalizable solutions by hedging against data perturbations in Wasserstein distance. Despite its recent empirical success in operations research and machine learning, existing performance guarantees for generic loss functions are either overly conservative due to the curse of dimensionality, or plausible only in large sample asymptotics. In this paper, we develop a non-asymptotic framework for analyzing the out-of-sample performance for Wasserstein robust learning and the generalization bound for its related Lipschitz and gradient regularization problems. To the best of our knowledge, this gives the first finite-sample guarantee for generic Wasserstein DRO problems without suffering from the curse of dimensionality. Our results highlight that Wasserstein DRO, with a properly chosen radius, balances between the empirical mean of the loss and the variation of the loss, measured by the Lipschitz norm or the gradient norm of the loss. Our analysis is based on two novel methodological developments that are of independent interest: 1) a new concentration inequality controlling the decay rate of large deviation probabilities by the variation of the loss and, 2) a localized Rademacher complexity theory based on the variation of the loss.

Submitted 30 April, 2022; v1 submitted 9 September, 2020; originally announced September 2020.

arXiv:2007.12009 [pdf, ps, other]

On fair entropy of the tent family

Authors: Bing Gao, Rui Gao

Abstract: The notions of fair measure and fair entropy were introduced by Misiurewicz and Rodrigues recently, and discussed in detail for piecewise monotone interval maps. In particular, they showed that the fair entropy $h(a)$ of the tent map $f_a$, as a function of the parameter $a=\exp(h_{top}(f_a))$, is continuous and strictly increasing on $[\sqrt{2},2]$. In this short note, we extend the last result and characterize regularity of the function $h$ precisely. We prove that $h$ is $\frac{1}{2}$-Hölder continuous on $[\sqrt{2},2]$ and identify its best Hölder exponent on each subinterval of $[\sqrt{2},2]$. On the other hand, parallel to a recent result on topological entropy of the quadratic family due to Dobbs and Mihalache, we give a formula of pointwise Hölder exponents of $h$ at parameters chosen in an explicitly constructed set of full measure. This formula particularly implies that the derivative of $h$ vanishes almost everywhere.

Submitted 23 July, 2020; originally announced July 2020.

arXiv:2007.11573 [pdf, other]

Autonomous Tracking and State Estimation with Generalised Group Lasso

Authors: Rui Gao, Simo Särkkä, Rubén Claveria-Vega, Simon Godsill

Abstract: We address the problem of autonomous tracking and state estimation for marine vessels, autonomous vehicles, and other dynamic signals under a (structured) sparsity assumption. The aim is to improve the tracking and estimation accuracy with respect to classical Bayesian filters and smoothers. We formulate the estimation problem as a dynamic generalised group Lasso problem and develop a class of smoothing-and-splitting methods to solve it. The Levenberg--Marquardt iterated extended Kalman smoother-based multi-block alternating direction method of multipliers (LM-IEKS-mADMM) algorithms are based on the alternating direction method of multipliers (ADMM) framework. This leads to minimisation subproblems with an inherent structure to which three new augmented recursive smoothers are applied. Our methods can deal with large-scale problems without pre-processing for dimensionality reduction. Moreover, the methods allow one to solve nonsmooth nonconvex optimisation problems. We then prove that under mild conditions, the proposed methods converge to a stationary point of the optimisation problem. By simulated and real-data experiments including multi-sensor range measurement problems, marine vessel tracking, autonomous vehicle tracking, and audio signal restoration, we show the practical effectiveness of the proposed methods.

Submitted 30 May, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

Comments: 14 pages, 10 figures

arXiv:2005.08275 [pdf, other]

doi 10.1109/LSP.2020.3010159

Variable Splitting Methods for Constrained State Estimation in Partially Observed Markov Processes

Authors: Rui Gao, Filip Tronarp, Simo Särkkä

Abstract: In this paper, we propose a class of efficient, accurate, and general methods for solving state-estimation problems with equality and inequality constraints. The methods are based on recent developments in variable splitting and partially observed Markov processes. We first present the generalized framework based on variable splitting, then develop efficient methods to solve the state-estimation subproblems arising in the framework. The solutions to these subproblems can be made efficient by leveraging the Markovian structure of the model as is classically done in so-called Bayesian filtering and smoothing methods. The numerical experiments demonstrate that our methods outperform conventional optimization methods in computation cost as well as the estimation performance.

Submitted 17 July, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

Comments: 3 figures

arXiv:1910.09834 [pdf, other]

A hybrid stochastic differential reinsurance and investment game with bounded memory

Authors: Yanfei Bai, Zhongbao Zhou, Helu Xiao, Rui Gao, Feimin Zhong

Abstract: This paper investigates a hybrid stochastic differential reinsurance and investment game between one reinsurer and two insurers, including a stochastic Stackelberg differential subgame and a non-zero-sum stochastic differential subgame. The reinsurer, as the leader of the Stackelberg game, can price reinsurance premium and invest its wealth in a financial market that contains a risk-free asset and a risky asset. The two insurers, as the followers of the Stackelberg game, can purchase proportional reinsurance from the reinsurer and invest in the same financial market. The competitive relationship between two insurers is modeled by the non-zero-sum game, and their decision making will consider the relative performance measured by the difference in their terminal wealth. We consider wealth processes with delay to characterize the bounded memory feature. This paper aims to find the equilibrium strategy for the reinsurer and insurers by maximizing the expected utility of the reinsurer's terminal wealth with delay and maximizing the expected utility of the combination of insurers' terminal wealth and the relative performance with delay. By using the idea of backward induction and the dynamic programming approach, we derive the equilibrium strategy and value functions explicitly. Then, we provide the corresponding verification theorem. Finally, some numerical examples and sensitivity analysis are presented to demonstrate the effects of model parameters on the equilibrium strategy.
We find the delay factor discourages or stimulates investment depending on the length of delay. Moreover, competitive factors between two insurers make their optimal reinsurance-investment strategy interact, and reduce reinsurance demand and reinsurance premium price.

Submitted 22 October, 2019; originally announced October 2019.

Comments: 35 pages, 9 figures, 9 tables

MSC Class: 90B50; 91B30

arXiv:1910.04364

Network Entropy based on Cluster Expansion on Motifs for Undirected Graphs

Authors: Ruize Gao, Ying Zhao

Abstract: The structure of the network can be described by motifs, which are subgraphs that often repeat themselves. In order to understand the structure of network motifs, it is of great importance to study subgraphs from the perspective of statistical mechanics. In this paper, we use clustering extensions in statistical physics to solve the problem of using motifs as network primitives. By projecting the network motifs to clusters in the gas model, we develop the partition function of the network, which enables us to calculate global thermodynamic quantities, such as energy, entropy, and vice versa. Then, we give the analytic expressions of the number of specific types of motifs and calculate their correlated entropy. We conduct algebraic experiments on datasets, both synthetic and in real life, and evaluate the qualitative and quantitative characterization of motif entropy derived from the partition function. Our findings show that the motif entropy of networks in real life, for instance, financial and stock market networks, is of high correlation to the change of network structure. Hence, our findings are consistent with recent studies about the similar topic that network motifs can be represented as basic elements of well-defined information processing functions.

Submitted 23 October, 2019; v1 submitted 10 October, 2019; originally announced October 2019.

Comments: arXiv admin note: This submission has been removed by arXiv administrators as the submitter did not have the right to agree to the license at the time of submission. Version 3 was an inappropriate replacement

arXiv:1905.11675 [pdf, ps, other]

Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems

Authors: Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Liwei Wang

Abstract: First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithm for training deep neural networks. Second-order methods, despite their better convergence rate, are rarely used in practice due to the prohibitive computational cost in calculating the second-order information. In this paper, we propose a novel Gram-Gauss-Newton (GGN) algorithm to train deep neural networks for regression problems with square loss. Our method draws inspiration from the connection between neural network optimization and kernel regression of neural tangent kernel (NTK). Different from typical second-order methods that have heavy computational cost in each iteration, GGN only has minor overhead compared to first-order methods such as SGD. We also give theoretical results to show that for sufficiently wide neural networks, the convergence rate of GGN is quadratic. Furthermore, we provide convergence guarantee for mini-batch GGN algorithm, which is, to our knowledge, the first convergence result for the mini-batch version of a second-order method on overparameterized neural networks. Preliminary experiments on regression tasks demonstrate that for training standard networks, our GGN algorithm converges much faster and achieves better performance than SGD.

Submitted 25 September, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

arXiv:1808.10156 [pdf, other]

Local stable and unstable sets for positive entropy $C^1$ dynamical systems

Authors: Shilin Feng, Rui Gao, Wen Huang, Zeng Lian

Abstract: For any $C^1$ diffeomorphism on a smooth compact Riemannian manifold that admits an ergodic measure with positive entropy, a lower bound of the Hausdorff dimension for the local stable and unstable sets is given in terms of the measure-theoretic entropy and the maximal Lyapunov exponent. The mainline of our approach to this result is under the settings of topological dynamical systems, which is also applicable to infinite dimensional $C^1$ dynamical systems.

Submitted 30 August, 2018; originally announced August 2018.

arXiv:1805.10611 [pdf, other]

Robust Hypothesis Testing Using Wasserstein Uncertainty Sets

Authors: Rui Gao, Liyan Xie, Yao Xie, Huan Xu

Abstract: We develop a novel computationally efficient and general framework for robust hypothesis testing. The new framework features a new way to construct uncertainty sets under the null and the alternative distributions, which are sets centered around the empirical distribution defined via Wasserstein metric, thus our approach is data-driven and free of distributional assumptions. We develop a convex safe approximation of the minimax formulation and show that such approximation renders a nearly-optimal detector among the family of all possible tests. By exploiting the structure of the least favorable distribution, we also develop a tractable reformulation of such approximation, with complexity independent of the dimension of observation space and can be nearly sample-size-independent in general. Real-data example using human activity data demonstrated the excellent performance of the new robust detector.

Submitted 27 May, 2018; originally announced May 2018.

arXiv:1712.08015 [pdf, other]

Risk-Based Distributionally Robust Optimal Power Flow With Dynamic Line Rating

Authors: Cheng Wang, Rui Gao, Feng Qiu, Jianhui Wang, Linwei Xin

Abstract: In this paper, we propose a risk-based data-driven approach to optimal power flow (DROPF) with dynamic line rating. The risk terms, including penalties for load shedding, wind generation curtailment and line overload, are embedded into the objective function. To hedge against the uncertainties on wind generation data and line rating data, we consider a distributionally robust approach. The ambiguity set is based on second-order moment and Wasserstein distance, which captures the correlations between wind generation outputs and line ratings, and is robust to data perturbation. We show that the proposed DROPF model can be reformulated as a conic program. Considering the relatively large number of constraints involved, an approximation of the proposed DROPF model is suggested, which significantly reduces the computational costs. A Wasserstein distance constrained DROPF and its tractable reformulation are also provided for practical large-scale test systems. Simulation results on the 5-bus, the IEEE 118-bus and the Polish 2736-bus test systems validate the effectiveness of the proposed models.

Submitted 21 December, 2017; originally announced December 2017.

arXiv:1712.06050 [pdf]

Wasserstein Distributionally Robust Optimization and Variation Regularization

Authors: Rui Gao, Xi Chen, Anton J. Kleywegt

Abstract: Wasserstein distributionally robust optimization (DRO) has recently achieved empirical success for various applications in operations research and machine learning, owing partly to its regularization effect. Although connection between Wasserstein DRO and regularization has been established in several settings, existing results often require restrictive assumptions, such as smoothness or convexity, that are not satisfied for many problems. In this paper, we develop a general theory on the variation regularization effect of the Wasserstein DRO - a new form of regularization that generalizes total-variation regularization, Lipschitz regularization and gradient regularization. Our results cover possibly non-convex and non-smooth losses and losses on non-Euclidean spaces. Examples include multi-item newsvendor, portfolio selection, linear prediction, neural networks, manifold learning, and intensity estimation for Poisson processes, etc. As an application of our theory of variation regularization, we derive new generalization guarantees for adversarial robust learning.

Submitted 30 October, 2020; v1 submitted 16 December, 2017; originally announced December 2017.

Comments: The paper is previously titled "Wasserstein Distributional Robustness and Regularization in Statistical Learning"

arXiv:1701.04200 [pdf, other]

Distributionally Robust Stochastic Optimization with Dependence Structure

Authors: Rui Gao, Anton J. Kleywegt

Abstract: Distributionally robust stochastic optimization (DRSO) is a framework for decision-making problems under uncertainty, which finds solutions that perform well for a chosen set of probability distributions. Many different approaches for specifying a set of distributions have been proposed. The choice matters, because it affects the results, and the relative performance of different choices depends on the characteristics of the problems. In this paper, we consider problems in which different random variables exhibit some form of dependence, but the exact values of the parameters that represent the dependence are not known. We consider various sets of distributions that incorporate the dependence structure, and we study the corresponding DRSO problems. In the first part of the paper, we consider problems with linear dependence between random variables. We consider sets of distributions that are within a specified Wasserstein distance of a nominal distribution, and that satisfy a second-order moment constraint. We obtain a tractable dual reformulation of the corresponding DRSO problem. This approach is compared with the traditional moment-based DRSO and Wasserstein-based DRSO with no moment constraints. Numerical experiments suggest that our new formulation has superior out-of-sample performance.
In the second part of the paper, we consider problems with various types of rank dependence between random variables, including rank dependence measured by Spearman's footrule distance between empirical rankings, comonotonic distributions, box uncertainty for individual observations, and Wasserstein distance between copulas associated with continuous distributions. We also obtain a dual reformulation of the DRSO problem. A desirable byproduct of the formulation is that it also avoids an issue associated with the one-sided moment constraints in moment-based DRSO problems.

Submitted 16 January, 2017; originally announced January 2017.

arXiv:1604.02199 [pdf, other]

Distributionally Robust Stochastic Optimization with Wasserstein Distance

Authors: Rui Gao, Anton J. Kleywegt

Abstract: Distributionally robust stochastic optimization (DRSO) is an approach to optimization under uncertainty in which, instead of assuming that there is a known true underlying probability distribution, one hedges against a chosen set of distributions. In this paper we first point out that the set of distributions should be chosen to be appropriate for the application at hand, and that some of the choices that have been popular until recently are, for many applications, not good choices. We next consider sets of distributions that are within a chosen Wasserstein distance from a nominal distribution. Such a choice of sets has two advantages: (1) The resulting distributions hedged against are more reasonable than those resulting from other popular choices of sets. (2) The problem of determining the worst-case expectation over the resulting set of distributions has desirable tractability properties. We derive a strong duality reformulation of the corresponding DRSO problem and construct approximate worst-case distributions explicitly via the first-order optimality conditions of the dual problem. Our contributions are four-fold. (i) We identify necessary and sufficient conditions for the existence of a worst-case distribution, which are naturally related to the growth rate of the objective function. (ii) We show that the worst-case distributions resulting from an appropriate Wasserstein distance have a concise structure and a clear interpretation.
(iii) Using this structure, we show that data-driven DRSO problems can be approximated to any accuracy by robust optimization problems, and thereby many DRSO problems become tractable by using tools from robust optimization. (iv) Our strong duality result holds in a very general setting. As examples, we show that it can be applied to infinite-dimensional process control and intensity estimation for point processes.

Submitted 30 April, 2022; v1 submitted 7 April, 2016; originally announced April 2016.

Comments: Accepted by Mathematics of Operations Research

MSC Class: 90C15; 90C46

arXiv:1207.2702 [pdf, other]

Analytic skew-products of quadratic polynomials over Misiurewicz-Thurston maps

Authors: Rui Gao, Weixiao Shen

Abstract: We consider skew-products of quadratic maps over certain Misiurewicz-Thurston maps and study their statistical properties. We prove that, when the coupling function is a polynomial of odd degree, such a system admits two positive Lyapunov exponents almost everywhere and a unique absolutely continuous invariant probability measure.

Submitted 11 July, 2012; originally announced July 2012.

Comments: 23 pages

arXiv:1109.1381 [pdf, ps, other]

doi 10.3792/pjaa.88.41

The Shi arrangement of the type $D_\ell$

Authors: Ruimei Gao, Donghe Pei, Hiroaki Terao

Abstract: In this paper, we give a basis for the derivation module of the cone over the Shi arrangement of the type $D_\ell$ explicitly.

Submitted 7 September, 2011; originally announced September 2011.

MSC Class: 32S22; 05E15

Journal ref: Proc. Japan Acad. Ser. A Math. Sci., 88 (2012), 41-45

Showing 1–32 of 32 results for author: Gao, R