
Uniform Convergence of Adversarially Robust Classifiers

Abstract: In recent years there has been significant interest in the effect of different types of adversarial perturbations in data classification problems. Many of these models incorporate the adversarial power, which is an important parameter with an associated trade-off between accuracy and robustness. This work considers a general framework for adversarially-perturbed classification problems, in a large data or population-level limit. In such a regime, we demonstrate that, as adversarial strength goes to zero, optimal classifiers converge to the Bayes classifier in the Hausdorff distance. This significantly strengthens previous results, which generally focus on $L^1$-type convergence. The main argument relies upon direct geometric comparisons and is inspired by techniques from geometric measure theory.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 36 pages, 2 figures

MSC Class: 28A75; 62G35; 68Q32; 35B25

arXiv:2404.19255 [pdf, other]

On a Family of Relaxed Gradient Descent Methods for Quadratic Minimization

Authors: Liam MacDonald, Rua Murray, Rachael Tappenden

Abstract: This paper studies the convergence properties of a family of Relaxed $\ell$-Minimal Gradient Descent methods for quadratic optimization; the family includes the omnipresent Steepest Descent method, as well as the Minimal Gradient method. Simple proofs are provided that show, in an appropriately chosen norm, the gradient and the distance of the iterates from optimality converge linearly, for all members of the family. Moreover, the function values decrease linearly, and iteration complexity results are provided. All theoretical results hold when (fixed) relaxation is employed. It is also shown that, given a fixed overhead and storage budget, every Relaxed $\ell$-Minimal Gradient Descent method can be implemented using exactly one matrix vector product. Numerical experiments are presented that illustrate the benefits of relaxation across the family.

Submitted 30 April, 2024; originally announced April 2024.

Comments: 23 pages, 6 figures, 2 tables

arXiv:2312.17610 [pdf, ps, other]

Wellposedness and singularity formation beyond the Yudovich class

Authors: Tarek M. Elgindi, Ryan W. Murray, Ayman R. Said

Abstract: We introduce a local-in-time existence and uniqueness class for solutions to the 2d Euler equation with unbounded vorticity. Furthermore, we show that solutions belonging to this class can develop stronger singularities in finite time, meaning that they experience finite time blow up and exit the wellposedness class. Such solutions may be continued as weak solutions (potentially non-uniquely) after the singularity. While the general dynamics of 2d Euler solutions beyond the Yudovich class will certainly not be so tame, studying such solutions gives a way to study singular phenomena in a more controlled setting.

Submitted 29 December, 2023; originally announced December 2023.

arXiv:2311.08316 [pdf, other]

CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT)

Authors: Maksim Melnichenko, Oleg Balabanov, Riley Murray, James Demmel, Michael W. Mahoney, Piotr Luszczek

Abstract: This paper develops and analyzes a new algorithm for QR decomposition with column pivoting (QRCP) of rectangular matrices with large row counts. The algorithm combines methods from randomized numerical linear algebra in a particularly careful way in order to accelerate both pivot decisions for the input matrix and the process of decomposing the pivoted matrix into the QR form. The source of the latter acceleration is a use of randomized preconditioning and CholeskyQR. Comprehensive analysis is provided in both exact and finite-precision arithmetic to characterize the algorithm's rank-revealing properties and its numerical stability granted probabilistic assumptions of the sketching operator. An implementation of the proposed algorithm is described and made available inside the open-source RandLAPACK library, which itself relies on RandBLAS - also available in open-source format. Experiments with this implementation on an Intel Xeon Gold 6248R CPU demonstrate order-of-magnitude speedups relative to LAPACK's standard function for QRCP, and comparable performance to a specialized algorithm for unpivoted QR of tall matrices, which lacks the strong rank-revealing properties of the proposed method.

Submitted 4 February, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: v1: 26 pages in the body, 10 pages in the appendices, 10 figures. v2: performance experiments now use a larger sketch size for CQRRPT

arXiv:2302.11474 [pdf, other]

Randomized Numerical Linear Algebra: A Perspective on the Field With an Eye to Software

Authors: Riley Murray, James Demmel, Michael W. Mahoney, N. Benjamin Erichson, Maksim Melnichenko, Osman Asif Malik, Laura Grigori, Piotr Luszczek, Michał Dereziński, Miles E. Lopes, Tianyu Liang, Hengrui Luo, Jack Dongarra

Abstract: Randomized numerical linear algebra - RandNLA, for short - concerns the use of randomization as a resource to develop improved algorithms for large-scale linear algebra computations. The origins of contemporary RandNLA lay in theoretical computer science, where it blossomed from a simple idea: randomization provides an avenue for computing approximate solutions to linear algebra problems more efficiently than deterministic algorithms. This idea proved fruitful in the development of scalable algorithms for machine learning and statistical data analysis applications. However, RandNLA's true potential only came into focus upon integration with the fields of numerical analysis and "classical" numerical linear algebra. Through the efforts of many individuals, randomized algorithms have been developed that provide full control over the accuracy of their solutions and that can be every bit as reliable as algorithms that might be found in libraries such as LAPACK. Recent years have even seen the incorporation of certain RandNLA methods into MATLAB, the NAG Library, NVIDIA's cuSOLVER, and SciKit-Learn. For all its success, we believe that RandNLA has yet to realize its full potential. In particular, we believe the scientific community stands to benefit significantly from suitably defined "RandBLAS" and "RandLAPACK" libraries, to serve as standards conceptually analogous to BLAS and LAPACK. This 200-page monograph represents a step toward defining such standards. In it, we cover topics spanning basic sketching, least squares and optimization, low-rank approximation, full matrix decompositions, leverage score sampling, and sketching data with tensor product structures (among others). Much of the provided pseudo-code has been tested via publicly available MATLAB and Python implementations.

Submitted 12 April, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: v1: this is the first arXiv release of LAPACK Working Note 299. v2: complete rewrite of the subsection on trace estimation, among other changes. See frontmatter page ii (pdf page 5) for revision history

arXiv:2301.12584 [pdf, other]

Fast Exact Leverage Score Sampling from Khatri-Rao Products with Applications to Tensor Decomposition

Authors: Vivek Bharadwaj, Osman Asif Malik, Riley Murray, Laura Grigori, Aydin Buluç, James Demmel

Abstract: We present a data structure to randomly sample rows from the Khatri-Rao product of several matrices according to the exact distribution of its leverage scores. Our proposed sampler draws each row in time logarithmic in the height of the Khatri-Rao product and quadratic in its column count, with persistent space overhead at most the size of the input matrices. As a result, it tractably draws samples even when the matrices forming the Khatri-Rao product have tens of millions of rows each. When used to sketch the linear least squares problems arising in CANDECOMP / PARAFAC tensor decomposition, our method achieves lower asymptotic complexity per solve than recent state-of-the-art methods. Experiments on billion-scale sparse tensors validate our claims, with our algorithm achieving higher accuracy than competing methods as the decomposition rank grows.

Submitted 28 February, 2024; v1 submitted 29 January, 2023; originally announced January 2023.

Comments: The 37th Conference on Neural Information Processing Systems (Neurips'23). 28 pages, 10 figures, 6 tables

arXiv:2301.01659

Non-Linear Singularity Formation for Circular Vortex Sheets

Authors: Ryan Murray, Galen Wilcox

Abstract: We study the evolution of vortex sheets according to the Birkhoff-Rott equation, which describes the motion of sharp shear interfaces governed by the incompressible Euler equation in two dimensions. In a recent work, the authors demonstrated within this context a marginal linear stability of circular vortex sheets, standing in sharp contrast with classical instability of the flat vortex sheet, which is known as the Kelvin-Helmholtz instability. This article continues that analysis by investigating how non-linear effects induce singularity formation near the circular vortex sheet. In high-frequency regimes, the singularity formation is primarily driven by a complex-valued, conjugated Burgers equation, which we study by modifying a classical argument from hyperbolic conservation laws. This provides a deeper understanding of the mechanisms driving the breakdown of circular vortex sheets, which are observed both numerically and experimentally.

Submitted 1 September, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: The paper "The Influence of Vortex Sheet Geometry on the Kelvin-Helmholtz Instability" contains a critical error which is repeated in this follow-up work. The mistake renders the linear and nonlinear transport equations (1.4) and (2.4) incorrect

arXiv:2211.08418 [pdf, other]

On the long-time behavior of scale-invariant solutions to the 2d Euler equation and applications

Authors: Tarek M. Elgindi, Ryan W. Murray, Ayman R. Said

Abstract: We study the long-time behavior of scale-invariant solutions of the 2d Euler equation satisfying a discrete symmetry. We show that all scale-invariant solutions with bounded variation on $\mathbb{S}^1$ relax to states that are piece-wise constant with finitely many jumps. All continuous scale-invariant solutions become singular and homogenize in infinite time. On $\mathbb{R}^2$, this corresponds to generic infinite-time spiral and cusp formation. The main tool in our analysis is the discovery of a monotone quantity that measures the number of particles that are moving away from the origin. This monotonicity also applies locally to solutions of the 2d Euler equation that are $m$-fold symmetric ($m\geq 4$) and have radial limits at the point of symmetry. Our results are also applicable to the Euler equation on a large class of surfaces of revolution (like $\mathbb{S}^2$ and $\mathbb{T}^2$). Our analysis then gives generic spiraling of trajectories and infinite-time loss of regularity for globally smooth solutions on any such smooth surface, under a discrete symmetry.

Submitted 15 November, 2022; originally announced November 2022.

MSC Class: 35Qxx

arXiv:2211.03585

The Influence of Vortex Sheet Geometry on the Kelvin-Helmholtz Instability

Authors: Ryan Murray, Galen Wilcox

Abstract: This article revisits the instability of sharp shear interfaces, also called vortex sheets, in incompressible fluid flows. We study the Birkhoff-Rott equation, which describes the motion of vortex sheets according to the incompressible Euler equations in two dimensions. The classical Kelvin-Helmholtz instability demonstrates that an infinite, flat vortex sheet has a strong linear instability. We show that this is not the case for circular vortex sheets: such a configuration has a delicate linear stability, and is the first example of a linearly stable solution to the Birkhoff-Rott equation. We subsequently derive a sufficient condition for linear instability of a circular vortex sheet for a family of generalized Birkhoff-Rott kernels, and prove that a common regularized kernel used in numerical simulation and analysis destabilizes the circular vortex sheet. Absent a destabilizing kernel regularization, our work suggests that the nonlinear dynamics are critical for understanding circular vortex sheet instability, and so the essential mechanism of the Kelvin-Helmholtz instability is dependent on global vortex sheet geometry. As expected, nonlinear numerical simulations utilizing the regularized kernel exhibit unstable behavior. Finally, we show experimental results which qualitatively match the types of instabilities that are observed numerically, demonstrating the persistence of the Kelvin-Helmholtz instability in real circular shear flows.

Submitted 1 September, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

Comments: Regretfully, we have found a mistake that renders the main result (Theorem 2.2) incorrect. It is necessary to treat positive and negative Fourier modes separately in order to satisfy analyticity requirements when evaluating the integral (2.26). Proceeding through similar steps with negative modes, the matrix analogous to (2.31) has positive eigenvalues, implying instability

arXiv:2210.05105 [pdf, other]

doi 10.1145/3626183.3659980

Distributed-Memory Randomized Algorithms for Sparse Tensor CP Decomposition

Authors: Vivek Bharadwaj, Osman Asif Malik, Riley Murray, Aydin Buluç, James Demmel

Abstract: Candecomp / PARAFAC (CP) decomposition, a generalization of the matrix singular value decomposition to higher-dimensional tensors, is a popular tool for analyzing multidimensional sparse data. On tensors with billions of nonzero entries, computing a CP decomposition is a computationally intensive task. We propose the first distributed-memory implementations of two randomized CP decomposition algorithms, CP-ARLS-LEV and STS-CP, that offer nearly an order-of-magnitude speedup at high decomposition ranks over well-tuned non-randomized decomposition packages. Both algorithms rely on leverage score sampling and enjoy strong theoretical guarantees, each with varying time and accuracy tradeoffs. We tailor the communication schedule for our random sampling algorithms, eliminating expensive reduction collectives and forcing communication costs to scale with the random sample count. Finally, we optimize the local storage format for our methods, switching between analogues of compressed sparse column and compressed sparse row formats. Experiments show that our methods are fast and scalable, producing 11x speedup over SPLATT by decomposing the billion-scale Reddit tensor on 512 CPU cores in under two minutes.

Submitted 27 April, 2024; v1 submitted 10 October, 2022; originally announced October 2022.

Comments: To appear in the Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'24). 14 pages, 13 figures, 5 tables

arXiv:2210.03828 [pdf, other]

Sampling-Based Decomposition Algorithms for Arbitrary Tensor Networks

Authors: Osman Asif Malik, Vivek Bharadwaj, Riley Murray

Abstract: We show how to develop sampling-based alternating least squares (ALS) algorithms for decomposition of tensors into any tensor network (TN) format. Provided the TN format satisfies certain mild assumptions, resulting algorithms will have input sublinear per-iteration cost. Unlike most previous works on sampling-based ALS methods for tensor decomposition, the sampling in our framework is done according to the exact leverage score distribution of the design matrices in the ALS subproblems. We implement and test two tensor decomposition algorithms that use our sampling framework in a feature extraction experiment where we compare them against a number of other decomposition algorithms.

Submitted 7 October, 2022; originally announced October 2022.

Comments: 20 pages, 8 figures

arXiv:2209.02305 [pdf, ps, other]

Rates of Convergence for Regression with the Graph Poly-Laplacian

Authors: Nicolás García Trillos, Ryan Murray, Matthew Thorpe

Abstract: In the (special) smoothing spline problem one considers a variational problem with a quadratic data fidelity penalty and Laplacian regularisation. Higher order regularity can be obtained via replacing the Laplacian regulariser with a poly-Laplacian regulariser. The methodology is readily adapted to graphs and here we consider graph poly-Laplacian regularisation in a fully supervised, non-parametric, noise corrupted, regression problem. In particular, given a dataset $\{x_i\}_{i=1}^n$ and a set of noisy labels $\{y_i\}_{i=1}^n\subset\mathbb{R}$ we let $u_n:\{x_i\}_{i=1}^n\to\mathbb{R}$ be the minimiser of an energy which consists of a data fidelity term and an appropriately scaled graph poly-Laplacian term. When $y_i = g(x_i)+\xi_i$, for iid noise $\xi_i$, and using the geometric random graph, we identify (with high probability) the rate of convergence of $u_n$ to $g$ in the large data limit $n\to\infty$. Furthermore, our rate, up to logarithms, coincides with the known rate of convergence in the usual smoothing spline model.

Submitted 6 September, 2022; originally announced September 2022.

arXiv:2201.05274 [pdf, other]

Eikonal depth: an optimal control approach to statistical depths

Authors: Martin Molina-Fructuoso, Ryan Murray

Abstract: Statistical depths provide a fundamental generalization of quantiles and medians to data in higher dimensions. This paper proposes a new type of globally defined statistical depth, based upon control theory and eikonal equations, which measures the smallest amount of probability density that has to be passed through in a path to points outside the support of the distribution: for example spatial infinity. This depth is easy to interpret and compute, expressively captures multi-modal behavior, and extends naturally to data that is non-Euclidean. We prove various properties of this depth, and provide discussion of computational considerations. In particular, we demonstrate that this notion of depth is robust under an approximate isometrically constrained adversarial model, a property which is not enjoyed by the Tukey depth. Finally we give some illustrative examples in the context of two-dimensional mixture models and MNIST.

Submitted 13 January, 2022; originally announced January 2022.

arXiv:2111.13613 [pdf, other]

doi 10.1093/imaiai/iaac029

The Geometry of Adversarial Training in Binary Classification

Authors: Leon Bungert, Nicolás García Trillos, Ryan Murray

Abstract: We establish an equivalence between a family of adversarial training problems for non-parametric binary classification and a family of regularized risk minimization problems where the regularizer is a nonlocal perimeter functional. The resulting regularized risk minimization problems admit exact convex relaxations of the type $L^1+$ (nonlocal) $\operatorname{TV}$, a form frequently studied in image analysis and graph-based learning. A rich geometric structure is revealed by this reformulation which in turn allows us to establish a series of properties of optimal solutions of the original problem, including the existence of minimal and maximal solutions (interpreted in a suitable sense), and the existence of regular solutions (also interpreted in a suitable sense). In addition, we highlight how the connection between adversarial training and perimeter minimization problems provides a novel, directly interpretable, statistical motivation for a family of regularized risk minimization problems involving perimeter/total variation. The majority of our theoretical results are independent of the distance used to define adversarial attacks.

Submitted 1 August, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

MSC Class: 62G35; 49Q20; 68Q32; 65J20

Journal ref: Information and Inference: A Journal of the IMA, 2023

arXiv:2109.04082 [pdf, other]

Risk-Averse Decision Making Under Uncertainty

Authors: Mohamadreza Ahmadi, Ugo Rosolia, Michel D. Ingham, Richard M. Murray, Aaron D. Ames

Abstract: A large class of decision making under uncertainty problems can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs), with application to artificial intelligence and operations research, among others. Traditionally, policy synthesis techniques are proposed such that a total expected cost or reward is minimized or maximized. However, optimality in the total expected cost sense is only reasonable if system behavior in the large number of runs is of interest, which has limited the use of such policies in practical mission-critical scenarios, wherein large deviations from the expected behavior may lead to mission failure. In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures, which we refer to as the constrained risk-averse problem. For MDPs, we reformulate the problem into an infsup problem via the Lagrangian framework and propose an optimization-based method to synthesize Markovian policies. For MDPs, we demonstrate that the formulated optimization problems are in the form of difference convex programs (DCPs) and can be solved by the disciplined convex-concave programming (DCCP) framework. We show that these results generalize linear programs for constrained MDPs with total discounted expected costs and constraints. For POMDPs, we show that, if the coherent risk measures can be defined as a Markov risk transition mapping, an infinite-dimensional optimization can be used to design Markovian belief-based policies. For stochastic finite-state controllers (FSCs), we show that the latter optimization simplifies to a (finite-dimensional) DCP and can be solved by the DCCP framework. We incorporate these DCPs in a policy iteration algorithm to design risk-averse FSCs for POMDPs.

Submitted 9 September, 2021; originally announced September 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2012.02423

arXiv:2107.00345 [pdf, other]

Algebraic perspectives on signomial optimization

Authors: Mareike Dressler, Riley Murray

Abstract: Signomials are obtained by generalizing polynomials to allow for arbitrary real exponents. This generalization offers great expressive power, but has historically sacrificed the organizing principle of "degree" that is central to polynomial optimization theory. We reclaim that principle here through the concept of signomial rings, which we use to derive complete convex relaxation hierarchies of upper and lower bounds for signomial optimization via sums of arithmetic-geometric exponentials (SAGE) nonnegativity certificates. The Positivstellensatz underlying the lower bounds relies on the concept of conditional SAGE and removes regularity conditions required by earlier works, such as convexity and Archimedeanity of the feasible set. Through worked examples we illustrate the practicality of this hierarchy in areas such as chemical reaction network theory and chemical engineering. These examples include comparisons to direct global solvers (e.g., BARON and ANTIGONE) and the Lasserre hierarchy (where appropriate). The completeness of our hierarchy of upper bounds follows from a generic construction whereby a Positivstellensatz for signomial nonnegativity over a compact set provides for arbitrarily strong outer approximations of the corresponding cone of nonnegative signomials. While working toward that result, we prove basic facts on the existence and uniqueness of solutions to signomial moment problems.

Submitted 1 July, 2021; originally announced July 2021.

Comments: We welcome all comments and criticism from the community. 30 pages. Authors listed alphabetically

MSC Class: Primary: 14P99; 90C25. Secondary: 90C23; 52A20; 28B15

arXiv:2104.01648 [pdf, other]

Tukey Depths and Hamilton-Jacobi Differential Equations

Authors: Martin Molina-Fructuoso, Ryan Murray

Abstract: The widespread application of modern machine learning has increased the need for robust statistical algorithms. This work studies one such fundamental statistical measure known as the Tukey depth. We study the problem in the continuum (population) limit. In particular, we derive the associated necessary conditions, which take the form of a first-order partial differential equation. We discuss the classical interpretation of this necessary condition as the viscosity solution of a Hamilton-Jacobi equation, but with a non-classical Hamiltonian with discontinuous dependence on the gradient at zero. We prove that this equation possesses a unique viscosity solution and that this solution always bounds the Tukey depth from below. In certain cases, we prove that the Tukey depth is equal to the viscosity solution, and we give some illustrations of standard numerical methods from the optimal control community which deal directly with the partial differential equation. We conclude by outlining several promising research directions both in terms of new numerical algorithms and theoretical challenges.

Submitted 4 April, 2021; originally announced April 2021.
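
For readers unfamiliar with the quantity studied in the entry above, the following is a minimal sketch of the empirical Tukey (halfspace) depth in two dimensions: the smallest fraction of sample points contained in any closed halfspace whose boundary passes through the query point. It is not the PDE-based approach of the paper. The function name `tukey_depth`, the parameter `n_directions`, and the approximation by a finite grid of directions are all assumptions made for illustration.

```python
import math

def tukey_depth(query, points, n_directions=360):
    """Approximate the empirical Tukey depth of `query` in 2D.

    Hypothetical helper: scans a finite grid of unit directions, so the
    result is an upper bound on the exact depth that tightens as
    n_directions grows.
    """
    qx, qy = query
    best = len(points)
    for k in range(n_directions):
        theta = 2.0 * math.pi * k / n_directions
        ux, uy = math.cos(theta), math.sin(theta)
        # Count sample points in the closed halfspace {p : <u, p - query> >= 0}.
        count = sum(
            1 for (px, py) in points
            if ux * (px - qx) + uy * (py - qy) >= 0.0
        )
        best = min(best, count)
    return best / len(points)

square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
center_depth = tukey_depth((0.0, 0.0), square)
outside_depth = tukey_depth((10.0, 10.0), square)
print(center_depth, outside_depth)
```

The center of the square is the deepest point of this toy sample, while a point far outside the sample has depth zero because some halfspace through it contains no data.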

arXiv:2012.02423 [pdf, other]

Constrained Risk-Averse Markov Decision Processes

Authors: Mohamadreza Ahmadi, Ugo Rosolia, Michel D. Ingham, Richard M. Murray, Aaron D. Ames

Abstract: We consider the problem of designing policies for Markov decision processes (MDPs) with dynamic coherent risk objectives and constraints. We begin by formulating the problem in a Lagrangian framework. Under the assumption that the risk objectives and constraints can be represented by a Markov risk transition mapping, we propose an optimization-based method to synthesize Markovian policies that lower-bound the constrained risk-averse problem. We demonstrate that the formulated optimization problems are in the form of difference convex programs (DCPs) and can be solved by the disciplined convex-concave programming (DCCP) framework. We show that these results generalize linear programs for constrained MDPs with total discounted expected costs and constraints. Finally, we illustrate the effectiveness of the proposed method with numerical experiments on a rover navigation problem involving conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) coherent risk measures.

Submitted 28 March, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

Comments: Draft Accepted for Presentation at The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), Feb. 2-9, 2021

arXiv:2011.10797 [pdf, other]

Adversarial Classification: Necessary conditions and geometric flows

Authors: Nicolas Garcia Trillos, Ryan Murray

Abstract: We study a version of adversarial classification where an adversary is empowered to corrupt data inputs up to some distance $\varepsilon$, using tools from variational analysis. In particular, we describe necessary conditions associated with the optimal classifier subject to such an adversary. Using the necessary conditions, we derive a geometric evolution equation which can be used to track the change in classification boundaries as $\varepsilon$ varies. This evolution equation may be described as an uncoupled system of differential equations in one dimension, or as a mean curvature type equation in higher dimension. In one dimension, and under mild assumptions on the data distribution, we rigorously prove that one can use the initial value problem starting from $\varepsilon=0$, which is simply the Bayes classifier, in order to solve for the global minimizer of the adversarial problem for small values of $\varepsilon$. In higher dimensions we provide a similar result, albeit conditional to the existence of regular solutions of the initial value problem. In the process of proving our main results we obtain a result of independent interest connecting the original adversarial problem with an optimal transport problem under no assumptions on whether classes are balanced or not. Numerical examples illustrating these ideas are also presented.

Submitted 11 March, 2022; v1 submitted 21 November, 2020; originally announced November 2020.

arXiv:2008.05387 [pdf, other]

Distributed Gradient Flow: Nonsmoothness, Nonconvexity, and Saddle Point Evasion

Authors: Brian Swenson, Ryan Murray, H. Vincent Poor, Soummya Kar

Abstract: The paper considers distributed gradient flow (DGF) for multi-agent nonconvex optimization. DGF is a continuous-time approximation of distributed gradient descent that is often easier to study than its discrete-time counterpart. The paper has two main contributions. First, the paper considers optimization of nonsmooth, nonconvex objective functions. It is shown that DGF converges to critical points in this setting. The paper then considers the problem of avoiding saddle points. It is shown that if agents' objective functions are assumed to be smooth and nonconvex, then DGF can only converge to a saddle point from a zero-measure set of initial conditions. To establish this result, the paper proves a stable manifold theorem for DGF, which is a fundamental contribution of independent interest. In a companion paper, analogous results are derived for discrete-time algorithms.

Submitted 12 August, 2020; originally announced August 2020.

arXiv:2006.06811 [pdf, ps, other]

Sublinear Circuits and the Constrained Signomial Nonnegativity Problem

Authors: Riley Murray, Helen Naumann, Thorsten Theobald

Abstract: Conditional Sums-of-AM/GM-Exponentials (conditional SAGE) is a decomposition method to prove nonnegativity of a signomial or polynomial over some subset $X$ of real space. In this article, we undertake the first structural analysis of conditional SAGE signomials for convex sets $X$. We introduce the $X$-circuits of a finite subset $\mathcal{A} \subset \mathbb{R}^n$, which generalize the simplicial circuits of the affine-linear matroid induced by $\mathcal{A}$ to a constrained setting. The $X$-circuits serve as the main tool in our analysis and exhibit particularly rich combinatorial properties for polyhedral $X$, in which case the set of $X$-circuits is comprised of one-dimensional cones of suitable polyhedral fans. The framework of $X$-circuits transparently reveals when an $X$-nonnegative conditional AM/GM-exponential can in fact be further decomposed as a sum of simpler $X$-nonnegative signomials. We develop a duality theory for $X$-circuits with connections to geometry of sets that are convex according to the geometric mean. This theory provides an optimal power cone reconstruction of conditional SAGE signomials when $X$ is polyhedral. In conjunction with a notion of reduced $X$-circuits, the duality theory facilitates a characterization of the extreme rays of conditional SAGE cones. Since signomials under logarithmic variable substitutions give polynomials, our results also have implications for nonnegative polynomials and polynomial optimization.

Submitted 20 January, 2022; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: Revised version, 31 pages

MSC Class: Primary: 14P05; 90C23; 90C30. Secondary: 05B35; 52A20

arXiv:2004.09304 [pdf, other]

From graph cuts to isoperimetric inequalities: Convergence rates of Cheeger cuts on data clouds

Authors: Nicolas Garcia Trillos, Ryan Murray, Matthew Thorpe

Abstract: In this work we study statistical properties of graph-based clustering algorithms that rely on the optimization of balanced graph cuts, the main example being the optimization of Cheeger cuts. We consider proximity graphs built from data sampled from an underlying distribution supported on a generic smooth compact manifold $M$. In this setting, we obtain high probability convergence rates for both the Cheeger constant and the associated Cheeger cuts towards their continuum counterparts. The key technical tools are careful estimates of interpolation operators which lift empirical Cheeger cuts to the continuum, as well as continuum stability estimates for isoperimetric problems. To our knowledge the quantitative estimates obtained here are the first of their kind.

Submitted 11 March, 2022; v1 submitted 20 April, 2020; originally announced April 2020.

MSC Class: 49Q20

arXiv:2004.04227 [pdf, other]

Formal Test Synthesis for Safety-Critical Autonomous Systems based on Control Barrier Functions

Authors: Prithvi Akella, Mohamadreza Ahmadi, Richard M. Murray, Aaron D. Ames

Abstract: The prolific rise in autonomous systems has led to questions regarding their safe instantiation in real-world scenarios. Failures in safety-critical contexts such as human-robot interactions or even autonomous driving can ultimately lead to loss of life. In this context, this paper aims to provide a method by which one can algorithmically test and evaluate an autonomous system. Given a black-box autonomous system with some operational specifications, we construct a minimax problem based on control barrier functions to generate a family of test parameters designed to optimally evaluate whether the system can satisfy the specifications. To illustrate our results, we utilize the Robotarium as a case study for an autonomous system that claims to satisfy waypoint navigation and obstacle avoidance simultaneously. We demonstrate that the proposed test synthesis framework systematically finds those sequences of events (tests) that identify points of system failure.

Submitted 8 April, 2020; originally announced April 2020.

arXiv:2003.02818 [pdf, other]

Distributed Stochastic Gradient Descent: Nonconvexity, Nonsmoothness, and Convergence to Local Minima

Authors: Brian Swenson, Ryan Murray, H. Vincent Poor, Soummya Kar

Abstract: In centralized settings, it is well known that stochastic gradient descent (SGD) avoids saddle points and converges to local minima in nonconvex problems. However, similar guarantees are lacking for distributed first-order algorithms. The paper studies distributed stochastic gradient descent (D-SGD)--a simple network-based implementation of SGD. Conditions under which D-SGD avoids saddle points and converges to local minima are studied. First, we consider the problem of computing critical points. Assuming loss functions are nonconvex and possibly nonsmooth, it is shown that, for each fixed initialization, D-SGD converges to critical points of the loss with probability one. Next, we consider the problem of avoiding saddle points. In this case, we again assume that loss functions may be nonconvex and nonsmooth, but are smooth in a neighborhood of a saddle point. It is shown that, for any fixed initialization, D-SGD avoids such saddle points with probability one. Results are proved by studying the underlying (distributed) gradient flow, using the ordinary differential equation (ODE) method of stochastic approximation, and extending classical techniques from dynamical systems theory such as stable manifolds. Results are proved in the general context of subspace-constrained optimization, of which D-SGD is a special case.

Submitted 4 March, 2022; v1 submitted 5 March, 2020; originally announced March 2020.

arXiv:1912.04867 [pdf, other]

Robust Market Equilibria with Uncertain Preferences

Authors: Riley Murray, Christian Kroer, Alex Peysakhovich, Parikshit Shah

Abstract: The problem of allocating scarce items to individuals is an important practical question in market design. An increasingly popular set of mechanisms for this task uses the concept of market equilibrium: individuals report their preferences, have a budget of real or fake currency, and a set of prices for items and allocations is computed that sets demand equal to supply. An important real world issue with such mechanisms is that individual valuations are often only imperfectly known. In this paper, we show how concepts from classical market equilibrium can be extended to reflect such uncertainty. We show that in linear, divisible Fisher markets a robust market equilibrium (RME) always exists; this also holds in settings where buyers may retain unspent money. We provide theoretical analysis of the allocative properties of RME in terms of envy and regret. Though RME are hard to compute for general uncertainty sets, we consider some natural and tractable uncertainty sets which lead to well behaved formulations of the problem that can be solved via modern convex programming methods. Finally, we show that very mild uncertainty about valuations can cause RME allocations to outperform those which take estimates as having no underlying uncertainty.

Submitted 10 December, 2019; originally announced December 2019.

Comments: Extended preprint of an article accepted to AAAI-20. Contains supplementary material as appendices. Due to figures, this manuscript is best printed in color

MSC Class: Primary 91B52. Secondary 91B25; 68T37; 90C47; 90C25

arXiv:1911.08626 [pdf, other]

Intermittent Connectivity for Exploration in Communication-Constrained Multi-Agent Systems

Authors: Filip Klaesson, Petter Nilsson, Aaron D. Ames, Richard M. Murray

Abstract: Motivated by exploration of communication-constrained underground environments using robot teams, we study the problem of planning for intermittent connectivity in multi-agent systems. We propose a novel concept of information-consistency to handle situations where the plan is not initially known by all agents, and suggest an integer linear program for synthesizing information-consistent plans that also achieve auxiliary goals. Furthermore, inspired by network flow problems we propose a novel way to pose connectivity constraints that scales much better than previous methods. In the second part of the paper we apply these results in an exploration setting, and propose a clustering method that separates a large exploration problem into smaller problems that can be solved independently. We demonstrate how the resulting exploration algorithm is able to coordinate a team of ten agents to explore a large environment.

Submitted 19 November, 2019; originally announced November 2019.

arXiv:1909.12499 [pdf, other]

Risk-Averse Planning Under Uncertainty

Authors: Mohamadreza Ahmadi, Masahiro Ono, Michel D. Ingham, Richard M. Murray, Aaron D. Ames

Abstract: We consider the problem of designing policies for partially observable Markov decision processes (POMDPs) with dynamic coherent risk objectives. Synthesizing risk-averse optimal policies for POMDPs requires infinite memory and is thus undecidable. To overcome this difficulty, we propose a method based on bounded policy iteration for designing stochastic but finite state (memory) controllers, which takes advantage of standard convex optimization methods. Given a memory budget and optimality criterion, the proposed method modifies the stochastic finite state controller leading to sub-optimal solutions with lower coherent risk.

Submitted 27 September, 2019; originally announced September 2019.

arXiv:1908.02747 [pdf, ps, other]

Distributed Gradient Descent: Nonconvergence to Saddle Points and the Stable-Manifold Theorem

Authors: Brian Swenson, Ryan Murray, H. Vincent Poor, Soummya Kar

Abstract: The paper studies a distributed gradient descent (DGD) process and considers the problem of showing that in nonconvex optimization problems, DGD typically converges to local minima rather than saddle points. The paper considers unconstrained minimization of a smooth objective function. In centralized settings, the problem of demonstrating nonconvergence to saddle points of gradient descent (and variants) is typically handled by way of the stable-manifold theorem from classical dynamical systems theory. However, the classical stable-manifold theorem is not applicable in distributed settings. The paper develops an appropriate stable-manifold theorem for DGD showing that convergence to saddle points may only occur from a low-dimensional stable manifold. Under appropriate assumptions (e.g., coercivity), this result implies that DGD typically converges to local minima and not to saddle points.

Submitted 23 October, 2019; v1 submitted 7 August, 2019; originally announced August 2019.

arXiv:1907.00814 [pdf, other]

doi 10.1007/s12532-020-00193-4

Signomial and Polynomial Optimization via Relative Entropy and Partial Dualization

Authors: Riley Murray, Venkat Chandrasekaran, Adam Wierman

Abstract: We describe a generalization of the Sums-of-AM/GM Exponential (SAGE) relaxation methodology for obtaining bounds on constrained signomial and polynomial optimization problems. Our approach leverages the fact that relative entropy based SAGE certificates conveniently and transparently blend with convex duality, in a manner that Sums-of-Squares certificates do not. This more general approach not only retains key properties of ordinary SAGE relaxations (e.g. sparsity preservation), but also inspires a novel perspective-based method of solution recovery. We illustrate the utility of our methodology with a range of examples from the global optimization literature, along with a publicly available software package.

Submitted 21 July, 2019; v1 submitted 1 July, 2019; originally announced July 2019.

Comments: Software at https://rileyjmurray.github.io/sageopt/. Nine tables, one figure. Forty pages (with large margins). Ten pages of computational experiments; print pages 1-25 and 36-40 to skip the computational experiments. Version 2: minor simplification to section 4.2.1

MSC Class: 90C26 ACM Class: G.1.7; G.4

arXiv:1901.10089 [pdf, ps, other]

A maximum principle argument for the uniform convergence of graph Laplacian regressors

Authors: Nicolas Garcia Trillos, Ryan Murray

Abstract: This paper investigates the use of methods from partial differential equations and the Calculus of variations to study learning problems that are regularized using graph Laplacians. Graph Laplacians are a powerful, flexible method for capturing local and global geometry in many classes of learning problems, and the techniques developed in this paper help to broaden the methodology of studying such problems. In particular, we develop the use of maximum principle arguments to establish asymptotic consistency guarantees within the context of noise corrupted, non-parametric regression with samples living on an unknown manifold embedded in $\mathbb{R}^d$. The maximum principle arguments provide a new technical tool which informs parameter selection by giving concrete error estimates in terms of various regularization parameters. A review of learning algorithms which utilize graph Laplacians, as well as previous developments in the use of differential equation and variational techniques to study those algorithms, is given. In addition, new connections are drawn between Laplacian methods and other machine learning techniques, such as kernel regression and k-nearest neighbor methods.

Submitted 27 June, 2020; v1 submitted 28 January, 2019; originally announced January 2019.

arXiv:1810.01614 [pdf, other]

doi 10.1007/s10208-021-09497-w

Newton Polytopes and Relative Entropy Optimization

Authors: Riley Murray, Venkat Chandrasekaran, Adam Wierman

Abstract: Certifying function nonnegativity is a ubiquitous problem in computational mathematics, with especially notable applications in optimization. We study the question of certifying nonnegativity of signomials based on the recently proposed approach of Sums-of-AM/GM-Exponentials (SAGE) decomposition due to the second author and Shah. The existence of a SAGE decomposition is a sufficient condition for nonnegativity of a signomial, and it can be verified by solving a tractable convex relative entropy program. We present new structural properties of SAGE certificates such as a characterization of the extreme rays of the cones associated to these decompositions as well as an appealing form of sparsity preservation. These lead to a number of important consequences such as conditions under which signomial nonnegativity is equivalent to the existence of a SAGE decomposition; our results represent the broadest-known class of nonconvex signomial optimization problems that can be solved efficiently via convex relaxation. The analysis in this paper proceeds by leveraging the interaction between the convex duality underlying SAGE certificates and the face structure of Newton polytopes. While our primary focus is on signomials, we also discuss how our results provide efficient methods for certifying polynomial nonnegativity, with complexity independent of the degree of a polynomial.

Submitted 14 May, 2020; v1 submitted 3 October, 2018; originally announced October 2018.

Comments: Body shortened from 29 to 24 pages. Additional consideration to related work. Some claims made in Section 5 have been formalized. Revised within 2 months of first-round reviews

MSC Class: 52A40; 90C30; 14P15

arXiv:1808.03218 [pdf, other]

Spatial extreme values: variational techniques and stochastic integrals

Authors: Nicolas Garcia Trillos, Ryan Murray, Daniel Sanz-Alonso

Abstract: This work employs variational techniques to revisit and expand the construction and analysis of extreme value processes. These techniques permit a novel study of spatial statistics of the location of minimizing events. We develop integral formulas for computing statistics of spatially-biased extremal events, and show that they are analogous to stochastic integrals in the setting of standard stochastic processes. We also establish an asymptotic result in the spirit of the Fisher-Tippett-Gnedenko theory for a broader class of extremal events and discuss some applications of our results.

Submitted 9 August, 2018; originally announced August 2018.

arXiv:1802.07668 [pdf, ps, other]

A model for system uncertainty in reinforcement learning

Authors: Ryan Murray, Michele Palladino

Abstract: This work provides a rigorous framework for studying continuous time control problems in uncertain environments. The framework considered models uncertainty in state dynamics as a measure on the space of functions. This measure is considered to change over time as agents learn their environment. This model can be seen as a variant of either Bayesian reinforcement learning or adaptive control. We study necessary conditions for locally optimal trajectories within this model, in particular deriving an appropriate dynamic programming principle and Hamilton-Jacobi equations. This model provides one possible framework for studying the tradeoff between exploration and exploitation in reinforcement learning.

Submitted 21 February, 2018; originally announced February 2018.

arXiv:1711.05224 [pdf, other]

Revisiting Normalized Gradient Descent: Fast Evasion of Saddle Points

Authors: Ryan Murray, Brian Swenson, Soummya Kar

Abstract: The note considers normalized gradient descent (NGD), a natural modification of classical gradient descent (GD) in optimization problems. A serious shortcoming of GD in non-convex problems is that GD may take arbitrarily long to escape from the neighborhood of a saddle point. This issue can make the convergence of GD arbitrarily slow, particularly in high-dimensional non-convex problems where the relative number of saddle points is often large. The paper focuses on continuous-time descent. It is shown that, contrary to standard GD, NGD escapes saddle points `quickly.' In particular, it is shown that (i) NGD `almost never' converges to saddle points and (ii) the time required for NGD to escape from a ball of radius $r$ about a saddle point $x^*$ is at most $5\sqrt{\kappa}r$, where $\kappa$ is the condition number of the Hessian of $f$ at $x^*$. As an application of this result, a global convergence-time bound is established for NGD under mild assumptions.

Submitted 24 July, 2018; v1 submitted 14 November, 2017; originally announced November 2017.
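
The contrast between GD and NGD described in the entry above can be illustrated with a small discrete-time sketch. The paper itself treats continuous-time descent, so the fixed-step discretization here is an assumption, as are the toy objective $f(x, y) = x^2 - y^2$, the starting point, the step size, and the helper name `steps_to_escape`. The sketch counts the iterations each method needs to leave a ball of radius 1 around the saddle at the origin when started very close to the stable manifold.

```python
import math

def grad(x, y):
    # Gradient of the toy saddle f(x, y) = x**2 - y**2 (saddle at the origin).
    return 2.0 * x, -2.0 * y

def steps_to_escape(normalized, start=(0.5, 1e-8), step=0.1,
                    radius=1.0, max_steps=10_000):
    """Count iterations until the iterate leaves the ball of given radius.

    Hypothetical helper: normalized=False runs plain GD, normalized=True
    rescales the gradient to unit length before each step (NGD).
    """
    x, y = start
    for k in range(max_steps):
        if math.hypot(x, y) > radius:
            return k
        gx, gy = grad(x, y)
        if normalized:
            norm = math.hypot(gx, gy)
            if norm == 0.0:
                # Exactly at the saddle: the normalized direction is undefined.
                return max_steps
            gx, gy = gx / norm, gy / norm
        x, y = x - step * gx, y - step * gy
    return max_steps

gd_steps = steps_to_escape(normalized=False)
ngd_steps = steps_to_escape(normalized=True)
print(gd_steps, ngd_steps)
```

Plain GD only grows the unstable coordinate geometrically from its tiny initial value, whereas the normalized iteration moves a fixed distance per step, so in this toy setting it leaves the ball in far fewer iterations.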

arXiv:1707.06466 [pdf, other]

Regular Potential Games

Authors: Brian Swenson, Ryan Murray, Soummya Kar

Abstract: A fundamental problem with the Nash equilibrium concept is the existence of certain "structurally deficient" equilibria that (i) lack fundamental robustness properties, and (ii) are difficult to analyze. The notion of a "regular" Nash equilibrium was introduced by Harsanyi. Such equilibria are isolated, highly robust, and relatively simple to analyze. A game is said to be regular if all equilibria in the game are regular. In this paper it is shown that almost all potential games are regular. That is, except for a closed subset with Lebesgue measure zero, all potential games are regular. As an immediate consequence of this, the paper also proves an oddness result for potential games: in almost all potential games, the number of Nash equilibrium strategies is finite and odd. Specialized results are given for weighted potential games, exact potential games, and games with identical payoffs. Applications of the results to game-theoretic learning are discussed.

Submitted 30 July, 2019; v1 submitted 18 July, 2017; originally announced July 2017.

MSC Class: 91A06; 91B52; 91A26

arXiv:1707.06465 [pdf, other]

On Best-Response Dynamics in Potential Games

Authors: Brian Swenson, Ryan Murray, Soummya Kar

Abstract: The paper studies the convergence properties of (continuous) best-response dynamics from game theory. Despite their fundamental role in game theory, best-response dynamics are poorly understood in many games of interest due to the discontinuous, set-valued nature of the best-response map. The paper focuses on elucidating several important properties of best-response dynamics in the class of multi-agent games known as potential games---a class of games with fundamental importance in multi-agent systems and distributed control. It is shown that in almost every potential game and for almost every initial condition, the best-response dynamics (i) have a unique solution, (ii) converge to pure-strategy Nash equilibria, and (iii) converge at an exponential rate.

Submitted 6 February, 2018; v1 submitted 18 July, 2017; originally announced July 2017.

MSC Class: 93A14; 93A15; 91A06; 91A26; 37B25

arXiv:1705.00606 [pdf, ps, other]

A Note Regarding Second-Order $Γ$-limits for the Cahn--Hilliard Functional

Authors: Giovanni Leoni, Ryan Murray

Abstract: This note completely resolves the asymptotic development of order $2$ by $Γ$-convergence of the mass-constrained Cahn--Hilliard functional, by showing that one of the critical assumptions of the authors' previous work (Leoni, Murray, Second-order $Γ$-limit for the Cahn--Hilliard functional, Arch. Ration. Mech. Anal. 219, 3, 2016) is unnecessary.

Submitted 1 May, 2017; originally announced May 2017.

arXiv:1609.07965 [pdf, ps, other]

Cutoff estimates for the Becker-Döring equations

Authors: Ryan Murray, Robert Pego

Abstract: This paper continues the authors' previous study (SIAM J. Math. Anal., 2016) of the trend toward equilibrium of the Becker-Döring equations with subcritical mass, by characterizing certain fine properties of solutions to the linearized equation. In particular, we partially characterize the spectrum of the linearized operator, showing that it contains the entire imaginary axis in polynomially weighted spaces. Moreover, we prove detailed cutoff estimates that establish upper and lower bounds on the lifetime of a class of perturbations to equilibrium.

Submitted 26 September, 2016; originally announced September 2016.

arXiv:1607.00274 [pdf, other]

A new analytical approach to consistency and overfitting in regularized empirical risk minimization

Authors: Nicolas Garcia Trillos, Ryan Murray

Abstract: This work considers the problem of binary classification: given training data $x_1, \dots, x_n$ from a certain population, together with associated labels $y_1,\dots, y_n \in \left\{0,1 \right\}$, determine the best label for an element $x$ not among the training data. More specifically, this work considers a variant of the regularized empirical risk functional which is defined intrinsically to the observed data and does not depend on the underlying population. Tools from modern analysis are used to obtain a concise proof of asymptotic consistency as regularization parameters are taken to zero at rates related to the size of the sample. These analytical tools give a new framework for understanding overfitting and underfitting, and rigorously connect the notion of overfitting with a loss of compactness.

Submitted 1 July, 2016; originally announced July 2016.

MSC Class: 49J55; 49J45; 60D05; 68R10; 62G20

arXiv:1512.01706 [pdf, other]

Slow motion for the nonlocal Allen-Cahn equation in n-dimensions

Authors: Ryan Murray, Matteo Rinaldi

Abstract: The goal of this paper is to study the slow motion of solutions of the nonlocal Allen-Cahn equation in a bounded domain $Ω\subset \mathbb{R}^n$, for $n > 1$. The initial data is assumed to be close to a configuration whose interface separating the states minimizes the surface area (or perimeter); both local and global perimeter minimizers are taken into account. The evolution of interfaces on a time scale $\varepsilon^{-1}$ is deduced, where $\varepsilon$ is the interaction length parameter. The key tool is a second-order $Γ$-convergence analysis of the energy functional, which provides sharp energy estimates. New regularity results are derived for the isoperimetric function of a domain. Slow motion of solutions for the Cahn-Hilliard equation starting close to global perimeter minimizers is proved as well.

Submitted 5 December, 2015; originally announced December 2015.

arXiv:1509.01762 [pdf, ps, other]

Polynomial decay to equilibrium for the Becker-Döring equations

Authors: Ryan W. Murray, Robert L. Pego

Abstract: This paper studies rates of decay to equilibrium for the Becker-Döring equations with subcritical initial data. In particular, polynomial rates of decay are established when initial perturbations of equilibrium have polynomial moments. This is proved by using new dissipation estimates in polynomially weighted $\ell^1$ spaces, operator decomposition techniques from kinetic theory, and interpolation estimates from the study of travelling waves.

Submitted 5 September, 2015; originally announced September 2015.

arXiv:1503.07272 [pdf, other]

doi 10.1007/s00205-015-0924-4

Second-Order $Γ$-limit for the Cahn-Hilliard Functional

Authors: Giovanni Leoni, Ryan Murray

Abstract: The goal of this paper is to solve a long standing open problem, namely, the asymptotic development of order $2$ by $Γ$-convergence of the mass-constrained Cahn-Hilliard functional. This is achieved by introducing a novel rearrangement technique, which works without Dirichlet boundary conditions.

Submitted 1 August, 2015; v1 submitted 24 March, 2015; originally announced March 2015.

arXiv:1311.7130 [pdf, ps, other]

Convex Optimal Uncertainty Quantification

Authors: Shuo Han, Molei Tao, Ufuk Topcu, Houman Owhadi, Richard M. Murray

Abstract: Optimal uncertainty quantification (OUQ) is a framework for numerical extreme-case analysis of stochastic systems with imperfect knowledge of the underlying probability distribution. This paper presents sufficient conditions under which an OUQ problem can be reformulated as a finite-dimensional convex optimization problem, for which efficient numerical solutions can be obtained. The sufficient conditions include that the objective function is piecewise concave and the constraints are piecewise convex. In particular, we show that piecewise concave objective functions may appear in applications where the objective is defined by the optimal value of a parameterized linear program.

Submitted 27 April, 2015; v1 submitted 27 November, 2013; originally announced November 2013.

Comments: Accepted for publication in SIAM Journal on Optimization

arXiv:1307.7407 [pdf, other]

First hyperbolic times for intermittent maps with unbounded derivative

Authors: Chris Bose, Rua Murray

Abstract: We establish some statistical properties of the hyperbolic times for a class of nonuniformly expanding dynamical systems. The maps arise as factors of area preserving maps of the unit square via a geometric Baker's map type construction, exhibit intermittent dynamics, and have unbounded derivatives. The geometric approach captures various examples from the literature over the last thirty years. The statistics of these maps are controlled by the order of tangency that a certain "cut function" makes with the boundary of the square. Using a large deviations result of Melbourne and Nicol we obtain sharp estimates on the distribution of first hyperbolic times. As shown by Alves, Viana and others, knowledge of the tail of the distribution of first hyperbolic times leads to estimates on the rate of decay of correlations and derivation of a CLT. For our family of maps, we compare the estimates on correlation decay rate and CLT derived via hyperbolic times with those derived by a direct Young tower construction. The latter estimates are known to be sharp.

Submitted 28 July, 2013; originally announced July 2013.

arXiv:1211.0068 [pdf, other]

Numerical approximation of conditionally invariant measures via Maximum Entropy

Authors: Christopher Bose, Rua Murray

Abstract: It is well known that open dynamical systems can admit an uncountable number of (absolutely continuous) conditionally invariant measures (ACCIMs) for each prescribed escape rate. We propose and illustrate a convex optimisation based selection scheme (essentially maximum entropy) for gaining numerical access to some of these measures. The work is similar to the Maximum Entropy (MAXENT) approach for calculating absolutely continuous invariant measures of nonsingular dynamical systems, but contains some interesting new twists, including: (i) the natural escape rate is not known in advance, which can destroy convex structure in the problem; (ii) exploitation of convex duality to solve each approximation step induces important (but dynamically relevant and not at first apparent) localisation of support; (iii) significant potential for application to the approximation of other dynamically interesting objects (for example, invariant manifolds).

Submitted 21 February, 2013; v1 submitted 31 October, 2012; originally announced November 2012.

Comments: 8 Figures; Presented at Banff International Research Station workshop: "Open Dynamical Systems: Ergodic Theory, Probabilistic Methods and Applications (12w5050)"; introduction expanded and reorganised + 2 figures added thanks to referee suggestions

MSC Class: 37M25 65K10 49M29 94A17

arXiv:1206.2406 [pdf, ps, other]

doi 10.1142/S0218127413501307

Polynomial decay of correlations in the generalized baker's transformation

Authors: Christopher Bose, Rua Murray

Abstract: We introduce a family of area preserving generalized baker's transformations acting on the unit square and having sharp polynomial rates of mixing for Hölder data. The construction is geometric, relying on the graph of a single variable "cut function". Each baker's map B is non-uniformly hyperbolic and while the exact mixing rate depends on B, all polynomial rates can be attained. The analysis of mixing rates depends on building a suitable Young tower for an expanding factor. The mechanisms leading to a slow rate of correlation decay are especially transparent in our examples due to the simple geometry in the construction. For this reason we propose this class of maps as an excellent testing ground for new techniques for the analysis of decay of correlations in non-uniformly hyperbolic systems. Finally, some of our examples can be seen to be extensions of certain 1-D non-uniformly expanding maps that have appeared in the literature over the last twenty years thereby providing a unified treatment of these interesting and well-studied examples.

Submitted 11 June, 2012; originally announced June 2012.

Comments: 24 pages, 2 figures

Journal ref: Int. J. Bifurcation Chaos 23, 1350130 (2013)

arXiv:1204.2329 [pdf, other]

Ulam's method for Lasota-Yorke maps with holes

Authors: Christopher Bose, Gary Froyland, Cecilia González-Tokman, Rua Murray

Abstract: Ulam's method is a rigorous numerical scheme for approximating invariant densities of dynamical systems. The phase space is partitioned into connected sets and an inter-set transition matrix is computed from the dynamics; an approximate invariant density is read off as the leading left eigenvector of this matrix. When a hole in phase space is introduced, one instead searches for \emph{conditional} invariant densities and their associated escape rates. For Lasota-Yorke maps with holes we prove that a simple adaptation of the standard Ulam scheme provides convergent sequences of escape rates (from the leading eigenvalue), conditional invariant densities (from the corresponding left eigenvector), and quasi-conformal measures (from the corresponding right eigenvector). We also immediately obtain a convergent sequence for the invariant measure supported on the survivor set. Our approach allows us to consider relatively large holes. We illustrate the approach with several families of examples, including a class of Lorenz maps.

Submitted 3 February, 2013; v1 submitted 10 April, 2012; originally announced April 2012.

Comments: 20 pages, 6 figures, added section on Lorenz-like maps

MSC Class: 37A05; 37E05; 37M25

arXiv:1012.2149 [pdf, ps, other]

doi 10.1088/0951-7715/24/9/003

Spectral degeneracy and escape dynamics for intermittent maps with a hole

Authors: Gary Froyland, Rua Murray, Ognjen Stancevic

Abstract: We study intermittent maps from the point of view of metastability. Small neighbourhoods of an intermittent fixed point and their complements form pairs of almost-invariant sets. Treating the small neighbourhood as a hole, we first show that the absolutely continuous conditional invariant measures (ACCIMs) converge to the ACIM as the length of the small neighbourhood shrinks to zero. We then quantify how the escape dynamics from these almost-invariant sets are connected with the second eigenfunctions of Perron-Frobenius (transfer) operators when a small perturbation is applied near the intermittent fixed point. In particular, we describe precisely the scaling of the second eigenvalue with the perturbation size, provide upper and lower bounds, and demonstrate $L^1$ convergence of the positive part of the second eigenfunction to the ACIM as the perturbation goes to zero. This perturbation and associated eigenvalue scalings and convergence results are all compatible with Ulam's method and provide a formal explanation for the numerical behaviour of Ulam's method in this nonuniformly hyperbolic setting. The main results of the paper are illustrated with numerical computations.

Submitted 31 May, 2011; v1 submitted 9 December, 2010; originally announced December 2010.

Comments: 34 pages

MSC Class: 37E05; 37M25; 37D25

Showing 1–48 of 48 results for author: Murray, R