-
Parallel-in-time solution of hyperbolic PDE systems via characteristic-variable block preconditioning
Authors:
H. De Sterck,
R. D. Falgout,
O. A. Krzysik,
J. B. Schroder
Abstract:
We consider the parallel-in-time solution of hyperbolic partial differential equation (PDE) systems in one spatial dimension, both linear and nonlinear. In the nonlinear setting, the discretized equations are solved with a preconditioned residual iteration based on a global linearization. The linear(ized) equation systems are approximately solved parallel-in-time using a block preconditioner applied in the characteristic variables of the underlying linear(ized) hyperbolic PDE. This change of variables is motivated by the observation that inter-variable coupling for characteristic variables is weak relative to intra-variable coupling, at least locally where spatio-temporal variations in the eigenvectors of the associated flux Jacobian are sufficiently small. For an $\ell$-dimensional system of PDEs, applying the preconditioner consists of solving a sequence of $\ell$ scalar linear(ized)-advection-like problems, each being associated with a different characteristic wave-speed in the underlying linear(ized) PDE. We approximately solve these linear advection problems using multigrid reduction-in-time (MGRIT); however, any other suitable parallel-in-time method could be used. Numerical examples are shown for the (linear) acoustics equations in heterogeneous media, and for the (nonlinear) shallow water equations and Euler equations of gas dynamics with shocks and rarefactions.
Submitted 4 July, 2024;
originally announced July 2024.
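The change of variables described in the abstract above can be illustrated with a short worked equation. This is only a sketch under assumptions not stated in the listing: a linear system written in non-conservative form, with a diagonalizable flux Jacobian $A(x,t)$ whose eigenvector matrix is called $R$ and whose eigenvalues (wave-speeds) are called $\lambda_1, \ldots, \lambda_\ell$; these symbols are illustrative and may differ from the paper's notation.

```latex
% Assumed model problem (non-conservative linear form), q \in \mathbb{R}^\ell:
q_t + A\, q_x = 0, \qquad A = R \Lambda R^{-1}, \qquad
\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_\ell).

% Characteristic variables w = R^{-1} q, i.e. q = R w. Substituting and
% multiplying from the left by R^{-1} gives
w_t + \Lambda\, w_x = -R^{-1} \left( R_t + A R_x \right) w.
```

When $R$ is constant, the right-hand side vanishes and the system decouples into $\ell$ scalar advection equations, one per wave-speed $\lambda_i$. When $R$ varies slowly in space and time, the right-hand side is small, which is consistent with the abstract's remark that inter-variable coupling is weak where variations in the eigenvectors are small.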
-
Sum-of-norms regularized Nonnegative Matrix Factorization
Authors:
Andersen Ang,
Waqas Bin Hamed,
Hans De Sterck
Abstract:
When applying nonnegative matrix factorization (NMF), the rank parameter is generally unknown. This rank, called the nonnegative rank, is usually estimated heuristically since computing its exact value is NP-hard. In this work, we propose an approximation method that estimates the rank on the fly while solving NMF. We use sum-of-norms (SON), a group-lasso structure that encourages pairwise similarity, to reduce the rank of a factor matrix whose rank is overestimated at the beginning. On various datasets, SON-NMF is able to reveal the correct nonnegative rank of the data without any prior knowledge or tuning.
SON-NMF is a nonconvex, nonsmooth, non-separable, non-proximable problem, so solving it is nontrivial. First, as rank estimation in NMF is NP-hard, the proposed approach does not enjoy a lower computational complexity. Using a graph-theoretic argument, we prove that the complexity of SON-NMF is almost irreducible. Second, the per-iteration cost of any algorithm solving SON-NMF is possibly high, which motivated us to propose a first-order BCD algorithm that approximately solves SON-NMF with a low per-iteration cost by using the proximal average operator. Lastly, we propose a simple greedy method for post-processing.
SON-NMF exhibits favourable features for applications. Besides the ability to automatically estimate the rank from data, SON-NMF can deal with rank-deficient data matrices and can detect weak components with small energy. Furthermore, in the application of hyperspectral imaging, SON-NMF handles the issue of spectral variability naturally.
Submitted 30 June, 2024;
originally announced July 2024.
-
First-order PDES for Graph Neural Networks: Advection And Burgers Equation Models
Authors:
Yifan Qu,
Oliver Krzysik,
Hans De Sterck,
Omer Ege Kara
Abstract:
Graph Neural Networks (GNNs) have established themselves as the preferred methodology in a multitude of domains, ranging from computer vision to computational biology, especially in contexts where data inherently conform to graph structures. While many existing methods have endeavored to model GNNs using various techniques, a prevalent challenge they grapple with is the issue of over-smoothing. This paper presents new Graph Neural Network models that incorporate two first-order Partial Differential Equations (PDEs). These models do not increase complexity but effectively mitigate the over-smoothing problem. Our experimental findings highlight the capacity of our new PDE model to achieve comparable results with higher-order PDE models and fix the over-smoothing problem up to 64 layers. These results underscore the adaptability and versatility of GNNs, indicating that unconventional approaches can yield outcomes on par with established techniques.
Submitted 3 April, 2024;
originally announced April 2024.
-
Parallel-in-time solution of scalar nonlinear conservation laws
Authors:
H. De Sterck,
R. D. Falgout,
O. A. Krzysik,
J. B. Schroder
Abstract:
We consider the parallel-in-time solution of scalar nonlinear conservation laws in one spatial dimension. The equations are discretized in space with a conservative finite-volume method using weighted essentially non-oscillatory (WENO) reconstructions, and in time with high-order explicit Runge-Kutta methods. The solution of the global, discretized space-time problem is sought via a nonlinear iteration that uses a novel linearization strategy in cases of non-differentiable equations. Under certain choices of discretization and algorithmic parameters, the nonlinear iteration coincides with Newton's method, although, more generally, it is a preconditioned residual correction scheme. At each nonlinear iteration, the linearized problem takes the form of a certain discretization of a linear conservation law over the space-time domain in question. An approximate parallel-in-time solution of the linearized problem is computed with a single multigrid reduction-in-time (MGRIT) iteration. The MGRIT iteration employs a novel coarse-grid operator that is a modified conservative semi-Lagrangian discretization and generalizes those we have developed previously for non-conservative scalar linear hyperbolic problems. Numerical tests are performed for the inviscid Burgers and Buckley--Leverett equations. For many test problems, the solver converges in just a handful of iterations with convergence rate independent of mesh resolution, including problems with (interacting) shocks and rarefactions.
Submitted 10 January, 2024;
originally announced January 2024.
-
Asymptotic convergence of restarted Anderson acceleration for certain normal linear systems
Authors:
Hans De Sterck,
Oliver A. Krzysik,
Adam Smith
Abstract:
Anderson acceleration (AA) is widely used for accelerating the convergence of an underlying fixed-point iteration $\bm{x}_{k+1} = \bm{q}( \bm{x}_{k} )$, $k = 0, 1, \ldots$, with $\bm{x}_k \in \mathbb{R}^n$, $\bm{q} \colon \mathbb{R}^n \to \mathbb{R}^n$. Despite AA's widespread use, relatively little is understood theoretically about the extent to which it may accelerate the underlying fixed-point iteration. To this end, we analyze a restarted variant of AA with a restart size of one, a method closely related to GMRES(1). We consider the case of $\bm{q}( \bm{x} ) = M \bm{x} + \bm{b}$ with matrix $M \in \mathbb{R}^{n \times n}$ either symmetric or skew-symmetric. For both classes of $M$ we compute the worst-case root-average asymptotic convergence factor of the AA method, partially relying on conjecture in the symmetric setting, proving that it is strictly smaller than that of the underlying fixed-point iteration. For symmetric $M$, we show that the AA residual iteration corresponds to a fixed-point iteration for solving an eigenvector-dependent nonlinear eigenvalue problem (NEPv), and we show how this can result in the convergence factor strongly depending on the initial iterate, which we quantify exactly in certain special cases. Conversely, for skew-symmetric $M$ we show that the AA residual iteration is closely related to a power iteration for $M$, and how this results in the convergence factor being independent of the initial iterate. Supporting numerical results are given, which also indicate the theory is applicable to the more general setting of nonlinear $\bm{q}$ with Jacobian at the fixed point that is symmetric or skew symmetric.
Submitted 4 July, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences
Authors:
Yanming Kang,
Giang Tran,
Hans De Sterck
Abstract:
Transformer-based models have achieved state-of-the-art performance in many areas. However, the quadratic complexity of self-attention with respect to the input length hinders the applicability of Transformer-based models to long sequences. To address this, we present Fast Multipole Attention, a new attention mechanism that uses a divide-and-conquer strategy to reduce the time and memory complexity of attention for sequences of length $n$ from $\mathcal{O}(n^2)$ to $\mathcal{O}(n \log n)$ or $O(n)$, while retaining a global receptive field. The hierarchical approach groups queries, keys, and values into $\mathcal{O}( \log n)$ levels of resolution, where groups at greater distances are increasingly larger in size and the weights to compute group quantities are learned. As such, the interaction between tokens far from each other is considered in lower resolution in an efficient hierarchical manner. The overall complexity of Fast Multipole Attention is $\mathcal{O}(n)$ or $\mathcal{O}(n \log n)$, depending on whether the queries are down-sampled or not. This multi-level divide-and-conquer strategy is inspired by fast summation methods from $n$-body physics and the Fast Multipole Method. We perform evaluation on autoregressive and bidirectional language modeling tasks and compare our Fast Multipole Attention model with other efficient attention variants on medium-size datasets. We find empirically that the Fast Multipole Transformer performs much better than other efficient transformers in terms of memory size and accuracy. The Fast Multipole Attention mechanism has the potential to empower large language models with much greater sequence lengths, taking the full context into account in an efficient, naturally hierarchical manner during training and when generating long sequences.
Submitted 20 October, 2023; v1 submitted 18 October, 2023;
originally announced October 2023.
-
Multilevel Monte Carlo methods for stochastic convection-diffusion eigenvalue problems
Authors:
Tiangang Cui,
Hans De Sterck,
Alexander D. Gilbert,
Stanislav Polishchuk,
Robert Scheichl
Abstract:
We develop new multilevel Monte Carlo (MLMC) methods to estimate the expectation of the smallest eigenvalue of a stochastic convection-diffusion operator with random coefficients. The MLMC method is based on a sequence of finite element (FE) discretizations of the eigenvalue problem on a hierarchy of increasingly finer meshes. For the discretized, algebraic eigenproblems we use both the Rayleigh quotient (RQ) iteration and implicitly restarted Arnoldi (IRA), providing an analysis of the cost in each case. By studying the variance on each level and adapting classical FE error bounds to the stochastic setting, we are able to bound the total error of our MLMC estimator and provide a complexity analysis. As expected, the complexity bound for our MLMC estimator is superior to plain Monte Carlo. To improve the efficiency of the MLMC further, we exploit the hierarchy of meshes and use coarser approximations as starting values for the eigensolvers on finer ones. To improve the stability of the MLMC method for convection-dominated problems, we employ two additional strategies. First, we consider the streamline upwind Petrov--Galerkin formulation of the discrete eigenvalue problem, which allows us to start the MLMC method on coarser meshes than is possible with standard FEs. Second, we apply a homotopy method to add stability to the eigensolver for each sample. Finally, we present a multilevel quasi-Monte Carlo method that replaces Monte Carlo with a quasi-Monte Carlo (QMC) rule on each level. Due to the faster convergence of QMC, this improves the overall complexity. We provide detailed numerical results comparing our different strategies to demonstrate the practical feasibility of the MLMC method in different use cases. The results support our complexity analysis and further demonstrate the superiority over plain Monte Carlo in all cases.
Submitted 12 February, 2024; v1 submitted 7 March, 2023;
originally announced March 2023.
-
MGProx: A nonsmooth multigrid proximal gradient method with adaptive restriction for strongly convex optimization
Authors:
Andersen Ang,
Hans De Sterck,
Stephen Vavasis
Abstract:
We study the combination of proximal gradient descent with multigrid for solving a class of possibly nonsmooth strongly convex optimization problems. We propose a multigrid proximal gradient method called MGProx, which accelerates the proximal gradient method by multigrid, based on using hierarchical information of the optimization problem. MGProx applies a newly introduced adaptive restriction operator to simplify the Minkowski sum of subdifferentials of the nondifferentiable objective function across different levels. We provide a theoretical characterization of MGProx. First we show that the MGProx update operator exhibits a fixed-point property. Next, we show that the coarse correction is a descent direction for the fine variable of the original fine level problem in the general nonsmooth case. Lastly, under some assumptions we provide the convergence rate for the algorithm. In the numerical tests on the Elastic Obstacle Problem, which is an example of nonsmooth convex optimization problem where multigrid method can be applied, we show that MGProx has a faster convergence speed than competing methods.
Submitted 10 May, 2024; v1 submitted 8 February, 2023;
originally announced February 2023.
-
Downlink Compression Improves TopK Sparsification
Authors:
William Zou,
Hans De Sterck,
Jun Liu
Abstract:
Training large neural networks is time consuming. To speed up the process, distributed training is often used. One of the largest bottlenecks in distributed training is communicating gradients across different nodes. Different gradient compression techniques have been proposed to alleviate the communication bottleneck, including topK gradient sparsification, which truncates the gradient to the largest K components before sending it to other nodes. While some authors have investigated topK gradient sparsification in the parameter-server framework by applying topK compression in both the worker-to-server (uplink) and server-to-worker (downlink) direction, the currently accepted belief says that adding extra compression degrades the convergence of the model. We demonstrate, on the contrary, that adding downlink compression can potentially improve the performance of topK sparsification: not only does it reduce the amount of communication per step, but also, counter-intuitively, can improve the upper bound in the convergence analysis. To show this, we revisit non-convex convergence analysis of topK stochastic gradient descent (SGD) and extend it from the unidirectional to the bidirectional setting. We also remove a restriction of the previous analysis that requires unrealistically large values of K. We experimentally evaluate bidirectional topK SGD against unidirectional topK SGD and show that models trained with bidirectional topK SGD will perform as well as models trained with unidirectional topK SGD while yielding significant communication benefits for large numbers of workers.
Submitted 29 September, 2022;
originally announced September 2022.
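The topK truncation and the bidirectional (uplink plus downlink) compression described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes "largest K components" means largest in magnitude, it averages the workers' sparsified gradients on the server, and it omits any error-feedback or learning-rate logic. The function names `top_k_sparsify` and `bidirectional_step` are hypothetical.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad and zero out the rest.

    Assumption: "largest" is measured by absolute value.
    """
    if k <= 0:
        return [0.0] * len(grad)
    # Indices sorted by decreasing magnitude; keep the first k.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def bidirectional_step(worker_grads, k):
    """One communication round with topK applied in both directions.

    Uplink: each worker sends a topK-sparsified gradient to the server.
    Downlink: the server averages what it received and sparsifies the
    average again before broadcasting it back to the workers.
    """
    uplink = [top_k_sparsify(g, k) for g in worker_grads]
    n_workers = len(uplink)
    averaged = [sum(column) / n_workers for column in zip(*uplink)]
    return top_k_sparsify(averaged, k)


if __name__ == "__main__":
    grads = [[1.0, 0.0, -4.0], [3.0, 0.5, 0.0]]
    print(bidirectional_step(grads, k=1))
```

In the unidirectional setting only the uplink compression would be applied, so the broadcast vector can contain up to K times the number of workers nonzero entries; the downlink step caps it at K again.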
-
Efficient multigrid reduction-in-time for method-of-lines discretizations of linear advection
Authors:
H. De Sterck,
R. D. Falgout,
O. A. Krzysik,
J. B. Schroder
Abstract:
Parallel-in-time methods for partial differential equations (PDEs) have been the subject of intense development over recent decades, particularly for diffusion-dominated problems. It has been widely reported in the literature, however, that many of these methods perform quite poorly for advection-dominated problems. Here we analyze the particular iterative parallel-in-time algorithm of multigrid reduction-in-time (MGRIT) for discretizations of constant-wave-speed linear advection problems. We focus on common method-of-lines discretizations that employ upwind finite differences in space and Runge-Kutta methods in time. Using a convergence framework we developed in previous work, we prove for a subclass of these discretizations that, if using the standard approach of rediscretizing the fine-grid problem on the coarse grid, robust MGRIT convergence with respect to CFL number and coarsening factor is not possible. This poor convergence and non-robustness is caused, at least in part, by an inadequate coarse-grid correction for smooth Fourier modes known as characteristic components. We propose an alternative coarse-grid operator that provides a better correction of these modes. This coarse-grid operator is related to previous work and uses a semi-Lagrangian discretization combined with an implicitly treated truncation error correction. Theory and numerical experiments show the coarse-grid operator yields fast MGRIT convergence for many of the method-of-lines discretizations considered, including for both implicit and explicit discretizations of high order. Parallel results demonstrate substantial speed-up over sequential time-stepping.
Submitted 20 March, 2023; v1 submitted 14 September, 2022;
originally announced September 2022.
-
Multigrid reduction-in-time convergence for advection problems: A Fourier analysis perspective
Authors:
H. De Sterck,
S. Friedhoff,
O. A. Krzysik,
Scott P. MacLachlan
Abstract:
A long-standing issue in the parallel-in-time community is the poor convergence of standard iterative parallel-in-time methods for hyperbolic partial differential equations (PDEs), and for advection-dominated PDEs more broadly. Here, a local Fourier analysis (LFA) convergence theory is derived for the two-level variant of the iterative parallel-in-time method of multigrid reduction-in-time (MGRIT). This closed-form theory allows for new insights into the poor convergence of MGRIT for advection-dominated PDEs when using the standard approach of rediscretizing the fine-grid problem on the coarse grid. Specifically, we show that this poor convergence arises, at least in part, from inadequate coarse-grid correction of certain smooth Fourier modes known as characteristic components, which was previously identified as causing poor convergence of classical spatial multigrid on steady-state advection-dominated PDEs. We apply this convergence theory to show that, for certain semi-Lagrangian discretizations of advection problems, MGRIT convergence using rediscretized coarse-grid operators cannot be robust with respect to CFL number or coarsening factor. A consequence of this analysis is that techniques developed for improving convergence in the spatial multigrid context can be re-purposed in the MGRIT context to develop more robust parallel-in-time solvers. This strategy has been used in recent work to great effect; here, we provide further theoretical evidence supporting the effectiveness of this approach.
Submitted 6 October, 2023; v1 submitted 2 August, 2022;
originally announced August 2022.
-
Neural Lyapunov Control of Unknown Nonlinear Systems with Stability Guarantees
Authors:
Ruikun Zhou,
Thanin Quartz,
Hans De Sterck,
Jun Liu
Abstract:
Learning for control of dynamical systems with formal guarantees remains a challenging task. This paper proposes a learning framework to simultaneously stabilize an unknown nonlinear system with a neural controller and learn a neural Lyapunov function to certify a region of attraction (ROA) for the closed-loop system. The algorithmic structure consists of two neural networks and a satisfiability modulo theories (SMT) solver. The first neural network is responsible for learning the unknown dynamics. The second neural network aims to identify a valid Lyapunov function and a provably stabilizing nonlinear controller. The SMT solver then verifies that the candidate Lyapunov function indeed satisfies the Lyapunov conditions. We provide theoretical guarantees of the proposed learning framework in terms of the closed-loop stability for the unknown nonlinear system. We illustrate the effectiveness of the approach with a set of numerical experiments.
Submitted 15 October, 2022; v1 submitted 4 June, 2022;
originally announced June 2022.
-
Fast multigrid reduction-in-time for advection via modified semi-Lagrangian coarse-grid operators
Authors:
H. De Sterck,
R. D. Falgout,
O. A. Krzysik
Abstract:
Many iterative parallel-in-time algorithms have been shown to be highly efficient for diffusion-dominated partial differential equations (PDEs), but are inefficient or even divergent when applied to advection-dominated PDEs. We consider the application of the multigrid reduction-in-time (MGRIT) algorithm to linear advection PDEs. The key to efficient time integration with this method is using a coarse-grid operator that provides a sufficiently accurate approximation to the so-called ideal coarse-grid operator. For certain classes of semi-Lagrangian discretizations, we present a novel semi-Lagrangian-based coarse-grid operator that leads to fast and scalable multilevel time integration of linear advection PDEs. The coarse-grid operator is composed of a semi-Lagrangian discretization followed by a correction term, with the correction designed so that the leading-order truncation error of the composite operator is approximately equal to that of the ideal coarse-grid operator. Parallel results show substantial speed-ups over sequential time integration for variable-wave-speed advection problems in one and two spatial dimensions, and using high-order discretizations up to order five. The proposed approach establishes the first practical method that provides small and scalable MGRIT iteration counts for advection problems.
Submitted 22 April, 2022; v1 submitted 24 March, 2022;
originally announced March 2022.
-
Anderson Acceleration as a Krylov Method with Application to Asymptotic Convergence Analysis
Authors:
Hans De Sterck,
Yunhui He,
Oliver A. Krzysik
Abstract:
Anderson acceleration (AA) is widely used for accelerating the convergence of nonlinear fixed-point methods $x_{k+1}=q(x_{k})$, $x_k \in \mathbb{R}^n$, but little is known about how to quantify the convergence acceleration provided by AA. As a roadway towards gaining more understanding of convergence acceleration by AA, we study AA($m$), i.e., Anderson acceleration with finite window size $m$, applied to the case of linear fixed-point iterations $x_{k+1}=M x_{k}+b$. We write AA($m$) as a Krylov method with polynomial residual update formulas, and derive recurrence relations for the AA($m$) polynomials. Writing AA($m$) as a Krylov method immediately implies that $k$ iterations of AA($m$) cannot produce a smaller residual than $k$ iterations of GMRES without restart (but without implying anything about the relative convergence speed of (windowed) AA($m$) versus restarted GMRES($m$)). We find that the AA($m$) residual polynomials observe a periodic memory effect where increasing powers of the error iteration matrix $M$ act on the initial residual as the iteration number increases. We derive several further results based on these polynomial residual update formulas, including orthogonality relations, a lower bound on the AA(1) acceleration coefficient $β_k$, and explicit nonlinear recursions for the AA(1) residuals and residual polynomials that do not include the acceleration coefficient $β_k$. Using these recurrence relations we also derive new residual convergence bounds for AA(1) in the linear case, demonstrating how the per-iteration residual reduction $||r_{k+1}||/||r_{k}||$ depends strongly on the residual reduction in the previous iteration and on the angle between the prior residual vectors $r_k$ and $r_{k-1}$. We apply these results to study the influence of the initial guess on the asymptotic convergence factor of AA(1), and to study AA(1) residual convergence patterns.
Submitted 23 February, 2023; v1 submitted 28 September, 2021;
originally announced September 2021.
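For readers unfamiliar with AA(1) applied to a linear fixed-point iteration $x_{k+1}=M x_{k}+b$, as studied in the abstract above, the following is a minimal pure-Python sketch. It assumes one common sign convention, which may differ from the paper's: residual $r_k = q(x_k) - x_k$, coefficient $β_k$ minimizing $||r_k + β (r_k - r_{k-1})||$ in the 2-norm, and update $x_{k+1} = q(x_k) + β_k (q(x_k) - q(x_{k-1}))$, with no damping. The helper names and the small test matrix are hypothetical.

```python
def matvec(M, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def q(M, b, x):
    """Linear fixed-point map q(x) = M x + b."""
    return [mx + bi for mx, bi in zip(matvec(M, x), b)]


def aa1_linear(M, b, x0, num_iters):
    """Anderson acceleration with window size 1 (assumed convention).

    The first step is a plain fixed-point step; every later step mixes the
    two most recent q-values using a least-squares coefficient beta.
    """
    x_prev = list(x0)
    q_prev = q(M, b, x_prev)
    r_prev = [qi - xi for qi, xi in zip(q_prev, x_prev)]
    x = list(q_prev)  # x_1 = q(x_0)
    for _ in range(num_iters):
        q_cur = q(M, b, x)
        r_cur = [qi - xi for qi, xi in zip(q_cur, x)]
        dr = [rc - rp for rc, rp in zip(r_cur, r_prev)]
        denom = dot(dr, dr)
        # beta minimizes ||r_cur + beta * dr||; fall back to a plain
        # fixed-point step if the residual did not change.
        beta = -dot(r_cur, dr) / denom if denom > 0.0 else 0.0
        x_next = [qc + beta * (qc - qp) for qc, qp in zip(q_cur, q_prev)]
        x_prev, q_prev, r_prev = x, q_cur, r_cur
        x = x_next
    return x


if __name__ == "__main__":
    M = [[0.5, 0.1], [0.1, 0.3]]  # symmetric, spectral radius below one
    b = [1.0, 1.0]
    print(aa1_linear(M, b, [0.0, 0.0], 40))
```

The fixed point of this small example solves $(I - M)x = b$, which gives $x^* = (40/17, 30/17)$; the sketch can be checked against that value.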
-
Linear Asymptotic Convergence of Anderson Acceleration: Fixed-Point Analysis
Authors:
Hans De Sterck,
Yunhui He
Abstract:
We study the asymptotic convergence of AA($m$), i.e., Anderson acceleration with window size $m$ for accelerating fixed-point methods $x_{k+1}=q(x_{k})$, $x_k \in R^n$. Convergence acceleration by AA($m$) has been widely observed but is not well understood. We consider the case where the fixed-point iteration function $q(x)$ is differentiable and the convergence of the fixed-point method itself is root-linear. We identify numerically several conspicuous properties of AA($m$) convergence: First, AA($m$) sequences $\{x_k\}$ converge root-linearly but the root-linear convergence factor depends strongly on the initial condition. Second, the AA($m$) acceleration coefficients $β^{(k)}$ do not converge but oscillate as $\{x_k\}$ converges to $x^*$. To shed light on these observations, we write the AA($m$) iteration as an augmented fixed-point iteration $z_{k+1} =Ψ(z_k)$, $z_k \in R^{n(m+1)}$ and analyze the continuity and differentiability properties of $Ψ(z)$ and $β(z)$. We find that the vector of acceleration coefficients $β(z)$ is not continuous at the fixed point $z^*$. However, we show that, despite the discontinuity of $β(z)$, the iteration function $Ψ(z)$ is Lipschitz continuous and directionally differentiable at $z^*$ for AA(1), and we generalize this to AA($m$) with $m>1$ for most cases. Furthermore, we find that $Ψ(z)$ is not differentiable at $z^*$. We then discuss how these theoretical findings relate to the observed convergence behaviour of AA($m$). The discontinuity of $β(z)$ at $z^*$ allows $β^{(k)}$ to oscillate as $\{x_k\}$ converges to $x^*$, and the non-differentiability of $Ψ(z)$ allows AA($m$) sequences to converge with root-linear convergence factors that strongly depend on the initial condition. Additional numerical results illustrate our findings.
Submitted 2 May, 2022; v1 submitted 28 September, 2021;
originally announced September 2021.
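As a rough illustration of the AA(1) iteration discussed in this abstract, the sketch below uses one common formulation, $x_{k+1} = q(x_k) + β_k (q(x_k) - q(x_{k-1}))$, where $β_k$ minimizes $\|r_k + β (r_k - r_{k-1})\|_2$ for the residual $r_k = x_k - q(x_k)$. The linear map `q`, the starting point, and the helper names are assumptions chosen for illustration; they are not taken from the paper.

```python
def q(x):
    # Hypothetical linear contraction q(x) = M x + b (illustrative only).
    return [0.6 * x[0] + 0.2 * x[1] + 1.0,
            0.2 * x[0] + 0.5 * x[1] + 2.0]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def aa1(q, x0, iters=100, tol=1e-12):
    """Anderson acceleration with window size 1 (one common formulation)."""
    x_prev = list(x0)
    q_prev = q(x_prev)
    r_prev = sub(x_prev, q_prev)          # residual r(x) = x - q(x)
    x = q_prev                            # first step: plain fixed-point update
    betas = []
    for _ in range(iters):
        qx = q(x)
        r = sub(x, qx)
        if dot(r, r) ** 0.5 < tol:
            break
        d = sub(r, r_prev)
        dd = dot(d, d)
        # Least-squares coefficient; fall back to beta = 0 if residuals coincide.
        beta = -dot(r, d) / dd if dd > 0.0 else 0.0
        betas.append(beta)
        x_new = [qx[i] + beta * (qx[i] - q_prev[i]) for i in range(len(x))]
        x_prev, q_prev, r_prev = x, qx, r
        x = x_new
    return x, betas

x_star, beta_history = aa1(q, [0.0, 0.0])
```

Inspecting `beta_history` is one way to look at how the acceleration coefficients behave as the iterates approach the fixed point; this toy linear problem is not meant to reproduce the paper's numerical observations.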
-
Fast solution of fully implicit Runge-Kutta and discontinuous Galerkin in time for numerical PDEs, Part I: the linear setting
Authors:
Ben S. Southworth,
Oliver Krzysik,
Will Pazner,
Hans De Sterck
Abstract:
Fully implicit Runge-Kutta (IRK) methods have many desirable properties as time integration schemes in terms of accuracy and stability, but high-order IRK methods are not commonly used in practice with numerical PDEs due to the difficulty of solving the stage equations. This paper introduces a theoretical and algorithmic preconditioning framework for solving the systems of equations that arise from IRK methods applied to linear numerical PDEs (without algebraic constraints). This framework also naturally applies to discontinuous Galerkin discretizations in time. Under quite general assumptions on the spatial discretization that yield stable time integration, the preconditioned operator is proven to have condition number bounded by a small, order-one constant, independent of the spatial mesh and time-step size, and with only weak dependence on number of stages/polynomial order; for example, the preconditioned operator for 10th-order Gauss IRK has condition number less than two, independent of the spatial discretization and time step. The new method can be used with arbitrary existing preconditioners for backward Euler-type time stepping schemes, and is amenable to the use of three-term recursion Krylov methods when the underlying spatial discretization is symmetric. The new method is demonstrated to be effective on various high-order finite-difference and finite-element discretizations of linear parabolic and hyperbolic problems, demonstrating fast, scalable solution of up to 10th order accuracy. The new method consistently outperforms existing block preconditioning approaches, and in several cases, the new method can achieve 4th-order accuracy using Gauss integration with roughly half the number of preconditioner applications and wallclock time as required using standard diagonally implicit RK methods.
Submitted 5 October, 2021; v1 submitted 2 January, 2021;
originally announced January 2021.
-
N-ODE Transformer: A Depth-Adaptive Variant of the Transformer Using Neural Ordinary Differential Equations
Authors:
Aaron Baier-Reinio,
Hans De Sterck
Abstract:
We use neural ordinary differential equations to formulate a variant of the Transformer that is depth-adaptive in the sense that an input-dependent number of time steps is taken by the ordinary differential equation solver. Our goal in proposing the N-ODE Transformer is to investigate whether its depth-adaptivity may aid in overcoming some specific known theoretical limitations of the Transformer in handling nonlocal effects. Specifically, we consider the simple problem of determining the parity of a binary sequence, for which the standard Transformer has known limitations that can only be overcome by using a sufficiently large number of layers or attention heads. We find, however, that the depth-adaptivity of the N-ODE Transformer does not provide a remedy for the inherently nonlocal nature of the parity problem, and provide explanations for why this is so. Next, we pursue regularization of the N-ODE Transformer by penalizing the arclength of the ODE trajectories, but find that this fails to improve the accuracy or efficiency of the N-ODE Transformer on the challenging parity problem. We suggest future avenues of research for modifications and extensions of the N-ODE Transformer that may lead to improved accuracy and efficiency for sequence modelling tasks such as neural machine translation.
Submitted 21 October, 2020;
originally announced October 2020.
-
On the Asymptotic Linear Convergence Speed of Anderson Acceleration Applied to ADMM
Authors:
Dawei Wang,
Yunhui He,
Hans De Sterck
Abstract:
Empirical results show that Anderson acceleration (AA) can be a powerful mechanism to improve the asymptotic linear convergence speed of the Alternating Direction Method of Multipliers (ADMM) when ADMM by itself converges linearly. However, theoretical results to quantify this improvement do not exist yet. In this paper we explain and quantify this improvement in linear asymptotic convergence speed for the special case of a stationary version of AA applied to ADMM. We do so by considering the spectral properties of the Jacobians of ADMM and the stationary version of AA evaluated at the fixed point, where the coefficients of the stationary AA method are computed such that its asymptotic linear convergence factor is optimal. The optimal linear convergence factors of this stationary AA-ADMM method are computed analytically or by optimization, based on previous work on optimal stationary AA acceleration. Using this spectral picture and those analytical results, our approach provides new insight into how and by how much the stationary AA method can improve the asymptotic linear convergence factor of ADMM. Numerical results also indicate that the optimal linear convergence factor of the stationary AA methods gives a useful estimate for the asymptotic linear convergence speed of the non-stationary AA method that is used in practice.
Submitted 29 November, 2020; v1 submitted 6 July, 2020;
originally announced July 2020.
-
On the Asymptotic Linear Convergence Speed of Anderson Acceleration, Nesterov Acceleration, and Nonlinear GMRES
Authors:
Hans De Sterck,
Yunhui He
Abstract:
We consider nonlinear convergence acceleration methods for fixed-point iteration $x_{k+1}=q(x_k)$, including Anderson acceleration (AA), nonlinear GMRES (NGMRES), and Nesterov-type acceleration (corresponding to AA with window size one). We focus on fixed-point methods that converge asymptotically linearly with convergence factor $ρ<1$ and that solve an underlying fully smooth and non-convex optimization problem. It is often observed that AA and NGMRES substantially improve the asymptotic convergence behavior of the fixed-point iteration, but this improvement has not been quantified theoretically. We investigate this problem under simplified conditions. First, we consider stationary versions of AA and NGMRES, and determine coefficients that result in optimal asymptotic convergence factors, given knowledge of the spectrum of $q'(x)$ at the fixed point $x^*$. This allows us to understand and quantify the asymptotic convergence improvement that can be provided by nonlinear convergence acceleration, viewing $x_{k+1}=q(x_k)$ as a nonlinear preconditioner for AA and NGMRES. Second, for the case of infinite window size, we consider linear asymptotic convergence bounds for GMRES applied to the fixed-point iteration linearized about $x^*$. Since AA and NGMRES are equivalent to GMRES in the linear case, one may expect the GMRES convergence factors to be relevant for AA and NGMRES as $x_k \rightarrow x^*$. Our results are illustrated numerically for a class of test problems from canonical tensor decomposition, comparing steepest descent and alternating least squares (ALS) as the fixed-point iterations that are accelerated by AA and NGMRES. Our numerical tests show that both approaches allow us to estimate asymptotic convergence speed for nonstationary AA and NGMRES with finite window size.
Submitted 7 November, 2020; v1 submitted 3 July, 2020;
originally announced July 2020.
-
Local Fourier analysis of multigrid for hybridized and embedded discontinuous Galerkin methods
Authors:
Yunhui He,
Sander Rhebergen,
Hans De Sterck
Abstract:
In this paper we present a geometric multigrid method with Jacobi and Vanka relaxation for hybridized and embedded discontinuous Galerkin discretizations of the Laplacian. We present a local Fourier analysis (LFA) of the two-grid error-propagation operator and show that the multigrid method applied to an embedded discontinuous Galerkin (EDG) discretization is almost as efficient as when applied to a continuous Galerkin discretization. We furthermore show that multigrid applied to an EDG discretization outperforms multigrid applied to a hybridized discontinuous Galerkin (HDG) discretization. Numerical examples verify our LFA predictions.
Submitted 19 June, 2020;
originally announced June 2020.
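To give a flavour of the local Fourier analysis mentioned in this abstract, the sketch below computes a smoothing factor for a much simpler setting than the paper's: weighted Jacobi relaxation for the 1D finite-difference Laplacian with stencil $[-1, 2, -1]/h^2$. This model problem is a stand-in chosen for illustration, not the HDG or EDG discretizations analysed in the paper. The relaxation symbol is $S(θ) = 1 - ω(1 - \cos θ)$, and the smoothing factor is the maximum of $|S(θ)|$ over the high frequencies $θ \in [π/2, π]$.

```python
import math

def jacobi_symbol(theta, omega):
    # Fourier symbol of weighted Jacobi for the stencil [-1, 2, -1] / h^2.
    return 1.0 - omega * (1.0 - math.cos(theta))

def smoothing_factor(omega, samples=1001):
    # Sample the high-frequency range [pi/2, pi], endpoints included.
    thetas = [math.pi / 2 + (math.pi / 2) * j / (samples - 1)
              for j in range(samples)]
    return max(abs(jacobi_symbol(t, omega)) for t in thetas)

mu_damped = smoothing_factor(2.0 / 3.0)   # classical damping parameter
mu_undamped = smoothing_factor(1.0)       # undamped Jacobi
```

For this model problem the sampled maximum is attained at the endpoints of the high-frequency range, which is why a uniform grid that includes them is sufficient here.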
-
Optimizing multigrid reduction-in-time (MGRIT) and Parareal coarse-grid operators for linear advection
Authors:
Hans De Sterck,
Robert D. Falgout,
Stephanie Friedhoff,
Oliver A. Krzysik,
Scott P. MacLachlan
Abstract:
Parallel-in-time methods, such as multigrid reduction-in-time (MGRIT) and Parareal, provide an attractive option for increasing concurrency when simulating time-dependent PDEs in modern high-performance computing environments. While these techniques have been very successful for parabolic equations, it has often been observed that their performance suffers dramatically when applied to advection-dominated problems or purely hyperbolic PDEs using standard rediscretization approaches on coarse grids. In this paper, we apply MGRIT or Parareal to the constant-coefficient linear advection equation, appealing to existing convergence theory to provide insight into the typically non-scalable or even divergent behavior of these solvers for this problem. To overcome these failings, we replace rediscretization on coarse grids with improved coarse-grid operators that are computed by applying optimization techniques to approximately minimize error estimates from the convergence theory. One of our main findings is that, in order to obtain fast convergence as for parabolic problems, coarse-grid operators should take into account the behavior of the hyperbolic problem by tracking the characteristic curves. Our approach is tested for schemes of various orders using explicit or implicit Runge-Kutta methods combined with upwind-finite-difference spatial discretizations. In all cases, we obtain scalable convergence in just a handful of iterations, with parallel tests also showing significant speed-ups over sequential time-stepping. Our insight of tracking characteristics on coarse grids provides a key idea for solving the long-standing problem of efficient parallel-in-time integration for hyperbolic PDEs.
Submitted 2 March, 2021; v1 submitted 8 October, 2019;
originally announced October 2019.
-
Convergence analysis for parallel-in-time solution of hyperbolic systems
Authors:
Hans De Sterck,
Stephanie Friedhoff,
Alexander J. M. Howse,
Scott P. MacLachlan
Abstract:
Parallel-in-time algorithms have been successfully employed for reducing time-to-solution of a variety of partial differential equations, especially for diffusive (parabolic-type) equations. A major failing of parallel-in-time approaches to date, however, is that most methods show instabilities or poor convergence for hyperbolic problems. This paper focuses on the analysis of the convergence behavior of multigrid methods for the parallel-in-time solution of hyperbolic problems. Three analysis tools are considered that differ, in particular, in the treatment of the time dimension: (1) space-time local Fourier analysis, using a Fourier ansatz in space and time, (2) semi-algebraic mode analysis, coupling standard local Fourier analysis approaches in space with algebraic computation in time, and (3) a two-level reduction analysis, considering error propagation only on the coarse time grid. In this paper, we show how insights from reduction analysis can be used to improve feasibility of the semi-algebraic mode analysis, resulting in a tool that offers the best features of both analysis techniques. Following validating numerical results, we investigate what insights the combined analysis framework can offer for two model hyperbolic problems, the linear advection equation in one space dimension and linear elasticity in two space dimensions.
Submitted 27 August, 2019; v1 submitted 21 March, 2019;
originally announced March 2019.
-
On selecting coarse-grid operators for Parareal and MGRIT applied to linear advection
Authors:
Oliver A. Krzysik,
Hans De Sterck,
Scott P. MacLachlan,
Stephanie Friedhoff
Abstract:
We consider the parallel time integration of the linear advection equation with the Parareal and two-level multigrid-reduction-in-time (MGRIT) algorithms. Our aim is to develop a better understanding of the convergence behaviour of these algorithms for this problem, which is known to be poor relative to the diffusion equation, its model parabolic counterpart. Using Fourier analysis, we derive new convergence estimates for these algorithms which, in conjunction with existing convergence theory, provide insight into the origins of this poor performance. We then use this theory to explore improved coarse-grid time-stepping operators. For several high-order discretizations of the advection equation, we demonstrate that there exist non-standard coarse-grid time-stepping operators that yield significant improvements over the standard choice of rediscretization.
Submitted 13 October, 2019; v1 submitted 20 February, 2019;
originally announced February 2019.
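The Parareal iteration with a rediscretized coarse operator, as referred to in this abstract, can be sketched on a single Fourier mode of linear advection, i.e. the scalar test equation $u' = λu$ with purely imaginary $λ$. In this minimal sketch the fine propagator is `m` backward Euler steps of size `dt`, and the coarse propagator is one backward Euler step of size `m * dt`. The function name, parameter values, and the choice of backward Euler are assumptions made for illustration, not the discretizations studied in the paper.

```python
def parareal(lam, u0, n_coarse, m, dt, iters):
    """Parareal for u' = lam * u on n_coarse coarse intervals."""
    F = (1.0 / (1.0 - lam * dt)) ** m      # fine: m backward Euler steps
    G = 1.0 / (1.0 - lam * m * dt)         # coarse: one rediscretized step
    # Initial guess from a sequential coarse sweep.
    u = [u0]
    for _ in range(n_coarse):
        u.append(G * u[-1])
    for _ in range(iters):
        new = [u0]
        for n in range(n_coarse):
            # Update: U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k)
            new.append(G * new[n] + F * u[n] - G * u[n])
        u = new
    return u, F

lam = -2.0j                                # hypothetical advection Fourier mode
u_par, F_fine = parareal(lam, 1.0 + 0.0j, n_coarse=8, m=4, dt=0.05, iters=8)
u_seq = [F_fine ** n for n in range(9)]    # sequential fine solution
```

After `k` iterations the first `k` coarse time points agree with the sequential fine solution, so running as many iterations as there are coarse intervals reproduces it up to rounding. The practical question the abstract addresses is how quickly the iteration converges well before that point, which depends strongly on the choice of coarse operator.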
-
Nesterov Acceleration of Alternating Least Squares for Canonical Tensor Decomposition: Momentum Step Size Selection and Restart Mechanisms
Authors:
Drew Mitchell,
Nan Ye,
Hans De Sterck
Abstract:
We present Nesterov-type acceleration techniques for Alternating Least Squares (ALS) methods applied to canonical tensor decomposition. While Nesterov acceleration turns gradient descent into an optimal first-order method for convex problems by adding a momentum term with a specific weight sequence, a direct application of this method and weight sequence to ALS results in erratic convergence behaviour. This is so because the tensor decomposition problem is non-convex and ALS is accelerated instead of gradient descent. Instead, we consider various restart mechanisms and suitable choices of momentum weights that enable effective acceleration. Our extensive empirical results show that the Nesterov-accelerated ALS methods with restart can be dramatically more efficient than the stand-alone ALS or Nesterov accelerated gradient methods, when problems are ill-conditioned or accurate solutions are desired. The resulting methods perform competitively with or superior to existing acceleration methods for ALS, including ALS acceleration by NCG, NGMRES, or LBFGS, and additionally enjoy the benefit of being much easier to implement. We also compare with Nesterov-type updates where the momentum weight is determined by a line search, which are equivalent or closely related to existing line search methods for ALS. On a large and ill-conditioned 71$\times$1000$\times$900 tensor consisting of readings from chemical sensors to track hazardous gases, the restarted Nesterov-ALS method shows desirable robustness properties and outperforms any of the existing methods by a large factor. There is clear potential for extending our Nesterov-type acceleration approach to accelerating other optimization algorithms than ALS applied to other non-convex problems, such as Tucker tensor decomposition. Our Matlab code is available at https://github.com/hansdesterck/nonlinear-preconditioning-for-optimization.
Submitted 30 November, 2019; v1 submitted 13 October, 2018;
originally announced October 2018.
-
Nonlinearly Preconditioned L-BFGS as an Acceleration Mechanism for Alternating Least Squares, with Application to Tensor Decomposition
Authors:
Hans De Sterck,
Alexander J. M. Howse
Abstract:
We derive nonlinear acceleration methods based on the limited memory BFGS (L-BFGS) update formula for accelerating iterative optimization methods of alternating least squares (ALS) type applied to canonical polyadic (CP) and Tucker tensor decompositions. Our approach starts from linear preconditioning ideas that use linear transformations encoded by matrix multiplications, and extends these ideas to the case of genuinely nonlinear preconditioning, where the preconditioning operation involves fully nonlinear transformations. As such, the ALS-type iterations are used as fully nonlinear preconditioners for L-BFGS, or, equivalently, L-BFGS is used as a nonlinear accelerator for ALS. Numerical results show that the resulting methods perform much better than either stand-alone L-BFGS or stand-alone ALS, offering substantial improvements in terms of time-to-solution and robustness over state-of-the-art methods for large and noisy tensor problems, including previously described acceleration methods based on nonlinear conjugate gradients and nonlinear GMRES. Our approach provides a general L-BFGS-based acceleration mechanism for nonlinear optimization.
Submitted 27 June, 2018; v1 submitted 23 March, 2018;
originally announced March 2018.
-
Random Spatial Networks: Small Worlds without Clustering, Traveling Waves, and Hop-and-Spread Disease Dynamics
Authors:
John Lang,
Hans De Sterck,
Jamieson L. Kaiser,
Joel C. Miller
Abstract:
Random network models play a prominent role in modeling, analyzing and understanding complex phenomena on real-life networks. However, a key property of networks is often neglected: many real-world networks exhibit spatial structure, the tendency of a node to select neighbors with a probability depending on physical distance. Here, we introduce a class of random spatial networks (RSNs) which generalizes many existing random network models but adds spatial structure. In these networks, nodes are placed randomly in space and joined in edges with a probability depending on their distance and their individual expected degrees, in a manner that crucially remains analytically tractable. We use this network class to propose a new generalization of small-world networks, where the average shortest path lengths in the graph are small, as in classical Watts-Strogatz small-world networks, but with close spatial proximity of nodes that are neighbors in the network playing the role of large clustering. Small-world effects are demonstrated on these spatial small-world networks without clustering. We are able to derive partial integro-differential equations governing susceptible-infectious-recovered disease spreading through an RSN, and we demonstrate the existence of traveling wave solutions. If the distance kernel governing edge placement decays slower than exponential, the population-scale dynamics are dominated by long-range hops followed by local spread of traveling waves. This provides a theoretical modeling framework for recent observations of how epidemics like Ebola evolve in modern connected societies, with long-range connections seeding new focal points from which the epidemic locally spreads in a wavelike manner.
Submitted 4 February, 2017;
originally announced February 2017.
-
Research and Education in Computational Science and Engineering
Authors:
Ulrich Rüde,
Karen Willcox,
Lois Curfman McInnes,
Hans De Sterck,
George Biros,
Hans Bungartz,
James Corones,
Evin Cramer,
James Crowley,
Omar Ghattas,
Max Gunzburger,
Michael Hanke,
Robert Harrison,
Michael Heroux,
Jan Hesthaven,
Peter Jimack,
Chris Johnson,
Kirk E. Jordan,
David E. Keyes,
Rolf Krause,
Vipin Kumar,
Stefan Mayer,
Juan Meza,
Knut Martin Mørken,
J. Tinsley Oden
, et al. (8 additional authors not shown)
Abstract:
Over the past two decades the field of computational science and engineering (CSE) has penetrated both basic and applied research in academia, industry, and laboratories to advance discovery, optimize systems, support decision-makers, and educate the scientific and engineering workforce. Informed by centuries of theory and experiment, CSE performs computational experiments to answer questions that neither theory nor experiment alone is equipped to answer. CSE provides scientists and engineers of all persuasions with algorithmic inventions and software systems that transcend disciplines and scales. Carried on a wave of digital technology, CSE brings the power of parallelism to bear on troves of data. Mathematics-based advanced computing has become a prevalent means of discovery and innovation in essentially all areas of science, engineering, technology, and society; and the CSE community is at the core of this transformation. However, a combination of disruptive developments---including the architectural complexity of extreme-scale computing, the data revolution that engulfs the planet, and the specialization required to follow the applications to new frontiers---is redefining the scope and reach of the CSE endeavor. This report describes the rapid expansion of CSE and the challenges to sustaining its bold advances. The report also presents strategies and directions for CSE research and education for the next decade.
Submitted 31 December, 2017; v1 submitted 8 October, 2016;
originally announced October 2016.
-
The Statistical Mechanics of Human Weight Change
Authors:
John C. Lang,
Hans De Sterck,
Daniel M. Abrams
Abstract:
In the context of the global obesity epidemic, it is important to know who becomes obese and why. However, the processes that determine the changing shape of Body Mass Index (BMI) distributions in high-income societies are not well-understood. Here we establish the statistical mechanics of human weight change, providing a fundamental new understanding of human weight distributions. By compiling and analysing the largest data set so far of year-over-year BMI changes, we find, strikingly, that heavy people on average strongly decrease their weight year-over-year, and light people increase their weight. This drift towards the centre of the BMI distribution is balanced by diffusion resulting from random fluctuations in diet and physical activity that are, notably, proportional in size to BMI. We formulate a stochastic mathematical model for BMI dynamics, deriving a theoretical shape for the BMI distribution and offering a mechanism to explain the ongoing right-skewed broadening of BMI distributions over time. The model also provides new quantitative support for the hypothesis that peer-to-peer social influence plays a measurable role in BMI dynamics. More broadly, our results demonstrate a remarkable analogy with drift-diffusion mechanisms that are well-known from the physical sciences and finance.
Submitted 29 September, 2016;
originally announced October 2016.
-
GeoTextTagger: High-Precision Location Tagging of Textual Documents using a Natural Language Processing Approach
Authors:
Shawn Brunsting,
Hans De Sterck,
Remco Dolman,
Teun van Sprundel
Abstract:
Location tagging, also known as geotagging or geolocation, is the process of assigning geographical coordinates to input data. In this paper we present an algorithm for location tagging of textual documents. Our approach makes use of previous work in natural language processing by using a state-of-the-art part-of-speech tagger and named entity recognizer to find blocks of text which may refer to locations. A knowledge base (OpenStreetMap) is then used to find a list of possible locations for each block. Finally, one location is chosen for each block by assigning distance-based scores to each location and repeatedly selecting the location and block with the best score. We tested our geolocation algorithm with Wikipedia articles about topics with a well-defined geographical location that are geotagged by the articles' authors, where classification approaches have achieved median errors as low as 11 km, with attainable accuracy limited by the class size. Our approach achieved a 10th percentile error of 490 metres and median error of 54 kilometres on the Wikipedia dataset we used. When considering the five location tags with the greatest scores, 50% of articles were assigned at least one tag within 8.5 kilometres of the article's author-assigned true location. We also tested our approach on Twitter messages that are tagged with the location from which the message was sent. Twitter texts are challenging because they are short and unstructured and often do not contain words referring to the location they were sent from, but we obtain potentially useful results. We explain how we use the Spark framework for data analytics to collect and process our test data. In general, classification-based approaches for location tagging may be reaching their upper accuracy limit, but our precision-focused approach has high accuracy for some texts and shows significant potential for improvement overall.
Submitted 22 January, 2016;
originally announced January 2016.
-
A polynomial expansion line search for large-scale unconstrained minimization of smooth L2-regularized loss functions, with implementation in Apache Spark
Authors:
Michael B Hynes,
Hans De Sterck
Abstract:
In large-scale unconstrained optimization algorithms such as limited memory BFGS (LBFGS), a common subproblem is a line search minimizing the loss function along a descent direction. Commonly used line searches iteratively find an approximate solution for which the Wolfe conditions are satisfied, typically requiring multiple function and gradient evaluations per line search, which is expensive in parallel due to communication requirements. In this paper we propose a new line search approach for cases where the loss function is analytic, as in least squares regression, logistic regression, or low rank matrix factorization. We approximate the loss function by a truncated Taylor polynomial, whose coefficients may be computed efficiently in parallel with less communication than evaluating the gradient, after which this polynomial may be minimized with high accuracy in a neighbourhood of the expansion point. Our Polynomial Expansion Line Search (PELS) was implemented in the Apache Spark framework and used to accelerate the training of a logistic regression model on binary classification datasets from the LIBSVM repository with LBFGS and the Nonlinear Conjugate Gradient (NCG) method. In large-scale numerical experiments in parallel on a 16-node cluster with 256 cores using the URL, KDDA, and KDDB datasets, the PELS approach produced significant convergence improvements compared to the use of classical Wolfe line searches. For example, to reach the final training label prediction accuracies, LBFGS using PELS had speedup factors of 1.8--2 over LBFGS using a Wolfe line search, measured by both the number of iterations and the time required, due to the better accuracy of step sizes computed in the line search. PELS has the potential to significantly accelerate large-scale regression and factorization computations, and is applicable to continuous optimization problems with smooth loss functions.
Submitted 26 January, 2016; v1 submitted 28 October, 2015;
originally announced October 2015.
-
Algorithmic Acceleration of Parallel ALS for Collaborative Filtering: Speeding up Distributed Big Data Recommendation in Spark
Authors:
Manda Winlaw,
Michael B. Hynes,
Anthony Caterini,
Hans De Sterck
Abstract:
Collaborative filtering algorithms are important building blocks in many practical recommendation systems. For example, many large-scale data processing environments include collaborative filtering models for which the Alternating Least Squares (ALS) algorithm is used to compute latent factor matrix decompositions. In this paper, we propose an approach to accelerate the convergence of parallel ALS-based optimization methods for collaborative filtering using a nonlinear conjugate gradient (NCG) wrapper around the ALS iterations. We also provide a parallel implementation of the accelerated ALS-NCG algorithm in the Apache Spark distributed data processing environment, and an efficient line search technique as part of the ALS-NCG implementation that requires only one pass over the data on distributed datasets. In serial numerical experiments on a Linux workstation and parallel numerical experiments on a 16-node cluster with 256 computing cores, we demonstrate that the combined ALS-NCG method requires many fewer iterations and less time than standalone ALS to reach movie rankings with high accuracy on the MovieLens 20M dataset. In parallel, ALS-NCG can achieve an acceleration factor of 4 or greater in clock time when an accurate solution is desired; furthermore, the acceleration factor increases as greater numerical precision is required in the solution. In addition, the NCG acceleration mechanism is efficient in parallel and scales linearly with problem size on synthetic datasets with up to nearly 1 billion ratings. The acceleration mechanism is general and may also be applicable to other optimization methods for collaborative filtering.
Submitted 10 January, 2016; v1 submitted 12 August, 2015;
originally announced August 2015.
-
A Hierarchy of Linear Threshold Models for the Spread of Political Revolutions on Social Networks
Authors:
John C. Lang,
Hans De Sterck
Abstract:
We study a linear threshold agent-based model (ABM) for the spread of political revolutions on social networks using empirical network data. We propose new techniques for building a hierarchy of simplified ordinary differential equation (ODE) based models that aim to capture essential features of the ABM, including effects of the actual networks, and give insight into the parameter regime transitions of the ABM. We relate the ABM and the hierarchy of models to a population-level compartmental ODE model that we proposed previously for the spread of political revolutions [1], which is shown to be mathematically consistent with the proposed ABM and provides a way to analyze the global behaviour of the ABM. This consistency with the linear threshold ABM also provides further justification a posteriori for the compartmental model of [1]. Extending concepts from epidemiological modelling, we define a basic reproduction number $R_0$ for the linear threshold ABM and apply it to predict ABM behaviour on empirical networks. In small-scale numerical tests we investigate experimentally the differences in spreading behaviour that occur under the linear threshold ABM when applied to some empirical online and offline social networks, searching for quantitative evidence that political revolutions may be facilitated by the modern online social networks of social media.
Submitted 16 January, 2015;
originally announced January 2015.
-
A Nonlinearly Preconditioned Conjugate Gradient Algorithm for Rank-R Canonical Tensor Approximation
Authors:
Hans De Sterck,
Manda Winlaw
Abstract:
Alternating least squares (ALS) is often considered the workhorse algorithm for computing the rank-R canonical tensor approximation, but for certain problems its convergence can be very slow. The nonlinear conjugate gradient (NCG) method was recently proposed as an alternative to ALS, but the results indicated that NCG is usually not faster than ALS. To improve the convergence speed of NCG, we consider a nonlinearly preconditioned nonlinear conjugate gradient (PNCG) algorithm for computing the rank-R canonical tensor decomposition. Our approach uses ALS as a nonlinear preconditioner in the NCG algorithm. Alternatively, NCG can be viewed as an acceleration process for ALS. We demonstrate numerically that the convergence acceleration mechanism in PNCG often leads to important pay-offs for difficult tensor decomposition problems, with convergence that is significantly faster and more robust than for the stand-alone NCG or ALS algorithms. We consider several approaches for incorporating the nonlinear preconditioner into the NCG algorithm that have been described in the literature previously and have met with success in certain application areas. However, it appears that the nonlinearly preconditioned NCG approach has received relatively little attention in the broader community and remains underexplored both theoretically and experimentally. Thus, this paper serves several additional functions, by providing in one place a concise overview of several PNCG variants and their properties that have only been described in a few places scattered throughout the literature, by systematically comparing the performance of these PNCG variants for the tensor decomposition problem, and by drawing further attention to the usefulness of nonlinearly preconditioned NCG as a general tool. In addition, we briefly discuss the convergence of the PNCG algorithm.
Submitted 19 July, 2014;
originally announced July 2014.
-
The influence of societal individualism on a century of tobacco use: modelling the prevalence of smoking
Authors:
John C. Lang,
Daniel M. Abrams,
Hans De Sterck
Abstract:
Smoking of tobacco is predicted to cause approximately six million deaths worldwide in 2014. Responding effectively to this epidemic requires a thorough understanding of how smoking behaviour is transmitted and modified. Here, we present a new mathematical model of the social dynamics that cause cigarette smoking to spread in a population. Our model predicts that more individualistic societies will show faster adoption and cessation of smoking. Evidence from a new century-long composite data set on smoking prevalence in 25 countries supports the model, with direct implications for public health interventions around the world. Our results suggest that differences in culture between societies can measurably affect the temporal dynamics of a social spreading process, and that these effects can be understood via a quantitative mathematical model matched to observations.
Submitted 8 July, 2014;
originally announced July 2014.
-
The Arab Spring: A Simple Compartmental Model for the Dynamics of a Revolution
Authors:
John Lang,
Hans De Sterck
Abstract:
The self-immolation of Mohamed Bouazizi on December 17, 2010 in the small Tunisian city of Sidi Bouzid set off a sequence of events culminating in the revolutions of the Arab Spring. It is widely believed that the Internet and social media played a critical role in the growth and success of protests that led to the downfall of the regimes in Egypt and Tunisia. However, the precise mechanisms by which these new media affected the course of events remain unclear. We introduce a simple compartmental model for the dynamics of a revolution in a dictatorial regime such as Tunisia or Egypt which takes into account the role of the Internet and social media. An elementary mathematical analysis of the model identifies four main parameter regions: stable police state, meta-stable police state, unstable police state, and failed state. We illustrate how these regions capture, at least qualitatively, a wide range of scenarios observed in the context of revolutionary movements by considering the revolutions in Tunisia and Egypt, as well as the situation in Iran, China, and Somalia, as case studies. We pose four questions about the dynamics of the Arab Spring revolutions and formulate answers informed by the model. We conclude with some possible directions for future work.
Submitted 5 October, 2012;
originally announced October 2012.
-
An adaptive algebraic multigrid algorithm for low-rank canonical tensor decomposition
Authors:
Hans De Sterck,
Killian Miller
Abstract:
This paper presents a multigrid algorithm for the computation of the rank-R canonical decomposition of a tensor for low rank R. Standard alternating least squares (ALS) is used as the relaxation method. Transfer operators and coarse-level tensors are constructed in an adaptive setup phase based on multiplicative correction and on Bootstrap algebraic multigrid. An accurate solution is then computed by an additive solve phase based on the Full Approximation Scheme. Numerical tests show that for certain test problems the multilevel method significantly outperforms standalone ALS when a high level of accuracy is required.
Submitted 25 November, 2011;
originally announced November 2011.
-
Steepest Descent Preconditioning for Nonlinear GMRES Optimization
Authors:
Hans De Sterck
Abstract:
Steepest descent preconditioning is considered for the recently proposed nonlinear generalized minimal residual (N-GMRES) optimization algorithm for unconstrained nonlinear optimization. Two steepest descent preconditioning variants are proposed. The first employs a line search, while the second employs a predefined small step. A simple global convergence proof is provided for the N-GMRES optimization algorithm with the first steepest descent preconditioner (with line search), under mild standard conditions on the objective function and the line search processes. Steepest descent preconditioning for N-GMRES optimization is also motivated by relating it to standard non-preconditioned GMRES for linear systems in the case of a quadratic optimization problem with symmetric positive definite operator. Numerical tests on a variety of model problems show that the N-GMRES optimization algorithm is able to very significantly accelerate convergence of stand-alone steepest descent optimization. Moreover, performance of steepest-descent preconditioned N-GMRES is shown to be competitive with standard nonlinear conjugate gradient and limited-memory Broyden-Fletcher-Goldfarb-Shanno methods for the model problems considered. These results serve to theoretically and numerically establish steepest-descent preconditioned N-GMRES as a general optimization method for unconstrained nonlinear optimization, with performance that appears promising compared to established techniques. In addition, it is argued that the real potential of the N-GMRES optimization framework lies in the fact that it can make use of problem-dependent nonlinear preconditioners that are more powerful than steepest descent (or, equivalently, N-GMRES can be used as a simple wrapper around any other iterative optimization process to seek acceleration of that process), and this potential is illustrated with a further application example.
Submitted 24 July, 2011; v1 submitted 22 June, 2011;
originally announced June 2011.
-
A Nonlinear GMRES Optimization Algorithm for Canonical Tensor Decomposition
Authors:
Hans De Sterck
Abstract:
A new algorithm is presented for computing a canonical rank-R tensor approximation that has minimal distance to a given tensor in the Frobenius norm, where the canonical rank-R tensor consists of the sum of R rank-one components. Each iteration of the method consists of three steps. In the first step, a tentative new iterate is generated by a stand-alone one-step process, for which we use alternating least squares (ALS). In the second step, an accelerated iterate is generated by a nonlinear generalized minimal residual (GMRES) approach, recombining previous iterates in an optimal way, and essentially using the stand-alone one-step process as a preconditioner. In particular, the nonlinear extension of GMRES is used that was proposed by Washio and Oosterlee in [ETNA Vol. 15 (2003), pp. 165-185] for nonlinear partial differential equation problems. In the third step, a line search is performed for globalization. The resulting nonlinear GMRES (N-GMRES) optimization algorithm is applied to dense and sparse tensor decomposition test problems. The numerical tests show that ALS accelerated by N-GMRES may significantly outperform both stand-alone ALS and a standard nonlinear conjugate gradient optimization method, especially when highly accurate stationary points are desired for difficult problems. The proposed N-GMRES optimization algorithm is based on general concepts and may be applied to other nonlinear optimization problems.
Submitted 26 May, 2011;
originally announced May 2011.
-
A Self-learning Algebraic Multigrid Method for Extremal Singular Triplets and Eigenpairs
Authors:
Hans De Sterck
Abstract:
A self-learning algebraic multigrid method for dominant and minimal singular triplets and eigenpairs is described. The method consists of two multilevel phases. In the first, multiplicative phase (setup phase), tentative singular triplets are calculated along with a multigrid hierarchy of interpolation operators that approximately fit the tentative singular vectors in a collective and self-learning manner, using multiplicative update formulas. In the second, additive phase (solve phase), the tentative singular triplets are improved up to the desired accuracy by using an additive correction scheme with fixed interpolation operators, combined with a Ritz update. A suitable generalization of the singular value decomposition is formulated that applies to the coarse levels of the multilevel cycles. The proposed algorithm combines and extends two existing multigrid approaches for symmetric positive definite eigenvalue problems to the case of dominant and minimal singular triplets. Numerical tests on model problems from different areas show that the algorithm converges to high accuracy in a modest number of iterations, and is flexible enough to deal with a variety of problems due to its self-learning properties.
Submitted 4 February, 2011;
originally announced February 2011.
-
A generalized Monte Carlo loop algorithm for frustrated Ising models
Authors:
Yuan Wang,
Hans De Sterck,
Roger G. Melko
Abstract:
We introduce a Generalized Loop Move (GLM) update for Monte Carlo simulations of frustrated Ising models on two-dimensional lattices with bond-sharing plaquettes. The GLM updates are designed to enhance Monte Carlo sampling efficiency when the system's low-energy states consist of an extensive number of degenerate or near-degenerate spin configurations, separated by large energy barriers to single spin flips. Through implementation on several frustrated Ising models, we demonstrate the effectiveness of the GLM updates in cases where both degenerate and near-degenerate sets of configurations are favored at low temperatures. The GLM update's potential to be straightforwardly extended to different lattices and spin interactions allows it to be readily adopted on many other frustrated Ising models of physical relevance.
Submitted 21 July, 2010;
originally announced July 2010.
-
Monte Carlo study of degenerate groundstates and residual entropy in a frustrated honeycomb lattice Ising model
Authors:
Shawn Andrews,
Hans De Sterck,
Stephen Inglis,
Roger G. Melko
Abstract:
We study a classical fully-frustrated honeycomb lattice Ising model using Markov chain Monte Carlo methods and exact calculations. The Hamiltonian realizes a degenerate ground state manifold of equal-energy states, where each hexagonal plaquette of the lattice has one and only one frustrated bond, with an extensive residual entropy that grows as the number of spins N. Traditional single-spin flip Monte Carlo methods fail to sample all possible spin configurations in this ground state efficiently, due to their separation by large energy barriers. We develop a non-local "chain-flip" algorithm that solves this problem, and demonstrate its effectiveness on the Ising Hamiltonian with and without perturbative interactions. The two perturbations considered are a slightly weakened bond, and an external magnetic field h. For some cases, the chain-flip move is necessary for the simulation to find an ordered ground state. In the case of the magnetic field, two magnetized ground states with non-extensive entropy are found, and two special values of h exist where the residual entropy again becomes extensive, scaling proportional to N ln phi, where phi is the golden ratio.
Submitted 27 March, 2009; v1 submitted 17 December, 2008;
originally announced December 2008.