-
Multigrid-in-time preconditioners for KKT systems
Authors:
Radoslav Vuchkov,
Eric C. Cyr,
Denis Ridzal
Abstract:
We develop multigrid-in-time preconditioners for Karush-Kuhn-Tucker (KKT) systems that arise in the solution of time-dependent optimization problems. We focus on a specific instance of KKT systems, known as augmented systems, which underpin the composite-step sequential quadratic programming framework [1]. To enable time-domain decomposition, our approach introduces virtual state variables and continuity constraints at each discrete time interval. The virtual state variables not only facilitate a decoupling in time but also give rise to fixed-point iterations that aid the solution of KKT systems. These fixed-point schemes can be used either as preconditioners for Krylov subspace methods or as smoothers for multigrid-in-time schemes. For the latter, we develop a block-Jacobi scheme that parallelizes trivially in the time domain. To complete the multigrid construction, we use simple prolongation and restriction operators based on geometric multigrid ideas, and a coarse-grid solver based on a GMRES iteration preconditioned with the symmetric block Gauss-Seidel scheme. We present two optimal control examples, involving the viscous Burgers' and van der Pol oscillator equations, respectively, and demonstrate algorithmic scalability.
Submitted 8 May, 2024;
originally announced May 2024.
-
Graph Neural Networks and Applied Linear Algebra
Authors:
Nicholas S. Moore,
Eric C. Cyr,
Peter Ohm,
Christopher M. Siefert,
Raymond S. Tuminaro
Abstract:
Sparse matrix computations are ubiquitous in scientific computing. With the recent interest in scientific machine learning, it is natural to ask how sparse matrix computations can leverage neural networks (NN). Unfortunately, multi-layer perceptron (MLP) neural networks are typically not natural for either graph or sparse matrix computations. The issue is that MLPs require fixed-size inputs, while scientific applications generally generate sparse matrices with arbitrary dimensions and a wide range of nonzero patterns (or matrix graph vertex interconnections). While convolutional NNs could possibly address matrix graphs where all vertices have the same number of nearest neighbors, a more general approach is needed for arbitrary sparse matrices, e.g., those arising from discretized partial differential equations on unstructured meshes. Graph neural networks (GNNs) are one approach suitable for sparse matrices. GNNs define aggregation functions (e.g., summations) that operate on variable-size input data to produce data of a fixed output size so that MLPs can be applied. The goal of this paper is to provide an introduction to GNNs for a numerical linear algebra audience. Concrete examples are provided to illustrate how many common linear algebra tasks can be accomplished using GNNs. We focus on iterative methods that employ computational kernels such as matrix-vector products, interpolation, relaxation methods, and strength-of-connection measures. Our GNN examples include cases where parameters are determined a priori as well as cases where parameters must be learned. The intent of this article is to help computational scientists understand how GNNs can be used to adapt machine learning concepts to computational tasks associated with sparse matrices. It is hoped that this understanding will stimulate data-driven extensions of classical sparse linear algebra tasks.
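The matrix-vector product described above maps directly onto message passing. Each vertex i receives the message a_ij * x_j from each neighbor j and aggregates the messages by summation. The sketch below is written in plain Python rather than a GNN library. The names `message_passing_layer`, `matvec`, and `jacobi_sweep` are hypothetical and do not come from the paper's code. Here the parameters are fixed a priori (the matrix entries), not learned.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Sparse matrix as a graph: row i lists its nonzeros as (column j, a_ij) edges.
Graph = Dict[int, List[Tuple[int, float]]]


def message_passing_layer(graph: Graph, node_features, message: Callable,
                          aggregate: Callable) -> Dict[int, float]:
    """Generic GNN layer: vertex i gathers one message per incident edge and
    reduces them with a permutation-invariant aggregate (here: sum)."""
    out = {}
    for i, edges in graph.items():
        out[i] = aggregate([message(a_ij, node_features[j]) for j, a_ij in edges])
    return out


def matvec(graph: Graph, x) -> Dict[int, float]:
    """y = A x: message a_ij * x_j, summation aggregation."""
    return message_passing_layer(graph, x, lambda a, xj: a * xj, sum)


def jacobi_sweep(graph: Graph, x, b: Sequence[float], omega: float = 2.0 / 3.0):
    """Weighted Jacobi relaxation x_i <- x_i + omega * (b_i - (A x)_i) / a_ii,
    built from the message-passing matvec plus a per-vertex update."""
    ax = matvec(graph, x)
    diag = {i: next(a for j, a in edges if j == i) for i, edges in graph.items()}
    return {i: x[i] + omega * (b[i] - ax[i]) / diag[i] for i in graph}
```

For the 1D Laplacian `{0: [(0, 2.0), (1, -1.0)], 1: [(0, -1.0), (1, 2.0), (2, -1.0)], 2: [(1, -1.0), (2, 2.0)]}` and `x = [1.0, 2.0, 3.0]`, `matvec` returns `{0: 0.0, 1: 0.0, 2: 4.0}`. The layer handles rows of any length because the sum accepts a variable number of messages.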
Submitted 21 October, 2023;
originally announced October 2023.
-
Solar Energetic Particle-Associated Coronal Mass Ejections Observed by the Mauna Loa Solar Observatory Mk3 and Mk4 Coronameters
Authors:
I. G. Richardson,
O. C. St Cyr,
J. T. Burkepile,
H. Xie,
B. J. Thompson
Abstract:
We report on the first comprehensive study of the coronal mass ejections (CMEs) associated with $\sim$25 MeV solar energetic proton (SEP) events in 1980-2013 observed in the low/inner corona by the Mauna Loa Solar Observatory (MLSO) Mk3 and Mk4 coronameters. Where possible, these observations are combined with space-based observations from the Solar Maximum Mission C/P, P78-1 SOLWIND or SOHO/LASCO coronagraphs. The aim of the study is to understand directly measured (rather than inferred from proxies) CME motions in the low to middle corona and their association with SEP acceleration, and hence to attempt to identify early signatures that are characteristic of SEP acceleration in ground-based CME observations and that may be used to warn of impending SEP events. Although we find that SEP events are associated with CMEs that are on average faster and wider than typical CMEs observed by MLSO, a major challenge turns out to be determining reliable estimates of the CME dynamics in the low corona from the 3-minute cadence Mk3/4 observations, since different analysis techniques can produce inconsistent results. This complicates the assessment of what early information on a possible SEP event is available from these low coronal observations.
Submitted 24 August, 2023; v1 submitted 18 August, 2023;
originally announced August 2023.
-
A 2-Level Domain Decomposition Preconditioner for KKT Systems with Heat-Equation Constraints
Authors:
Eric C. Cyr
Abstract:
Solving optimization problems with transient PDE constraints is computationally costly due to the number of nonlinear iterations and the cost of solving large-scale KKT systems. The dimension of these KKT matrices scales with the size of the spatial discretization times the number of time steps. We propose a new two-level domain decomposition preconditioner to solve these linear systems when constrained by the heat equation. Our approach leverages the observation that the Schur complement is elliptic in time, and thus amenable to classical domain decomposition methods. Further, the application of the preconditioner uses existing time integration routines to facilitate implementation and maximize software reuse. The performance of the preconditioner is examined in an empirical study demonstrating that the approach is scalable with respect to the number of time steps and subdomains.
Submitted 7 May, 2023;
originally announced May 2023.
-
A Robust, Performance-Portable Discontinuous Galerkin Method for Relativistic Hydrodynamics
Authors:
Forrest W. Glines,
Kristian R. C. Beckwith,
Joshua R. Braun,
Eric C. Cyr,
Curtis C. Ober,
Matthew Bettencourt,
Keith L. Cartwright,
Sidafa Conde,
Sean T. Miller,
Nicholas Roberds,
Nathan V. Roberts,
Matthew S. Swan,
Roger Pawlowski
Abstract:
In this work, we present a discontinuous Galerkin method for evolving relativistic hydrodynamics. We include an exploration of analytical and iterative methods to recover the primitive variables from the conserved variables for the ideal equation of state and the Taub-Matthews approximation to the Synge equation of state. We also present a new operator for enforcing a physically permissible conserved state at all basis points within an element while preserving the volume average of the conserved state. We implement this method using the Kokkos performance-portability library to enable high performance on both CPUs and GPUs. We use this method to explore the relativistic Kelvin-Helmholtz instability and compare the results with a finite volume method. Finally, we explore the performance of our implementation on CPUs and GPUs.
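The abstract refers to iterative methods for recovering primitive variables from conserved variables. The sketch below shows one standard iterative approach: bisection on the pressure for the ideal-gas equation of state, in 1D special relativity with c = 1. This is an assumed illustration, not necessarily the method explored in the paper. The function names are hypothetical. The code also assumes that the pressure residual is positive just above the admissible lower bound and negative at large pressure. That holds for typical states, but production codes add more safeguards.

```python
import math


def conserved_from_primitive(rho, v, p, gamma):
    """Map primitives (rho, v, p) to conserved (D, S, tau) for 1D special
    relativistic hydrodynamics with an ideal-gas EOS (c = 1)."""
    w = 1.0 / math.sqrt(1.0 - v * v)             # Lorentz factor
    h = 1.0 + gamma / (gamma - 1.0) * p / rho    # specific enthalpy
    d = rho * w
    s = rho * h * w * w * v
    tau = rho * h * w * w - p - d
    return d, s, tau


def primitive_from_conserved(d, s, tau, gamma, tol=1e-12, max_iter=200):
    """Recover (rho, v, p) by bisection on f(p) = (gamma - 1) rho eps - p."""

    def state(p):
        v = s / (tau + d + p)                    # since tau + D + p = rho h W^2
        w = 1.0 / math.sqrt(1.0 - v * v)
        rho = d / w
        eps = (tau + d * (1.0 - w) + p * (1.0 - w * w)) / (d * w)
        return rho, v, (gamma - 1.0) * rho * eps - p

    # |v| < 1 requires p > |S| - tau - D; start the bracket just above it.
    lo = max(abs(s) - tau - d, 0.0) + 1e-12
    hi = max(2.0 * lo, 1.0)
    while state(hi)[2] > 0.0:                    # grow until the residual is negative
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if state(mid)[2] > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * max(1.0, hi):
            break
    p = 0.5 * (lo + hi)
    rho, v, _ = state(p)
    return rho, v, p
```

A round trip with `gamma = 5/3` and `(rho, v, p) = (1.0, 0.5, 0.1)` recovers the input to roughly bisection tolerance. A Newton iteration on the same residual converges faster, but bisection is easier to safeguard.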
Submitted 29 April, 2022;
originally announced May 2022.
-
Parallel Training of GRU Networks with a Multi-Grid Solver for Long Sequences
Authors:
Gordon Euhyun Moon,
Eric C. Cyr
Abstract:
Parallelizing Gated Recurrent Unit (GRU) networks is a challenging task, as the training procedure of GRU is inherently sequential. Prior efforts to parallelize GRU have largely focused on conventional parallelization strategies such as data-parallel and model-parallel training algorithms. However, when the given sequences are very long, existing approaches remain limited in training time. In this paper, we present a novel parallel training scheme (called parallel-in-time) for GRU based on a multigrid reduction in time (MGRIT) solver. MGRIT partitions a sequence into multiple shorter sub-sequences and trains the sub-sequences on different processors in parallel. The key to achieving speedup is a hierarchical correction of the hidden state to accelerate end-to-end communication in both the forward and backward propagation phases of gradient descent. Experimental results on the HMDB51 dataset, where each video is an image sequence, demonstrate that the new parallel training scheme achieves up to 6.5$\times$ speedup over a serial approach. Because the efficiency of the new parallelization strategy depends on the sequence length, our parallel GRU algorithm achieves greater performance improvements as the sequence length increases.
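Two-level MGRIT with F-relaxation is known to be equivalent to the parareal algorithm, which makes parareal a compact way to see the idea of parallel time chunks plus a serial coarse correction. The sketch below applies parareal to the scalar ODE y' = lam * y. It is not the paper's GRU training scheme. The propagators (`fine` = many small forward-Euler steps, `coarse` = one large step) and all function names are illustrative assumptions.

```python
def parareal(fine, coarse, u0, n_intervals, iterations):
    """Two-level parallel-in-time iteration. `fine` and `coarse` each advance
    a state across one time interval; within an iteration the fine calls are
    independent and could run on separate processors."""
    u = [u0]
    for n in range(n_intervals):            # initial guess: serial coarse sweep
        u.append(coarse(u[n]))
    for _ in range(iterations):
        fine_vals = [fine(u[n]) for n in range(n_intervals)]      # parallel part
        coarse_old = [coarse(u[n]) for n in range(n_intervals)]
        new = [u0]
        for n in range(n_intervals):        # serial coarse correction
            new.append(coarse(new[n]) + fine_vals[n] - coarse_old[n])
        u = new
    return u


def euler(lam, dt, steps):
    """Propagator taking `steps` forward-Euler steps of y' = lam * y over dt."""
    def advance(y):
        for _ in range(steps):
            y = y + (dt / steps) * lam * y
        return y
    return advance
```

After k iterations the first k interval values match the serial fine solution. With `n_intervals` iterations the whole trajectory matches it up to round-off. The speedup comes from stopping much earlier, once the error is small enough.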
Submitted 7 March, 2022;
originally announced March 2022.
-
Direct First PSP Observation of the Interaction of Two Successive Interplanetary Coronal Mass Ejections in November 2020
Authors:
Teresa Nieves-Chinchilla,
Nathalia Alzate,
Hebe Cremades,
Laura Rodriguez-Garcia,
Luiz F. G. Dos Santos,
Ayris Narock,
Hong Xie,
Adam Szabo,
Vratislav Krupar,
Marc Pulupa,
David Lario,
Michael L. Stevens,
Erika Palmerio,
Lynn B. Wilson III,
Katharine K. Reeves,
Ryun-Young Kwon,
M. Leila Mays,
O. Chris St. Cyr,
Phillip Hess,
Daniel B. Seaton,
Tatiana Niembro,
Stuart D. Bale,
Justin C. Kasper
Abstract:
We investigate the effects of evolutionary processes on the internal magnetic structure of two interplanetary coronal mass ejections (ICMEs) detected in situ between 2020 November 29 and December 1 by Parker Solar Probe (PSP). The sources of the ICMEs were observed remotely at the Sun in EUV and subsequently tracked to their coronal counterparts in white light. This period is of particular interest to the community since it has been identified as the first widespread solar energetic particle event of Solar Cycle 25. The distribution of various solar and heliospheric spacecraft throughout the inner heliosphere during PSP observations of these large-scale magnetic structures enables a comprehensive analysis of the internal evolution and topology of such structures. By assembling different models and techniques, we identify the signatures of interaction between the two consecutive ICMEs and the implications for their internal structure. We use multispacecraft observations in combination with a remote-sensing forward modeling technique, numerical propagation models, and in-situ reconstruction techniques. Reconciling all of these results demonstrates that the two CMEs were interacting in the vicinity of PSP. We therefore interpret the in-situ observations in terms of the physical processes associated with the interaction and collision of both CMEs. We also extend the flux rope modeling and in-situ reconstruction technique to incorporate aging and expansion effects in a distorted internal magnetic structure, and explore the implications of both effects for the magnetic configuration of the ICMEs.
Submitted 26 January, 2022;
originally announced January 2022.
-
Reduced Basis Approximations of Parameterized Dynamical Partial Differential Equations via Neural Networks
Authors:
Peter Sentz,
Kristian Beckwith,
Eric C. Cyr,
Luke N. Olson,
Ravi Patel
Abstract:
Projection-based reduced order models are effective at approximating parameter-dependent differential equations that are parametrically separable. When parametric separability is not satisfied, which occurs in both linear and nonlinear problems, projection-based methods fail to adequately reduce the computational complexity. Devising alternative reduced order models is crucial for obtaining efficient and accurate approximations to expensive high-fidelity models. In this work, we develop a time-stepping procedure for dynamical parameter-dependent problems, in which a neural network is trained to propagate the coefficients of a reduced basis expansion. This results in an online stage with a computational cost independent of the size of the underlying problem. We demonstrate our method on several parabolic partial differential equations, including a problem that is not parametrically separable.
Submitted 20 October, 2021;
originally announced October 2021.
-
A Monolithic Algebraic Multigrid Framework for Multiphysics Applications with Examples from Resistive MHD
Authors:
Peter Ohm,
Tobias Wiesner,
Eric C. Cyr,
Jonathan J. Hu,
John N. Shadid,
Raymond S. Tuminaro
Abstract:
A multigrid framework is described for multiphysics applications. The framework allows one to construct, adapt, and tailor a monolithic multigrid methodology to different linear systems coming from discretized partial differential equations. The main idea centers on developing multigrid components in a blocked fashion, where each block corresponds to a separate set of physical unknowns and equations within the larger discretization matrix. Once defined, these components are ultimately assembled into a monolithic multigrid solver for the entire system. We demonstrate the potential of the framework by applying it to representative linear solution sub-problems arising from resistive MHD.
Submitted 22 March, 2021; v1 submitted 12 March, 2021;
originally announced March 2021.
-
Partition of unity networks: deep hp-approximation
Authors:
Kookjin Lee,
Nathaniel A. Trask,
Ravi G. Patel,
Mamikon A. Gulian,
Eric C. Cyr
Abstract:
Approximation theorists have established best-in-class optimal approximation rates of deep neural networks by utilizing their ability to simultaneously emulate partitions of unity and monomials. Motivated by this, we propose partition of unity networks (POUnets), which incorporate these elements directly into the architecture. Classification architectures of the type used to learn probability measures are used to build a meshfree partition of space, while polynomial spaces with learnable coefficients are associated with each partition. The resulting hp-element-like approximation allows use of a fast least-squares optimizer, and the resulting architecture size need not scale exponentially with spatial dimension, breaking the curse of dimensionality. An abstract approximation result establishes desirable properties to guide network design. Numerical results for two choices of architecture demonstrate that POUnets yield hp-convergence for smooth functions and consistently outperform MLPs for piecewise polynomial functions with large numbers of discontinuities.
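When the partition is held fixed, the polynomial coefficients enter linearly, which is why a fast least-squares solve is possible. The 1D sketch below builds the partition as a softmax over linear logits, with an assumed sharpness `beta`. It then fits the local polynomial coefficients through the normal equations. In POUnets the partition itself is produced by a trained classification network. Here it is hand-picked, and all function names are hypothetical.

```python
import math


def softmax_partition(x, logits, beta):
    """Partition of unity: softmax over linear logits beta * (a_i * x + b_i)."""
    z = [beta * (a * x + b) for a, b in logits]
    m = max(z)                                  # subtract the max for stability
    e = [math.exp(zi - m) for zi in z]
    total = sum(e)
    return [ei / total for ei in e]             # nonnegative, sums to 1


def solve(a, b):
    """Small dense solve: Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_pou(xs, ys, logits, beta, degree):
    """Least-squares fit of f(x) = sum_i phi_i(x) * p_i(x) with the partition
    phi fixed; each p_i is a polynomial of the given degree."""
    def features(x):
        return [phi * x ** k for phi in softmax_partition(x, logits, beta)
                for k in range(degree + 1)]

    rows = [features(x) for x in xs]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    coef = solve(ata, aty)                      # normal equations A^T A c = A^T y
    return lambda x: sum(c * f for c, f in zip(coef, features(x)))
```

Take f(x) = |x| on [-1, 1], two partitions with logits `[(-1.0, 0.0), (1.0, 0.0)]`, and `beta = 50`. The piecewise-linear fit is then far more accurate than a single global linear fit, which cannot represent the kink.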
Submitted 27 January, 2021;
originally announced January 2021.
-
Thermodynamically consistent physics-informed neural networks for hyperbolic systems
Authors:
Ravi G. Patel,
Indu Manickam,
Nathaniel A. Trask,
Mitchell A. Wood,
Myoungkyu Lee,
Ignacio Tomas,
Eric C. Cyr
Abstract:
Physics-informed neural network architectures have emerged as a powerful tool for developing flexible PDE solvers that easily assimilate data, but they face challenges related to the PDE discretization underpinning them. By instead adapting a least-squares space-time control volume scheme, we circumvent issues related particularly to the imposition of boundary conditions and conservation, while reducing solution regularity requirements. Additionally, connections to classical finite volume methods allow the application of biases toward entropy solutions and total variation diminishing properties. For inverse problems, we may impose further thermodynamic biases, allowing us to fit shock hydrodynamics models to molecular simulations of rarefied gases and metals. The resulting data-driven equations of state may be incorporated into traditional shock hydrodynamics codes.
Submitted 9 December, 2020;
originally announced December 2020.
-
A physics-informed operator regression framework for extracting data-driven continuum models
Authors:
Ravi G. Patel,
Nathaniel A. Trask,
Mitchell A. Wood,
Eric C. Cyr
Abstract:
The application of deep learning toward the discovery of data-driven models requires careful application of inductive biases to obtain a description of physics that is both accurate and robust. We present here a framework for discovering continuum models from high-fidelity molecular simulation data. Our approach applies a neural network parameterization of governing physics in modal space, allowing a characterization of differential operators while providing structure that may be used to impose biases related to symmetry, isotropy, and conservation form. We demonstrate the effectiveness of our framework for a variety of physics, including local and nonlocal diffusion processes and single and multiphase flows. For the flow physics, we demonstrate that this approach leads to a learned operator that generalizes to system characteristics not included in the training sets, such as variable particle sizes, densities, and concentration.
Submitted 24 September, 2020;
originally announced September 2020.
-
The Solar Orbiter Science Activity Plan: translating solar and heliospheric physics questions into action
Authors:
I. Zouganelis,
A. De Groof,
A. P. Walsh,
D. R. Williams,
D. Mueller,
O. C. St Cyr,
F. Auchere,
D. Berghmans,
A. Fludra,
T. S. Horbury,
R. A. Howard,
S. Krucker,
M. Maksimovic,
C. J. Owen,
J. Rodriguez-Pacheco,
M. Romoli,
S. K. Solanki,
C. Watson,
L. Sanchez,
J. Lefort,
P. Osuna,
H. R. Gilbert,
T. Nieves-Chinchilla,
L. Abbo,
O. Alexandrova
, et al. (160 additional authors not shown)
Abstract:
Solar Orbiter is the first space mission observing the solar plasma both in situ and remotely, from a close distance, in and out of the ecliptic. The ultimate goal is to understand how the Sun produces and controls the heliosphere, filling the Solar System and driving the planetary environments. With six remote-sensing and four in-situ instrument suites, the coordination and planning of the operations are essential to address the following four top-level science questions: (1) What drives the solar wind and where does the coronal magnetic field originate? (2) How do solar transients drive heliospheric variability? (3) How do solar eruptions produce energetic particle radiation that fills the heliosphere? (4) How does the solar dynamo work and drive connections between the Sun and the heliosphere? Maximising the mission's science return requires considering the characteristics of each orbit, including the relative position of the spacecraft to Earth (affecting downlink rates), trajectory events (such as gravitational assist manoeuvres), and the phase of the solar activity cycle. Furthermore, since each orbit's science telemetry will be downloaded over the course of the following orbit, science operations must be planned at mission level, rather than at the level of individual orbits. It is important to explore the way in which those science questions are translated into an actual plan of observations that fits into the mission, thus ensuring that no opportunities are missed. First, the overarching goals are broken down into specific, answerable questions along with the required observations, and the so-called Science Activity Plan (SAP) is developed to achieve this. The SAP groups objectives that require similar observations into Solar Orbiter Observing Plans (SOOPs), resulting in a strategic, top-level view of the optimal opportunities for science observations during the mission lifetime.
Submitted 22 September, 2020;
originally announced September 2020.
-
The Solar Orbiter mission -- Science overview
Authors:
D. Müller,
O. C. St. Cyr,
I. Zouganelis,
H. R. Gilbert,
R. Marsden,
T. Nieves-Chinchilla,
E. Antonucci,
F. Auchère,
D. Berghmans,
T. Horbury,
R. A. Howard,
S. Krucker,
M. Maksimovic,
C. J. Owen,
P. Rochus,
J. Rodriguez-Pacheco,
M. Romoli,
S. K. Solanki,
R. Bruno,
M. Carlsson,
A. Fludra,
L. Harra,
D. M. Hassler,
S. Livi,
P. Louarn
, et al. (10 additional authors not shown)
Abstract:
Solar Orbiter, the first mission of ESA's Cosmic Vision 2015-2025 programme and a mission of international collaboration between ESA and NASA, will explore the Sun and heliosphere from close up and out of the ecliptic plane. It was launched on 10 February 2020 04:03 UTC from Cape Canaveral and aims to address key questions of solar and heliospheric physics pertaining to how the Sun creates and controls the heliosphere, and why solar activity changes with time. To answer these, the mission carries six remote-sensing instruments to observe the Sun and the solar corona, and four in-situ instruments to measure the solar wind, energetic particles, and electromagnetic fields. In this paper, we describe the science objectives of the mission, and how these will be addressed by the joint observations of the instruments onboard. The paper first summarises the mission-level science objectives, followed by an overview of the spacecraft and payload. We report the observables and performance figures of each instrument, as well as the trajectory design. This is followed by a summary of the science operations concept. The paper concludes with a more detailed description of the science objectives. Solar Orbiter will combine in-situ measurements in the heliosphere with high-resolution remote-sensing observations of the Sun to address fundamental questions of solar and heliospheric physics. The performance of the Solar Orbiter payload meets the requirements derived from the mission's science objectives. Its science return will be augmented further by coordinated observations with other space missions and ground-based observatories.
Submitted 2 September, 2020;
originally announced September 2020.
-
The Coronal Mass Ejection Visibility Function of Modern Coronagraphs
Authors:
Angelos Vourlidas,
L. A. Balmaceda,
H. Xie,
O. C. St. Cyr
Abstract:
We analyze the ability of all currently operating space-based coronagraphs to detect coronal mass ejections (CMEs). We define CMEs as events that propagate beyond 10 solar radii with morphologies broadly consistent with the presence of a magnetic flux rope. We take advantage of multi-viewpoint observations over five month-long intervals, corresponding to special orbital configurations of the coronagraphs aboard the STEREO and SOHO missions. This allows us to distinguish CMEs from other outward-propagating features (e.g., waves or outflows), and thus to identify the total number of unique CMEs ejected during those periods. We determine the CME visibility functions of the STEREO COR2-A/B and LASCO C2/C3 coronagraphs directly as the ratio of observed to unique CMEs. The visibility functions range from 0.71 to 0.92 for a 95% confidence interval. By comparing detections between coronagraphs on the same spacecraft and from multiple spacecraft, we assess the influence of field of view, instrument performance, and projection effects on CME detection ability without resorting to proxies such as flares or radio bursts. We find that no major CMEs are missed by any of the coronagraphs, that a few slow halo-like events may be missed in synoptic-cadence movies, and that narrow-field-of-view coronagraphs have difficulty discriminating between CMEs and other ejections, leading to false detections. We conclude that CME detection can only be validated with multi-viewpoint imaging; two coronagraphs in quadrature offer adequate detection capability. Finally, we apply the visibility functions to observed CME rates, resulting in upward corrections of 40%.
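The visibility function is defined above as the ratio of observed to unique CMEs. Applying it to an observed rate presumably means dividing by it, which gives the upward correction. The arithmetic is sketched below with hypothetical counts and rates; only the 0.71 to 0.92 range comes from the abstract, and the function names are illustrative.

```python
def visibility(observed, unique):
    """Fraction of the unique CMEs that a given coronagraph detected."""
    if unique <= 0 or not 0 <= observed <= unique:
        raise ValueError("need unique > 0 and 0 <= observed <= unique")
    return observed / unique


def corrected_rate(observed_rate, vis):
    """Estimate the true CME rate from an observed rate (assumed correction:
    divide by the visibility, so vis < 1 always corrects upward)."""
    return observed_rate / vis
```

For example, a hypothetical coronagraph that saw 71 of 100 unique CMEs has visibility 0.71. An observed rate of 2.0 CMEs per day then corrects to about 2.82 per day, roughly 41% higher.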
Submitted 7 August, 2020;
originally announced August 2020.
-
Variational Filtering with Copula Models for SLAM
Authors:
John D. Martin,
Kevin Doherty,
Caralyn Cyr,
Brendan Englot,
John Leonard
Abstract:
The ability to infer map variables and estimate pose is crucial to the operation of autonomous mobile robots. In most cases the shared dependency between these variables is modeled through a multivariate Gaussian distribution, but there are many situations where that assumption is unrealistic. Our paper shows how it is possible to relax this assumption and perform simultaneous localization and mapping (SLAM) with a larger class of distributions, whose multivariate dependency is represented with a copula model. We integrate the distribution model with copulas into a Sequential Monte Carlo estimator and show how unknown model parameters can be learned through gradient-based optimization. We demonstrate that our approach is effective in settings where Gaussian assumptions are clearly violated, such as environments with uncertain data association and nonlinear transition models.
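A copula separates the dependence structure from the marginal distributions. The sketch below uses a bivariate Gaussian copula with correlation `rho` as a simple, assumed example. It couples two non-Gaussian marginals, an exponential and a uniform on [-1, 1], through their inverse CDFs. The copula family used in the paper, and its integration into the Sequential Monte Carlo estimator, are not shown. The function name is hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(rho, marginal_inverse_cdfs, n, seed=0):
    """Draw n 2-D samples whose dependence is a Gaussian copula with
    correlation rho and whose marginals are given by inverse CDFs."""
    rng = random.Random(seed)
    std = NormalDist()
    inv_x, inv_y = marginal_inverse_cdfs
    eps = 1e-12                                  # keep u strictly inside (0, 1)
    samples = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # 2x2 Cholesky factor of [[1, rho], [rho, 1]] correlates the normals.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        u1 = min(max(std.cdf(z1), eps), 1.0 - eps)  # uniform marginals,
        u2 = min(max(std.cdf(z2), eps), 1.0 - eps)  # Gaussian dependence
        samples.append((inv_x(u1), inv_y(u2)))
    return samples
```

With an exponential inverse CDF `lambda u: -math.log(1.0 - u) / 2.0` and a uniform one `lambda u: 2.0 * u - 1.0`, each coordinate keeps its own marginal. The two coordinates still move together, so they fall on the same side of their medians most of the time when `rho = 0.9`.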
Submitted 2 August, 2020;
originally announced August 2020.
-
Monolithic Multigrid for Magnetohydrodynamics
Authors:
J. H. Adler,
T. Benson,
E. C. Cyr,
P. E. Farrell,
S. MacLachlan,
R. Tuminaro
Abstract:
The magnetohydrodynamics (MHD) equations model a wide range of plasma physics applications and are characterized by a nonlinear system of partial differential equations that strongly couples a charged fluid with the evolution of electromagnetic fields. After discretization and linearization, the resulting system of equations is generally difficult to solve due to the coupling between variables, and the heterogeneous coefficients induced by the linearization process. In this paper, we investigate multigrid preconditioners for this system based on specialized relaxation schemes that properly address the system structure and coupling. Three extensions of Vanka relaxation are proposed and applied to problems with up to 170 million degrees of freedom and fluid and magnetic Reynolds numbers up to 400 for stationary problems and up to 20,000 for time-dependent problems.
Submitted 28 June, 2020;
originally announced June 2020.
-
A block coordinate descent optimizer for classification problems exploiting convexity
Authors:
Ravi G. Patel,
Nathaniel A. Trask,
Mamikon A. Gulian,
Eric C. Cyr
Abstract:
Second-order optimizers hold intriguing potential for deep learning, but suffer from increased cost and sensitivity to the non-convexity of the loss surface as compared to gradient-based approaches. We introduce a coordinate descent method to train deep neural networks for classification tasks that exploits global convexity of the cross-entropy loss in the weights of the linear layer. Our hybrid Newton/Gradient Descent (NGD) method is consistent with the interpretation of hidden layers as providing an adaptive basis and the linear layer as providing an optimal fit of the basis to data. By alternating between a second-order method to find globally optimal parameters for the linear layer and gradient descent to train the hidden layers, we ensure an optimal fit of the adaptive basis to data throughout training. The size of the Hessian in the second-order step scales only with the number of weights in the linear layer and not with the depth and width of the hidden layers; furthermore, the approach is applicable to arbitrary hidden layer architectures. Previous work applying this adaptive basis perspective to regression problems demonstrated significant improvements in accuracy at reduced training cost, and this work can be viewed as an extension of this approach to classification problems. We first prove that the resulting Hessian matrix is symmetric positive semi-definite, and that the Newton step realizes a global minimizer. By studying classification of manufactured two-dimensional point cloud data, we demonstrate both an improvement in validation error and a striking qualitative difference in the basis functions encoded in the hidden layer when trained using NGD. Application to image classification benchmarks for both dense and convolutional architectures reveals improved training accuracy, suggesting possible gains of second-order methods over gradient descent.
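The convex linear-layer subproblem can be sketched directly. Assuming the hidden layers have already produced a fixed feature matrix `Phi` (n samples by m features) and labels are one-hot, the code below forms the gradient and Hessian of the mean cross-entropy with respect to the linear-layer weights and takes one Newton step. The small Tikhonov shift `lam` (softmax is invariant to adding the same vector to every class column, so the Hessian is singular) and the backtracking line search are our additions, not details from the abstract; the gradient-descent update of the hidden layers that NGD alternates with is not shown.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)      # stabilize the exponentials
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def ce_loss(Phi, Y, W):
    """Mean cross-entropy of the linear layer W (m x c) on fixed features Phi."""
    P = softmax(Phi @ W)
    return -np.sum(Y * np.log(P + 1e-300)) / Phi.shape[0]

def ce_grad_hess(Phi, Y, W):
    """Gradient (m x c) and Hessian (mc x mc) w.r.t. W, with W flattened in
    row-major (NumPy default) order; hidden features Phi are held fixed."""
    n, m = Phi.shape
    c = Y.shape[1]
    P = softmax(Phi @ W)
    G = Phi.T @ (P - Y) / n
    H = np.zeros((m * c, m * c))
    for phi, p in zip(Phi, P):
        S = np.diag(p) - np.outer(p, p)       # softmax Jacobian, symmetric PSD
        H += np.kron(np.outer(phi, phi), S)   # Kronecker product of PSD blocks
    return G, H / n

def newton_step(Phi, Y, W, lam=1e-6):
    """Regularized Newton step on the convex linear-layer problem."""
    G, H = ce_grad_hess(Phi, Y, W)
    g = G.ravel()
    d = np.linalg.solve(H + lam * np.eye(H.shape[0]), -g)
    step = lambda t: W + t * d.reshape(W.shape)
    f0, slope, t = ce_loss(Phi, Y, W), g @ d, 1.0
    # Backtracking (Armijo) line search: our addition for robustness.
    while ce_loss(Phi, Y, step(t)) > f0 + 1e-4 * t * slope and t > 1e-8:
        t *= 0.5
    return step(t)
```

On random data one can check numerically that the Hessian is symmetric with no negative eigenvalues (up to roundoff), consistent with the convexity the abstract relies on.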
Submitted 17 June, 2020;
originally announced June 2020.
-
Multilevel Initialization for Layer-Parallel Deep Neural Network Training
Authors:
Eric C. Cyr,
Stefanie Günther,
Jacob B. Schroder
Abstract:
This paper investigates multilevel initialization strategies for training very deep neural networks with a layer-parallel multigrid solver. The scheme is based on the continuous interpretation of the training problem as a problem of optimal control, in which neural networks are represented as discretizations of time-dependent ordinary differential equations. A key goal is to develop a method able to intelligently initialize the network parameters for the very deep networks enabled by scalable layer-parallel training. To do this, we apply a refinement strategy across the time domain that is equivalent to refining in the layer dimension. The resulting refinements create deep networks, with good initializations for the network parameters coming from the coarser trained networks. We investigate the effectiveness of such multilevel "nested iteration" strategies for network training, and present numerical evidence of reduced run time for equivalent accuracy. In addition, we study whether the initialization strategies provide a regularizing effect on the overall training process and reduce sensitivity to hyperparameters and randomness in initial network parameters.
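A refinement step in the layer (time) dimension can be sketched as follows, assuming a forward-Euler ResNet with one weight matrix per layer and piecewise-constant interpolation in time; the abstract does not state which interpolation the authors use, and the names `refine_layers` and `resnet_forward` are hypothetical. Halving the step size while doubling the number of layers keeps the final time fixed, so the coarse network's trained weights give a ready-made initialization for the deeper one.

```python
import numpy as np

def refine_layers(weights, h):
    """Nested-iteration refinement in the layer dimension (illustrative).
    weights: array (N, ...) of per-layer parameters on a grid with step h.
    Returns 2N layers with step h/2; each coarse layer's parameters are copied
    to the two fine layers covering the same time interval."""
    weights = np.asarray(weights)
    return np.repeat(weights, 2, axis=0), h / 2.0

def resnet_forward(x, weights, h):
    """Forward Euler ResNet: x_{k+1} = x_k + h * tanh(W_k x_k)."""
    for W in weights:
        x = x + h * np.tanh(W @ x)
    return x
```

Repeating `refine_layers` after each round of training yields the multilevel sequence of ever deeper networks described above.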
Submitted 18 December, 2019;
originally announced December 2019.
-
Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint
Authors:
Eric C. Cyr,
Mamikon A. Gulian,
Ravi G. Patel,
Mauro Perego,
Nathaniel A. Trask
Abstract:
Motivated by the gap between theoretical optimal approximation rates of deep neural networks (DNNs) and the accuracy realized in practice, we seek to improve the training of DNNs. The adoption of an adaptive basis viewpoint of DNNs leads to novel initializations and a hybrid least squares/gradient descent optimizer. We provide analysis of these techniques and illustrate, via numerical examples, dramatic increases in accuracy and convergence rate for benchmarks characterizing scientific applications where DNNs are currently used, including regression problems and physics-informed neural networks for the solution of partial differential equations.
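The hybrid least squares/gradient descent idea can be sketched for a one-hidden-layer regression network: the hidden layer is treated as an adaptive basis, the output weights are solved for exactly by least squares, and the hidden parameters then take a gradient step with the output weights frozen. The network shape, learning rate, and the name `lsgd` are our assumptions; the abstract's novel initializations are not reproduced here.

```python
import numpy as np

def features(X, W1, b1):
    """Hidden layer as an adaptive basis, plus a constant column for a bias term."""
    H = np.tanh(X @ W1 + b1)
    return np.hstack([H, np.ones((X.shape[0], 1))])

def mse(Phi, c, y):
    r = Phi @ c - y
    return 0.5 * np.mean(r ** 2)

def lsgd(X, y, width=10, iters=50, lr=0.1, seed=0):
    """Alternate an exact least-squares solve for the output weights c with one
    gradient step on the hidden parameters (W1, b1). Returns the parameters and
    a history of (loss before LS, loss after LS) pairs."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((X.shape[1], width))
    b1 = rng.standard_normal(width)
    c = np.zeros(width + 1)
    n = X.shape[0]
    history = []
    for _ in range(iters):
        Phi = features(X, W1, b1)
        before = mse(Phi, c, y)
        c = np.linalg.lstsq(Phi, y, rcond=None)[0]   # optimal fit of the basis
        history.append((before, mse(Phi, c, y)))
        # Gradient of 0.5*mean(r^2) w.r.t. hidden parameters, with c frozen.
        r = Phi @ c - y
        H = Phi[:, :-1]
        dZ = np.outer(r, c[:-1]) / n * (1.0 - H ** 2)
        W1 -= lr * X.T @ dZ
        b1 -= lr * dZ.sum(axis=0)
    return W1, b1, c, history
```

Because the least-squares step is optimal for the current basis, the loss never increases across it; only the hidden-layer update is left to gradient descent.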
Submitted 10 December, 2019;
originally announced December 2019.
-
Layer-Parallel Training of Deep Residual Neural Networks
Authors:
S. Günther,
L. Ruthotto,
J. B. Schroder,
E. C. Cyr,
N. R. Gauger
Abstract:
Residual neural networks (ResNets) are a promising class of deep neural networks that have shown excellent performance for a number of learning tasks, e.g., image classification and recognition. Mathematically, ResNet architectures can be interpreted as forward Euler discretizations of a nonlinear initial value problem whose time-dependent control variables represent the weights of the neural network. Hence, training a ResNet can be cast as an optimal control problem of the associated dynamical system. For similar time-dependent optimal control problems arising in engineering applications, parallel-in-time methods have shown notable improvements in scalability. This paper demonstrates the use of those techniques for efficient and effective training of ResNets. The proposed algorithms replace the classical (sequential) forward and backward propagation through the network layers by a parallel nonlinear multigrid iteration applied to the layer domain. This adds a new dimension of parallelism across layers that is attractive when training very deep networks. From this basic idea, we derive multiple layer-parallel methods. The most efficient version employs a simultaneous optimization approach where updates to the network parameters are based on inexact gradient information in order to speed up the training process. Using numerical examples from supervised classification, we demonstrate that the new approach achieves similar training performance to traditional methods, but enables layer-parallelism and thus provides speedup over layer-serial methods through greater concurrency.
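The abstract replaces serial propagation with a nonlinear multigrid iteration over the layer domain. As a much simpler stand-in, the sketch below applies parareal, which is known to coincide with two-level MGRIT using F-relaxation, to the forward pass of a forward-Euler ResNet only (no training or backward pass). The fine propagator runs the true layers inside each block of layers, and these calls are independent across blocks; the coarse propagator takes one large Euler step per block using the block's first weight matrix, a choice we make purely for illustration. All function names are hypothetical.

```python
import numpy as np

def euler_layer(x, W, h):
    """One ResNet layer viewed as a forward Euler step: x + h * tanh(W x)."""
    return x + h * np.tanh(W @ x)

def serial_forward(x0, weights, h):
    x = x0
    for W in weights:
        x = euler_layer(x, W, h)
    return x

def parareal_forward(x0, weights, h, block, iters):
    """Parareal over the layer dimension. weights: (N, d, d), N divisible by block."""
    nb = len(weights) // block
    F = lambda x, j: serial_forward(x, weights[j * block:(j + 1) * block], h)
    G = lambda x, j: euler_layer(x, weights[j * block], block * h)
    U = [x0]
    for j in range(nb):                        # initial serial coarse sweep
        U.append(G(U[j], j))
    for _ in range(iters):
        Fu = [F(U[j], j) for j in range(nb)]   # independent: parallel across blocks
        Gu_old = [G(U[j], j) for j in range(nb)]
        new = [x0]
        for j in range(nb):                    # cheap serial coarse correction
            new.append(G(new[j], j) + Fu[j] - Gu_old[j])
        U = new
    return U[-1]
```

After as many iterations as there are blocks, parareal reproduces the serial result exactly (up to roundoff); a speedup over layer-serial propagation requires convergence to acceptable accuracy in far fewer iterations than that.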
Submitted 25 July, 2019; v1 submitted 11 December, 2018;
originally announced December 2018.
-
Variation of Coronal Activity from the Minimum to Maximum of Solar Cycle 24 using Three Dimensional Coronal Electron Density Reconstructions from STEREO/COR1
Authors:
Tongjiang Wang,
Nelson L. Reginald,
Joseph M. Davila,
O. Chris St. Cyr,
William T. Thompson
Abstract:
Three-dimensional electron density distributions in the solar corona are reconstructed for 100 Carrington Rotations (CR 2054$-$2153) during 2007/03$-$2014/08 using the spherically symmetric method from polarized white-light observations with the STEREO/COR1. These three-dimensional electron density distributions are validated by comparison with similar density models derived using other methods, such as tomography and an MHD model, as well as using data from SOHO/LASCO-C2. Uncertainties in the estimated total mass of the global corona are analyzed based on differences between the density distributions for COR1-A and -B. Long-term variations of coronal activity in terms of the global and hemispheric average electron densities (equivalent to the total coronal mass) reveal a hemispheric asymmetry during the rising phase of Solar Cycle 24, with the northern hemisphere leading the southern hemisphere by a phase shift of 7$-$9 months. Using 14-CR (~13-month) running averages, the amplitudes of the variation in average electron density between Cycle 24 maximum and Cycle 23/24 minimum (called the modulation factors) are found to be in the range of 1.6$-$4.3. These modulation factors are latitudinally dependent, being largest in polar regions and smallest in the equatorial region. These modulation factors also show a hemispheric asymmetry, being somewhat larger in the southern hemisphere. Wavelet analysis shows that the short-term quasi-periodic oscillations during the rising and maximum phases of Cycle 24 have a dominant period of 7$-$8 months. In addition, it is found that the radial distribution of mean electron density for streamers at Cycle 24 maximum is only slightly larger (by ~30%) than at cycle minimum.
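The smoothing-then-ratio arithmetic behind the modulation factor can be sketched as follows, assuming one average-density value per Carrington Rotation and reading the modulation factor as the ratio of the smoothed maximum to the smoothed minimum. That reading, the trailing (not centered) window, and the function names are our assumptions; the abstract does not give the exact definition.

```python
import numpy as np

def running_mean(x, window=14):
    """Running average over `window` consecutive samples (fully overlapped part only)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

def modulation_factor(density, window=14):
    """Ratio of the largest to the smallest running-averaged density value."""
    s = running_mean(np.asarray(density, dtype=float), window)
    return s.max() / s.min()
```

With a 14-CR window, short-term quasi-periodic oscillations such as the 7 to 8 month signal are largely averaged out before the ratio is taken.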
Submitted 15 June, 2017;
originally announced June 2017.
-
Low-frequency type II radio detections and coronagraph data to describe and forecast the propagation of 71 CMEs/shocks
Authors:
H. Cremades,
F. A. Iglesias,
O. C. St. Cyr,
H. Xie,
M. L. Kaiser,
N. Gopalswamy
Abstract:
The vulnerability of technology on which present society relies demands that a solar event, its time of arrival at Earth, and its degree of geoeffectiveness be promptly forecasted. Motivated by improving predictions of arrival times at Earth of shocks driven by coronal mass ejections (CMEs), we have analyzed 71 Earth-directed events in different stages of their propagation. The study is primarily based on approximated locations of interplanetary (IP) shocks derived from type II radio emissions detected by the Wind/WAVES experiment during 1997-2007. Distance-time diagrams resulting from the combination of white-light corona, IP type II radio, and in situ data lead to the formulation of descriptive profiles of each CME's journey toward Earth. Furthermore, two different methods to track and predict the location of CME-driven IP shocks are presented. The linear method, solely based on Wind/WAVES data, arises after key modifications to a pre-existing technique that linearly projects the drifting low-frequency type II emissions to 1 AU. This upgraded method improves forecasts of shock arrival time by almost 50%. The second predictive method is proposed on the basis of information derived from the descriptive profiles, and relies on a single CME height-time point and on low-frequency type II radio emissions to obtain an approximate value of the shock arrival time at Earth. In addition, we discuss results on CME-radio emission associations, characteristics of IP propagation, and the relative success of the forecasting methods.
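The idea of linearly projecting drifting type II emission to 1 AU can be sketched under textbook assumptions that the abstract does not state: emission at the fundamental plasma frequency $f_p \approx 8.98\,\mathrm{kHz}\,\sqrt{n_e/\mathrm{cm^{-3}}}$, an interplanetary density falling as $r^{-2}$ (so $1/f$ grows linearly with heliocentric distance), and a constant shock speed. The 1 AU density `n1au_cm3` and all function names are hypothetical, and this is not the authors' modified technique.

```python
import numpy as np

AU_KM = 1.496e8   # 1 AU in km

def plasma_freq_hz(n_cm3):
    """Electron plasma frequency for density n_cm3 (electrons per cm^3)."""
    return 8.98e3 * np.sqrt(n_cm3)

def arrival_time_from_type_ii(t_hours, f_hz, n1au_cm3=7.0):
    """Fit 1/f linearly in time and extrapolate to the 1/f expected at 1 AU.
    Valid only under n_e ~ r^-2, fundamental emission, and constant speed."""
    slope, intercept = np.polyfit(t_hours, 1.0 / np.asarray(f_hz), 1)
    inv_f_1au = 1.0 / plasma_freq_hz(n1au_cm3)
    return (inv_f_1au - intercept) / slope
```

For a synthetic shock moving at constant speed through an $r^{-2}$ density profile, the extrapolated time equals the true travel time to 1 AU; real events deviate from these assumptions, which is what the modifications described above aim to address.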
Submitted 7 May, 2015;
originally announced May 2015.
-
A new class of finite element variational multiscale turbulence models for incompressible magnetohydrodynamics
Authors:
David Sondak,
John N. Shadid,
Assad A. Oberai,
Roger P. Pawlowski,
Eric C. Cyr,
Tom M. Smith
Abstract:
New large eddy simulation (LES) turbulence models for incompressible magnetohydrodynamics (MHD) derived from the variational multiscale (VMS) formulation for finite element simulations are introduced. The new models include the variational multiscale formulation, a residual-based eddy viscosity model, and a mixed model that combines both of these component models. Each model contains terms that are proportional to the residual of the incompressible MHD equations and is therefore numerically consistent. Moreover, each model is also dynamic, in that its effect vanishes when this residual is small. The new models are tested on the decaying MHD Taylor-Green vortex at low and high Reynolds numbers. The evaluation of the models is based on comparisons with available data from direct numerical simulations (DNS) of the time evolution of energies as well as energy spectra at various discrete times. A numerical study on a sequence of meshes demonstrates that the large eddy simulation approaches the DNS solution for these quantities with spatial mesh refinement.
Submitted 2 December, 2014;
originally announced December 2014.
-
On involutions and generalized symmetric spaces of dicyclic groups
Authors:
Abigail Bishop,
Christopher Cyr,
John Hutchens,
Clover May,
Nathaniel Schwartz,
Bethany Turner
Abstract:
Let $G=\mathrm{Dc}_{n}$ be the dicyclic group of order $4n$. Let $\varphi$ be an automorphism of $G$ of order $k$. We describe $\varphi$ and the generalized symmetric space $Q$ of $G$ associated with $\varphi$. When $\varphi$ is an involution, we describe its fixed point group $H=G^{\varphi}$ along with the $H$-orbits and $G$-orbits of $Q$ corresponding to the action of $\varphi$-twisted conjugation.
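These objects are easy to enumerate for small $n$. The sketch below encodes the dicyclic group of order $4n$ by the presentation $a^{2n}=1$, $x^2=a^n$, $xax^{-1}=a^{-1}$, and, for one illustrative involution $\varphi(a)=a^{-1}$, $\varphi(x)=x$ (our choice; it has order 2 for $n \ge 2$, and the paper classifies the general case), computes $H=G^{\varphi}$ and $Q=\{g\varphi(g)^{-1} : g \in G\}$. All function names are hypothetical.

```python
from itertools import product

def dicyclic(n):
    """Dicyclic group of order 4n: (k, j) stands for a^k x^j."""
    m = 2 * n
    elems = [(k, j) for j in (0, 1) for k in range(m)]
    def mul(g, h):
        (k1, j1), (k2, j2) = g, h
        if j1 == 0:
            return ((k1 + k2) % m, j2)
        if j2 == 0:
            return ((k1 - k2) % m, 1)        # x a^k = a^{-k} x
        return ((k1 - k2 + n) % m, 0)        # ... and x^2 = a^n
    def inv(g):
        k, j = g
        return ((-k) % m, 0) if j == 0 else ((k + n) % m, 1)
    return elems, mul, inv

def is_automorphism(G, mul, phi):
    """Check that phi is a bijective homomorphism of the finite group G."""
    return len({phi(g) for g in G}) == len(G) and all(
        phi(mul(g, h)) == mul(phi(g), phi(h)) for g, h in product(G, repeat=2))

def involution_data(n):
    """Fixed-point group H and symmetric space Q for phi(a) = a^{-1}, phi(x) = x."""
    G, mul, inv = dicyclic(n)
    phi = lambda g: ((-g[0]) % (2 * n), g[1])
    H = {g for g in G if phi(g) == g}
    Q = {mul(g, inv(phi(g))) for g in G}
    return G, H, Q, mul, phi
```

For this involution $H=\{1, a^n, x, a^n x\}$ and $Q=\{a^{2k}\}$, so $|Q|\,|H| = |G|$, as expected since $g\varphi(g)^{-1}=g'\varphi(g')^{-1}$ exactly when $g^{-1}g' \in H$.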
Submitted 30 September, 2013;
originally announced October 2013.
-
Near-Sun Flux Rope Structure of CMEs
Authors:
H. Xie,
N. Gopalswamy,
O. C. St. Cyr
Abstract:
We have used the Krall flux-rope model (Krall and St. Cyr, Astrophys. J. 2006, 657, 1740) (KFR) to fit 23 magnetic cloud (MC)-CMEs and 30 non-cloud ejecta (EJ)-CMEs in the Living With a Star (LWS) Coordinated Data Analysis Workshop (CDAW) 2011 list. The KFR-fit results show that the CMEs associated with MCs (EJs) have been deflected closer to (away from) the solar disk center (DC), likely by both the intrinsic magnetic structures inside an active region (AR) and ambient magnetic structures (e.g., nearby ARs, coronal holes, and streamers). The mean propagation latitudes and longitudes of the EJ-CMEs (18°, 11°) were larger than those of the MC-CMEs (11°, 6°) by 7° and 5°, respectively. Furthermore, the KFR-fit widths showed that the MC-CMEs are wider than the EJ-CMEs. The mean fitted face-on and edge-on widths of the MC-CMEs (EJ-CMEs) were 87° (85°) and 70° (63°), respectively. The deflection away from DC and narrower angular widths of the EJ-CMEs have caused the observing spacecraft to pass over only their flanks and miss the central flux-rope structures. The results of this work support the idea that all CMEs have a flux-rope structure.
Submitted 6 December, 2012;
originally announced December 2012.
-
The distribution of interplanetary dust between 0.96 and 1.04 AU as inferred from impacts on the STEREO spacecraft observed by the Heliospheric Imagers
Authors:
C. J. Davis,
J. A. Davies,
O. C St Cyr,
M. Campbell-Brown,
A. Skelt,
M. Kaiser,
Nicole Meyer-Vernet,
S. Crothers,
C. Lintott,
A. Smith,
S. Bamford,
E. M. L. Baeten
Abstract:
The distribution of dust in the ecliptic plane between 0.96 and 1.04 AU has been inferred from impacts on the two STEREO spacecraft through observation of secondary particle trails and unexpected off-points in the Heliospheric Imager (HI) cameras. This study made use of analysis carried out by members of a distributed web-based project, Solar Stormwatch. A comparison between observations of the brightest particle trails and a survey of fainter trails shows consistent distributions. While there is no obvious correlation between this distribution and the occurrence of individual meteor streams at Earth, there are some broad longitudinal features in these distributions that are also observed in sources of the sporadic meteor population. The asymmetry in the number of trails seen by each spacecraft, and the fact that there are many more unexpected off-points in HI-B than in HI-A, indicate that the majority of impacts are coming from the apex direction. For impacts causing off-points in the HI-B camera, these dust particles are estimated to have masses in excess of $10^{-17}$ kg with radii exceeding 0.1 μm. For off-points observed in the HI-A images, which can only have been caused by particles travelling from the anti-apex direction, the distribution is consistent with that of secondary 'storm' trails observed by HI-B, providing evidence that these trails also result from impacts with primary particles from an anti-apex source. It is apparent that the differential mass index of particles from the apex direction is consistently above 2. This indicates that the majority of the mass is within the smaller particles of this population. In contrast, the differential mass index of particles from the anti-apex direction (causing off-points in HI-A) is consistently below 2, indicating that the majority of the mass is to be found in larger particles of this distribution.
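Why the threshold value 2 decides where the mass resides can be made explicit. Assuming a power-law differential mass distribution $dN = C\,m^{-s}\,dm$ between bounds $m_{\min}$ and $m_{\max}$ (the usual definition of the differential mass index $s$; the bounds and $C$ are our notation), the total mass is

```latex
M = \int_{m_{\min}}^{m_{\max}} m \, C m^{-s} \, dm
  = \frac{C}{2-s}\left( m_{\max}^{\,2-s} - m_{\min}^{\,2-s} \right), \qquad s \neq 2 .
```

For $s > 2$ the exponent $2-s$ is negative, so the $m_{\min}$ term dominates and $M \approx C\,m_{\min}^{\,2-s}/(s-2)$: the mass sits in the smallest particles. For $s < 2$ the $m_{\max}$ term dominates and $M \approx C\,m_{\max}^{\,2-s}/(2-s)$: the mass sits in the largest particles. At $s = 2$ each decade of mass contributes equally.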
Submitted 18 November, 2011;
originally announced November 2011.
-
Dust detection by the wave instrument on STEREO: nanoparticles picked up by the solar wind?
Authors:
N. Meyer-Vernet,
M. Maksimovic,
A. Czechowski,
I. Mann,
I. Zouganelis,
K. Goetz,
M. L. Kaiser,
O. C. St. Cyr,
J. L. Bougeret,
S. D. Bale
Abstract:
The STEREO/WAVES instrument has detected a very large number of intense voltage pulses. We suggest that these events are produced by impact ionisation of nanoparticles striking the spacecraft at a velocity of the order of magnitude of the solar wind speed. Nanoparticles, which are half-way between micron-sized dust and atomic ions, have such a large charge-to-mass ratio that the electric field induced by the solar wind magnetic field accelerates them very efficiently. Since the voltage produced by dust impacts increases very fast with speed, such nanoparticles produce signals as high as those of much larger grains at lower speeds. The flux of 10-nm radius grains inferred in this way is compatible with the interplanetary dust flux model. The present results may represent the first detection of fast nanoparticles in interplanetary space near Earth orbit.
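The size dependence of the charge-to-mass ratio follows from a standard estimate. Assuming a spherical grain of radius $a$ and bulk density $\rho$, charged to a surface potential $V$ so that $q \approx 4\pi\varepsilon_0 a V$, with mass $m = \tfrac{4}{3}\pi\rho a^3$:

```latex
\frac{q}{m} \approx \frac{4\pi\varepsilon_0 a V}{\tfrac{4}{3}\pi \rho a^{3}}
            = \frac{3\varepsilon_0 V}{\rho a^{2}} .
```

With illustrative values $V = 5$ V and $\rho = 2500$ kg m$^{-3}$ (assumptions, not numbers from the abstract), this gives $q/m \approx 5\times10^{2}$ C kg$^{-1}$ for $a = 10$ nm but only $\approx 5\times10^{-2}$ C kg$^{-1}$ for $a = 1$ μm, a factor $10^{4}$ smaller, which is why the solar-wind electric field picks up nanoparticles far more efficiently than micron-sized dust.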
Submitted 4 April, 2009; v1 submitted 24 March, 2009;
originally announced March 2009.