-
Emergent super-antiferromagnetic correlations in monolayers of Fe3O4 nanoparticles throughout the superparamagnetic blocking transition
Authors:
Johnathon Rackham,
Brittni Pratt,
Dalton Griner,
Dallin Smith,
Yanping Cai,
Roger G. Harrison,
Alex Reid,
Jeffrey Kortright,
Mark K. Transtrum,
Karine Chesnel
Abstract:
We report nanoscale inter-particle magnetic orderings in self-assemblies of Fe3O4 nanoparticles (NPs), and the emergence of inter-particle antiferromagnetic (AF) (super-antiferromagnetic) correlations near the coercive field at low temperature. The magnetic ordering is probed via x-ray resonant magnetic scattering (XRMS), with the x-ray energy tuned to the Fe-L3 edge and using circularly polarized light. By exploiting dichroic effects, a magnetic scattering signal is isolated from the charge scattering signal. The magnetic signal informs about nanoscale spatial orderings at various stages throughout the magnetization process and at various temperatures throughout the superparamagnetic blocking transition, for two different sizes of NPs, 5 and 11 nm, with blocking temperatures TB of 28 K and 170 K, respectively. At 300 K, while the magnetometry data essentially shows superparamagnetism and absence of hysteresis for both particle sizes, the XRMS data reveals the presence of non-zero (up to 9/100) inter-particle AF couplings when the applied field is released to zero for the 11 nm NPs. These AF couplings are drastically amplified when the NPs are cooled down below TB and reach up to 12/100 for the 5 nm NPs and 48/100 for the 11 nm NPs, near the coercive point. The data suggests that the particle size affects the prevalence of the AF couplings: compared to ferromagnetic (F) couplings, the relative prevalence of AF couplings at the coercive point increases from a factor ~ 1.6 to 3.8 when the NP size increases from 5 to 11 nm.
Submitted 20 June, 2023;
originally announced June 2023.
-
The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold
Authors:
Jialin Mao,
Itay Griniasty,
Han Kheng Teoh,
Rahul Ramesh,
Rubing Yang,
Mark K. Transtrum,
James P. Sethna,
Pratik Chaudhari
Abstract:
We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.
Submitted 19 March, 2024; v1 submitted 2 May, 2023;
originally announced May 2023.
-
ZrNb(CO) RF superconducting thin film with high critical temperature in the theoretical limit
Authors:
Zeming Sun,
Thomas Oseroff,
Zhaslan Baraissov,
Darrah K. Dare,
Katrina Howard,
Benjamin Francis,
Ajinkya C. Hire,
Nathan Sitaraman,
Tomas A. Arias,
Mark K. Transtrum,
Richard Hennig,
Michael O. Thompson,
David A. Muller,
Matthias U. Liepe
Abstract:
Superconducting radio-frequency (SRF) resonators are critical components for particle accelerator applications, such as free-electron lasers, and for emerging technologies in quantum computing. Developing advanced materials and their deposition processes to produce RF superconductors that yield nanoohms surface resistances is a key metric for the wider adoption of SRF technology. Here we report ZrNb(CO) RF superconducting films with high critical temperatures (Tc) achieved for the first time under ambient pressure. The attainment of a Tc near the theoretical limit for this material without applied pressure is promising for its use in practical applications. A range of Tc, likely arising from Zr doping variation, may allow a tunable superconducting coherence length that lowers the sensitivity to material defects when an ultra-low surface resistance is required. Our ZrNb(CO) films are synthesized using a low-temperature (100 - 200 C) electrochemical recipe combined with thermal annealing. The phase transformation as a function of annealing temperature and time is optimized by the evaporated Zr-Nb diffusion couples. Through phase control, we avoid hexagonal Zr phases that are equilibrium-stable but degrade Tc. X-ray and electron diffraction combined with photoelectron spectroscopy reveal a system containing cubic ZrNb mixed with rocksalt NbC and low-dielectric-loss ZrO2. We demonstrate proof-of-concept RF performance of ZrNb(CO) on an SRF sample test system. BCS resistance trends lower than reference Nb, while quench fields occur at approximately 35 mT. Our results demonstrate the potential of ZrNb(CO) thin films for particle accelerator and other SRF applications.
Submitted 12 June, 2023; v1 submitted 28 February, 2023;
originally announced February 2023.
-
A picture of the space of typical learnable tasks
Authors:
Rahul Ramesh,
Jialin Mao,
Itay Griniasty,
Rubing Yang,
Han Kheng Teoh,
Mark Transtrum,
James P. Sethna,
Pratik Chaudhari
Abstract:
We develop information geometric techniques to understand the representations learned by deep networks when they are trained on different tasks using supervised, meta-, semi-supervised and contrastive learning. We shed light on the following phenomena that relate to the structure of the space of tasks: (1) the manifold of probabilistic models trained on different tasks using different representation learning methods is effectively low-dimensional; (2) supervised learning on one task results in a surprising amount of progress even on seemingly dissimilar tasks; progress on other tasks is larger if the training task has diverse classes; (3) the structure of the space of tasks indicated by our analysis is consistent with parts of the Wordnet phylogenetic tree; (4) episodic meta-learning algorithms and supervised learning traverse different trajectories during training but they fit similar models eventually; (5) contrastive and semi-supervised learning methods traverse trajectories similar to those of supervised learning. We use classification tasks constructed from the CIFAR-10 and Imagenet datasets to study these phenomena.
Submitted 21 July, 2023; v1 submitted 30 October, 2022;
originally announced October 2022.
-
Theory of Nb-Zr Alloy Superconductivity and First Experimental Demonstration for Superconducting Radio-Frequency Cavity Applications
Authors:
Nathan S. Sitaraman,
Zeming Sun,
Ben Francis,
Ajinkya C. Hire,
Thomas Oseroff,
Zhaslan Baraissov,
Tomás A. Arias,
Richard Hennig,
Matthias U. Liepe,
David A. Muller,
Mark K. Transtrum
Abstract:
Niobium-zirconium (Nb-Zr) alloy is an old superconductor that is a promising new candidate for superconducting radio-frequency (SRF) cavity applications. Using density-functional and Eliashberg theories, we show that addition of Zr to a Nb surface in small concentrations increases the critical temperature $T_c$ and improves other superconducting properties. Furthermore, we calculate $T_c$ for Nb-Zr alloys across a broad range of Zr concentrations, showing good agreement with the literature for disordered alloys as well as the potential for significantly higher $T_c$ in ordered alloys near 75%Nb/25%Zr composition. We provide experimental verification on Nb-Zr alloy samples and SRF sample test cavities prepared with either physical vapor or our novel electrochemical deposition recipes. These samples have the highest measured $T_c$ of any Nb-Zr superconductor to date and indicate a reduction in BCS resistance compared to the conventional Nb reference sample; they represent the first steps along a new pathway to greatly enhanced SRF performance. Finally, we use Ginzburg-Landau theory to show that the addition of Zr to a Nb surface increases the superheating field $B_{sh}$, a key figure of merit for SRF which determines the maximum accelerating gradient at which cavities can operate.
Submitted 22 August, 2022;
originally announced August 2022.
-
Extending OpenKIM with an Uncertainty Quantification Toolkit for Molecular Modeling
Authors:
Yonatan Kurniawan,
Cody L. Petrie,
Mark K. Transtrum,
Ellad B. Tadmor,
Ryan S. Elliott,
Daniel S. Karls,
Mingjian Wen
Abstract:
Atomistic simulations are an important tool in materials modeling. Interatomic potentials (IPs) are at the heart of such molecular models, and the accuracy of a model's predictions depends strongly on the choice of IP. Uncertainty quantification (UQ) is an emerging tool for assessing the reliability of atomistic simulations. The Open Knowledgebase of Interatomic Models (OpenKIM) is a cyberinfrastructure project whose goal is to collect and standardize the study of IPs to enable transparent, reproducible research. Part of the OpenKIM framework is the Python package, KIM-based Learning-Integrated Fitting Framework (KLIFF), that provides tools for fitting parameters in an IP to data. This paper introduces a UQ toolbox extension to KLIFF. We focus on two sources of uncertainty: variations in parameters and inadequacy of the functional form of the IP. Our implementation uses parallel-tempered Markov chain Monte Carlo (PTMCMC), adjusting the sampling temperature to estimate the uncertainty due to the functional form of the IP. We demonstrate on a Stillinger--Weber potential that makes predictions for the atomic energies and forces for silicon in a diamond configuration. Finally, we highlight some potential subtleties in applying and using these tools with recommendations for practitioners and IP developers.
Submitted 22 August, 2022; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Sloppy model analysis identifies bifurcation parameters without Normal Form analysis
Authors:
Christian N. K. Anderson,
Mark K. Transtrum
Abstract:
Bifurcation phenomena are common in multi-dimensional multi-parameter dynamical systems. Normal form theory suggests that the bifurcations themselves are driven by relatively few parameters; however, these are often nonlinear combinations of the bare parameters in which the equations are expressed. Discovering reparameterizations to transform such complex original equations into normal-form is often very difficult, and the reparameterization may not even exist in a closed-form. Recent advancements have tied both information geometry and bifurcations to the Renormalization Group. Here, we show that sloppy model analysis (a method of information geometry) can be used directly on bifurcations of increasing time scales to rapidly characterize the system's topological inhomogeneities, whether the system is in normal form or not. We anticipate that this novel analytical method, which we call time-widening information geometry (TWIG), will be useful in applied network analysis.
Submitted 27 November, 2023; v1 submitted 10 January, 2022;
originally announced January 2022.
-
Bayesian, frequentist, and information geometric approaches to parametric uncertainty quantification of classical empirical interatomic potentials
Authors:
Yonatan Kurniawan,
Cody L. Petrie,
Kinamo J. Williams,
Mark K. Transtrum,
Ellad B. Tadmor,
Ryan S. Elliott,
Daniel S. Karls,
Mingjian Wen
Abstract:
In this paper, we consider the problem of quantifying parametric uncertainty in classical empirical interatomic potentials (IPs) using both Bayesian (Markov Chain Monte Carlo) and frequentist (profile likelihood) methods. We interface these tools with the Open Knowledgebase of Interatomic Models and study three models based on the Lennard-Jones, Morse, and Stillinger--Weber potentials. We confirm that IPs are typically sloppy, i.e., insensitive to coordinated changes in some parameter combinations. Because the inverse problem in such models is ill-conditioned, parameters are unidentifiable. This presents challenges for traditional statistical methods, as we demonstrate and interpret within both Bayesian and frequentist frameworks. We use information geometry to illuminate the underlying cause of this phenomenon and show that IPs have global properties similar to those of sloppy models from fields such as systems biology, power systems, and critical phenomena. IPs correspond to bounded manifolds with a hierarchy of widths, leading to low effective dimensionality in the model. We show how information geometry can motivate new, natural parameterizations that improve the stability and interpretation of uncertainty quantification analysis and further suggest simplified, less-sloppy models.
Submitted 14 June, 2022; v1 submitted 20 December, 2021;
originally announced December 2021.
-
Information geometry for multiparameter models: New perspectives on the origin of simplicity
Authors:
Katherine N. Quinn,
Michael C. Abbott,
Mark K. Transtrum,
Benjamin B. Machta,
James P. Sethna
Abstract:
Complex models in physics, biology, economics, and engineering are often sloppy, meaning that the model parameters are not well determined by the model predictions for collective behavior. Many parameter combinations can vary over decades without significant changes in the predictions. This review uses information geometry to explore sloppiness and its deep relation to emergent theories. We introduce the model manifold of predictions, whose coordinates are the model parameters. Its hyperribbon structure explains why only a few parameter combinations matter for the behavior. We review recent rigorous results that connect the hierarchy of hyperribbon widths to approximation theory, and to the smoothness of model predictions under changes of the control variables. We discuss recent geodesic methods to find simpler models on nearby boundaries of the model manifold -- emergent theories with fewer parameters that explain the behavior equally well. We discuss a Bayesian prior which optimizes the mutual information between model parameters and experimental data, naturally favoring points on the emergent boundary theories and thus simpler models. We introduce a `projected maximum likelihood' prior that efficiently approximates this optimal prior, and contrast both to the poor behavior of the traditional Jeffreys prior. We discuss the way the renormalization group coarse-graining in statistical mechanics introduces a flow of the model manifold, and connect stiff and sloppy directions along the model manifold with relevant and irrelevant eigendirections of the renormalization group. Finally, we discuss recently developed `intensive' embedding methods, allowing one to visualize the predictions of arbitrary probabilistic models as low-dimensional projections of an isometric embedding, and illustrate our method by generating the model manifold of the Ising model.
Submitted 22 September, 2022; v1 submitted 13 November, 2021;
originally announced November 2021.
-
The supremum principle selects simple, transferable models
Authors:
Cody Petrie,
Christian Anderson,
Casie Maekawa,
Travis Maekawa,
Mark K. Transtrum
Abstract:
We consider how mathematical models enable predictions for conditions that are qualitatively different from the training data. We propose techniques based on information topology to find models that can apply their learning in regimes for which there is no data. The first step is to use the Manifold Boundary Approximation Method to construct simple, reduced models of target phenomena in a data-driven way. We consider the set of all such reduced models and use the topological relationships among them to reason about model selection for new, unobserved phenomena. Given minimal models for several target behaviors, we introduce the supremum principle as a criterion for selecting a new, transferable model. The supremal model, i.e., the least upper bound, is the simplest model that reduces to each of the target behaviors. We illustrate how to discover supremal models with several examples; in each case, the supremal model unifies causal mechanisms to transfer successfully to new target domains. We use these examples to motivate a general algorithm that has formal connections to theories of analogical reasoning in cognitive psychology.
Submitted 25 May, 2022; v1 submitted 21 September, 2021;
originally announced September 2021.
-
Analysis of Magnetic Vortex Dissipation in Sn-Segregated Boundaries in Nb$_3$Sn Superconducting RF Cavities
Authors:
Jared Carlson,
Alden Pack,
Mark K. Transtrum,
Jaeyel Lee,
David N. Seidman,
Danilo B. Liarte,
Nathan Sitaraman,
Alen Senanian,
Michelle Kelley,
James P. Sethna,
Tomas Arias,
Sam Posen
Abstract:
We study mechanisms of vortex nucleation in Nb$_3$Sn Superconducting RF (SRF) cavities using a combination of experimental, theoretical, and computational methods. Scanning transmission electron microscopy (STEM) image and energy dispersive spectroscopy (EDS) of some Nb$_3$Sn cavities show Sn segregation at grain boundaries in Nb$_3$Sn with Sn concentration as high as $\sim$35 at.\% and widths $\sim$3 nm in chemical composition. Using ab initio calculations, we estimate the effect excess tin has on the local superconducting properties of the material. We model Sn segregation as a lowering of the local critical temperature. We then use time-dependent Ginzburg-Landau theory to understand the role of segregation on magnetic vortex nucleation. Our simulations indicate that the grain boundaries act as both nucleation sites for vortex penetration and pinning sites for vortices after nucleation. Depending on the magnitude of the applied field, vortices may remain pinned in the grain boundary or penetrate the grain itself. We estimate the superconducting losses due to vortices filling grain boundaries and compare with observed performance degradation with higher magnetic fields. We estimate that the quality factor may decrease by an order of magnitude ($10^{10}$ to $10^9$) at typical operating fields if 0.03\% of the grain boundaries actively nucleate vortices. We additionally estimate the volume that would need to be filled with vortices to match experimental observations of cavity heating.
Submitted 20 December, 2020; v1 submitted 6 March, 2020;
originally announced March 2020.
-
$\textit{Ab Initio}$ Study of Antisite Defects in Nb$_3$Sn: Phase Diagram and Impact on Superconductivity
Authors:
Nathan S. Sitaraman,
Jared Carlson,
Alden R. Pack,
Ryan D. Porter,
Matthias U. Liepe,
Mark K. Transtrum,
Tomás A. Arias
Abstract:
Antisite defects play a critical role in Nb$_3$Sn superconducting radio frequency (SRF) cavity physics. Such defects are the primary form of disorder in Nb$_3$Sn, and are responsible for stoichiometry variations, including experimentally observed tin-depleted regions within grains and tin-rich regions around grain boundaries. But why they cluster to form regions of different stoichiometries and how they affect the SRF properties of Nb$_3$Sn cavities are not fully understood. Using $\textit{ab initio}$ techniques, we calculate the A15 region of the Nb-Sn phase diagram, discuss a possible modification to the phase diagram near grain boundaries, and calculate $T_c$ as a function of stoichiometry, including experimentally inaccessible tin-rich stoichiometry. We find that the impact of antisite defects on the density of states near the Fermi-level of Nb$_3$Sn plays a key role in determining many of their properties. These results improve our understanding of the obstacles facing Nb$_3$Sn SRF systems, and how modified growth processes might overcome them.
Submitted 16 December, 2019;
originally announced December 2019.
-
Role of surface defects and material inhomogeneities for vortex nucleation in superconductors within time-dependent Ginzburg-Landau theory in 2 and 3 dimensions
Authors:
Alden R. Pack,
Jared Carlson,
Spencer Wadsworth,
Mark K. Transtrum
Abstract:
We use Time-Dependent Ginzburg-Landau theory to study the nucleation of vortices in type II superconductors in the presence of both geometric and material inhomogeneities. The superconducting Meissner state is meta-stable up to a critical magnetic field, known as the superheating field. For a uniform surface and homogeneous material, the superheating transition is driven by a non-local critical mode in which an array of vortices simultaneously penetrate the surface. In contrast, we show that even a small amount of disorder localizes the critical mode and can significantly reduce the effective superheating field for a particular sample. Vortices can be nucleated by either surface roughness or local variations in material parameters, such as Tc. Our approach uses a finite element method to simulate a cylindrical geometry in 2 dimensions and a film geometry in 2 and 3 dimensions. We combine saddle node bifurcation analysis along with a novel fitting procedure to evaluate the superheating field and identify the unstable mode. We demonstrate agreement with previous results for homogeneous geometries and surface roughness and extend the analysis to include variations in material properties. Finally, we show that in three dimensions, surface divots not aligned with the applied field can increase the superheating field. We discuss implications for fabrication and performance of superconducting resonant frequency cavities in particle accelerators.
Submitted 18 February, 2020; v1 submitted 5 November, 2019;
originally announced November 2019.
-
Model Boundary Approximation Method as a Unifying Framework for Balanced Truncation and Singular Perturbation Approximation
Authors:
Philip E. Paré,
David Grimsman,
Alma T. Wilson,
Mark K. Transtrum,
Sean Warnick
Abstract:
We show that two widely accepted model reduction techniques, Balanced Truncation and Balanced Singular Perturbation Approximation, can be derived as limiting approximations of a carefully constructed parameterization of Linear Time Invariant (LTI) systems by employing the Model Boundary Approximation Method (MBAM), a recent development in the Physics literature. This unifying framework of these popular model reduction techniques shows that Balanced Truncation and Balanced Singular Perturbation Approximation each correspond to a particular boundary point on a manifold, the "model manifold," which is associated with the specific choice of model parameterization and initial condition, and is embedded in a sample space of measured outputs, which can be chosen arbitrarily, provided that the number of samples exceeds the number of parameters. We also show that MBAM provides a novel way to interpolate between Balanced Truncation and Balanced Singular Perturbation Approximation, by exploring the set of approximations on the boundary of the manifold between the elements that correspond to the two model reduction techniques; this allows for alternative approximations of a given system to be found that may be better under certain conditions. The work herein suggests similar types of approximations may be obtainable in topologically similar places (i.e. on certain boundaries) on the model manifold of nonlinear systems if analogous parameterizations can be achieved, therefore extending these widely accepted model reduction techniques to nonlinear systems.
Submitted 8 January, 2019;
originally announced January 2019.
-
Unwinding the model manifold: choosing similarity measures to remove local minima in sloppy dynamical systems
Authors:
Benjamin L. Francis,
Mark K. Transtrum
Abstract:
In this paper, we consider the problem of parameter sensitivity in models of complex dynamical systems through the lens of information geometry. We calculate the sensitivity of model behavior to variations in parameters. In most cases, models are sloppy, that is, exhibit an exponential hierarchy of parameter sensitivities. We propose a parameter classification scheme based on how the sensitivities scale at long observation times. We show that for oscillatory models, either with a limit cycle or a strange attractor, sensitivities can become arbitrarily large, which implies a high effective-dimensionality on the model manifold. Sloppy models with a single fixed point have model manifolds with low effective-dimensionality, previously described as a "hyper-ribbon". In contrast, models with high effective dimensionality translate into multimodal fitting problems. We define a measure of curvature on the model manifold which we call the \emph{winding frequency} that estimates the linear density of local minima in the model's parameter space. We then show how alternative choices of fitting metrics can "unwind" the model manifold and give low winding frequencies. This prescription translates the model manifold from one of high effective-dimensionality into the "hyper-ribbon" structures observed elsewhere. This translation opens the door for applications of sloppy model analysis and model reduction methods developed for models with low effective-dimensionality.
Submitted 22 February, 2019; v1 submitted 30 May, 2018;
originally announced May 2018.
-
The Spectrum of Mechanism-oriented Models for Explanations of Biological Phenomena
Authors:
C. Anthony Hunt,
Ahmet Erdemir,
Feilim Mac Gabhann,
William W. Lytton,
Edward A. Sander,
Mark K. Transtrum,
Lealem Mulugeta
Abstract:
Within the diverse interdisciplinary life sciences domains, semantic, workflow, and methodological ambiguities can prevent the appreciation of explanations of phenomena, handicap the use of computational models, and hamper communication among scientists, engineers, and the public. Members of the life sciences community commonly, and too often loosely, draw on "mechanistic model" and similar phrases when referring to the processes of discovering and establishing causal explanations of biological phenomena. Ambiguities in modeling and simulation terminology and methods diminish clarity, credibility, and the perceived significance of research findings. To encourage improved semantic and methodological clarity, we describe the spectrum of Mechanism-oriented Models being used to develop explanations of biological phenomena. We cluster them into three broad groups. We then expand the three groups into a total of seven workflow-related model types having clearly distinguishable features. We name each type and illustrate with diverse examples drawn from the literature. These model types are intended to contribute to the foundation of an ontology of mechanism-based simulation research in the life sciences. We show that it is within the model-development workflows that the different model types manifest and exert their scientific usefulness by enhancing and extending different forms and degrees of explanation. The process starts with knowledge about the phenomenon and continues with explanatory and mathematical descriptions. Those descriptions are transformed into software and used to perform experimental explorations by running and examining simulation output. The credibility of inferences is thus linked to having easy access to the scientific and technical provenance from each workflow stage.
Submitted 15 January, 2018;
originally announced January 2018.
-
SRF Theory Developments from the Center for Bright Beams
Authors:
Danilo B. Liarte,
Tomas Arias,
Daniel L. Hall,
Matthias Liepe,
James P. Sethna,
Nathan Sitaraman,
Alden Pack,
Mark K. Transtrum
Abstract:
We present theoretical studies of SRF materials from the Center for Bright Beams. First, we discuss the effects of disorder, inhomogeneities, and materials anisotropy on the maximum parallel surface field that a superconductor can sustain in an SRF cavity, using linear stability in conjunction with Ginzburg-Landau and Eilenberger theory. We connect our disorder-mediated vortex nucleation model to current experimental developments of Nb$_3$Sn and other cavity materials. Second, we use time-dependent Ginzburg-Landau simulations to explore the role of inhomogeneities in nucleating vortices, and discuss the effects of trapped magnetic flux on the residual resistance of weakly-pinned Nb$_3$Sn cavities. Third, we present first-principles density-functional theory (DFT) calculations to uncover and characterize the key fundamental materials processes underlying the growth of Nb$_3$Sn. Our calculations give us key information about how, where, and when the observed tin-depleted regions form. Based on this, we plan to develop new coating protocols to mitigate the formation of tin-depleted regions.
Submitted 27 July, 2017;
originally announced July 2017.
-
Maximizing the information learned from finite data selects a simple model
Authors:
Henry H. Mattingly,
Mark K. Transtrum,
Michael C. Abbott,
Benjamin B. Machta
Abstract:
We use the language of uninformative Bayesian prior choice to study the selection of appropriately simple effective models. We advocate for the prior which maximizes the mutual information between parameters and predictions, learning as much as possible from limited data. When many parameters are poorly constrained by the available data, we find that this prior puts weight only on boundaries of the parameter manifold. Thus it selects a lower-dimensional effective theory in a principled way, ignoring irrelevant parameter directions. In the limit where there is sufficient data to tightly constrain any number of parameters, this reduces to Jeffreys prior. But we argue that this limit is pathological when applied to the hyper-ribbon parameter manifolds generic in science, because it leads to dramatic dependence on effects invisible to experiment.
Submitted 14 February, 2018; v1 submitted 2 May, 2017;
originally announced May 2017.
-
Theoretical estimates of maximum fields in superconducting resonant radio frequency cavities: Stability theory, disorder, and laminates
Authors:
Danilo B. Liarte,
Sam Posen,
Mark K. Transtrum,
Gianluigi Catelani,
Matthias Liepe,
James P. Sethna
Abstract:
Theoretical limits to the performance of superconductors in high magnetic fields parallel to their surfaces are of key relevance to current and future accelerating cavities, especially those made of new higher-Tc materials such as Nb$_3$Sn, NbN, and MgB$_2$. Indeed, beyond the so-called superheating field $H_{\mathcal{sh}}$, flux will spontaneously penetrate even a perfect superconducting surface and ruin the performance. We present intuitive arguments and simple estimates for $H_{\mathcal{sh}}$, and combine them with our previous rigorous calculations, which we summarize. We briefly discuss experimental measurements of the superheating field, comparing to our estimates. We explore the effects of materials anisotropy and the danger of disorder in nucleating vortex entry. Will we need to control surface orientation in the layered compound MgB$_2$? Can we estimate theoretically whether dirt and defects make these new materials fundamentally more challenging to optimize than niobium? Finally, we discuss and analyze recent proposals to use thin superconducting layers or laminates to enhance the performance of superconducting cavities. Flux entering a laminate can lead to so-called pancake vortices; we consider the physics of the dislocation motion and potential re-annihilation or stabilization of these vortices after their entry.
Submitted 26 October, 2016; v1 submitted 30 July, 2016;
originally announced August 2016.
-
Manifold boundaries give "gray-box" approximations of complex models
Authors:
Mark K. Transtrum
Abstract:
We discuss a method of parameter reduction in complex models known as the Manifold Boundary Approximation Method (MBAM). This approach, based on a geometric interpretation of statistics, maps the model reduction problem to a geometric approximation problem. It operates iteratively, removing one parameter at a time, by approximating a high-dimensional but thin manifold by its boundary. Although the method makes no explicit assumption about the functional form of the model, it does require that the model manifold exhibit a hierarchy of boundaries, i.e., faces, edges, corners, hyper-corners, etc. We empirically show that a variety of model classes have this curious feature, making them amenable to MBAM. These model classes include models composed of elementary functions (e.g., rational functions, exponentials, and partition functions), a variety of dynamical systems (e.g., chemical and biochemical kinetics, Linear Time Invariant (LTI) systems, and compartment models), network models (e.g., Bayesian networks, Markov chains, artificial neural networks, and Markov random fields), log-linear probability distributions, and models with symmetries. We discuss how MBAM recovers many common approximation methods for each model class and discuss potential pitfalls and limitations.
Submitted 27 May, 2016;
originally announced May 2016.
-
The limitations of model-based experimental design and parameter estimation in sloppy systems
Authors:
Andrew White,
Malachi Tolman,
Howard D. Thames,
Hubert Rodney Withers,
Kathy A. Mason,
Mark K. Transtrum
Abstract:
We explore the relationship among model fidelity, experimental design, and parameter estimation in sloppy models. We show that the approximate nature of mathematical models poses challenges for experimental design in sloppy models. In many models of complex biological processes, it is unknown which physics must be included to explain collective behaviors. As a consequence, models are often overly complex, with many practically unidentifiable parameters. Furthermore, which details are relevant or irrelevant varies among potential experiments. By selecting complementary experiments, experimental design may inadvertently make details that were omitted from the model become relevant. When this occurs, the model will fail to give a good fit to the data. We use a simple hyper-model of model error to quantify a model's inadequacy and apply it to two models of complex biological processes (EGFR signaling and DNA repair) with optimally selected experiments. We find that although parameters may be accurately estimated, the error in the model renders it less predictive than it was in the sloppy regime where model error is small. We introduce the concept of a \emph{sloppy system}--a sequence of models of increasing complexity that become sloppy in the limit of microscopic accuracy. We explore the limits of accurate parameter estimation in sloppy systems and argue that system identification is better approached by considering a hierarchy of models of varying detail rather than focusing parameter estimation on a single model.
Submitted 14 June, 2016; v1 submitted 16 February, 2016;
originally announced February 2016.
-
Ginzburg-Landau theory of the superheating field anisotropy of layered superconductors
Authors:
Danilo B. Liarte,
Mark K. Transtrum,
James P. Sethna
Abstract:
We investigate the effects of material anisotropy on the superheating field of layered superconductors. We provide an intuitive argument both for the existence of a superheating field, and its dependence on anisotropy, for $κ = λ/ξ$ (the ratio of magnetic to superconducting healing lengths) both large and small. On the one hand, the combination of our estimates with published results using a two-gap model for MgB${}_2$ suggests high anisotropy of the superheating field near zero temperature. On the other hand, within Ginzburg-Landau theory for a single gap, we see that the superheating field shows significant anisotropy only when the crystal anisotropy is large and the Ginzburg-Landau parameter $κ$ is small. We then conclude that only small anisotropies in the superheating field are expected for typical unconventional superconductors near the critical temperature. Using a generalized form of Ginzburg-Landau theory, we do a quantitative calculation for the anisotropic superheating field by mapping the problem to the isotropic case, and present a phase diagram in terms of anisotropy and $κ$, showing type I, type II, or mixed behavior (within Ginzburg-Landau theory), and regions where each asymptotic solution is expected. We estimate anisotropies for a number of different materials, and discuss the importance of these results for radio-frequency cavities for particle accelerators.
Submitted 28 September, 2016; v1 submitted 11 February, 2016;
originally announced February 2016.
-
Bridging Mechanistic and Phenomenological Models of Complex Biological Systems
Authors:
Mark K. Transtrum,
Peng Qiu
Abstract:
The inherent complexity of biological systems gives rise to complicated mechanistic models with a large number of parameters. On the other hand, the collective behavior of these systems can often be characterized by a relatively small number of phenomenological parameters. We use the Manifold Boundary Approximation Method (MBAM) as a tool for deriving simple phenomenological models from complicated mechanistic models. The resulting models are not black boxes, but remain expressed in terms of the microscopic parameters. In this way, we explicitly connect the macroscopic and microscopic descriptions, characterize the equivalence class of distinct systems exhibiting the same range of collective behavior, and identify the combinations of components that function as tunable control knobs for the behavior. We demonstrate the procedure for adaptation behavior exhibited by the EGFR pathway. From a 48-parameter mechanistic model, the system can be effectively described by a single adaptation parameter $τ$ characterizing the ratio of time scales for the initial response and recovery time of the system, which can in turn be expressed as a combination of microscopic reaction rates, Michaelis-Menten constants, and biochemical concentrations. The situation is not unlike modeling in physics, in which microscopically complex processes can often be renormalized into simple phenomenological models with only a few effective parameters. The proposed method additionally provides a mechanistic explanation for non-universal features of the behavior.
Submitted 9 February, 2016; v1 submitted 21 September, 2015;
originally announced September 2015.
-
Shielding superconductors with thin films
Authors:
Sam Posen,
Mark K. Transtrum,
Gianluigi Catelani,
Matthias U. Liepe,
James P. Sethna
Abstract:
Determining the optimal arrangement of superconducting layers to withstand large amplitude AC magnetic fields is important for certain applications such as superconducting radiofrequency cavities. In this paper, we evaluate the shielding potential of the superconducting film/insulating film/superconductor (SIS') structure, a configuration that could provide benefits in screening large AC magnetic fields. After establishing that for high frequency magnetic fields, flux penetration must be avoided, the superheating field of the structure is calculated in the London limit both numerically and, for thin films, analytically. For intermediate film thicknesses and realistic material parameters we also solve numerically the Ginzburg-Landau equations. It is shown that a small enhancement of the superheating field is possible, on the order of a few percent, for the SIS' structure relative to a bulk superconductor of the film material, if the materials and thicknesses are chosen appropriately.
Submitted 28 June, 2015;
originally announced June 2015.
-
Sloppiness and Emergent Theories in Physics, Biology, and Beyond
Authors:
Mark K. Transtrum,
Benjamin Machta,
Kevin Brown,
Bryan C. Daniels,
Christopher R. Myers,
James P. Sethna
Abstract:
Large scale models of physical phenomena demand the development of new statistical and computational tools in order to be effective. Many such models are `sloppy', i.e., exhibit behavior controlled by a relatively small number of parameter combinations. We review an information theoretic framework for analyzing sloppy models. This formalism is based on the Fisher Information Matrix, which we interpret as a Riemannian metric on a parameterized space of models. Distance in this space is a measure of how distinguishable two models are based on their predictions. Sloppy model manifolds are bounded with a hierarchy of widths and extrinsic curvatures. We show how the manifold boundary approximation can extract the simple, hidden theory from complicated sloppy models. We attribute the success of simple effective models in physics as likewise emerging from complicated processes exhibiting a low effective dimensionality. We discuss the ramifications and consequences of sloppy models for biochemistry and science more generally. We suggest that the reason our complex world is understandable is due to the same fundamental reason: simple theories of macroscopic behavior are hidden inside complicated microscopic processes.
Submitted 30 January, 2015;
originally announced January 2015.
-
Information topology identifies emergent model classes
Authors:
Mark K. Transtrum,
Gus Hart,
Peng Qiu
Abstract:
We develop a language for describing the relationship among observations, mathematical models, and the underlying principles from which they are derived. Using Information Geometry, we consider geometric properties of statistical models for different observations. As observations are varied, the model manifold may be stretched, compressed, or even collapsed. Observations that preserve the structural identifiability of the parameters also preserve certain topological features (such as edges and corners) that characterize the model's underlying physical principles. We introduce Information Topology in analogy with information geometry as characterizing the "abstract model" of which statistical models are realizations. Observations that change the topology, i.e., "manifold collapse," require a modification of the abstract model in order to construct identifiable statistical models. Often, the essential topological feature is a hierarchical structure of boundaries (faces, edges, corners, etc.) which we represent as a hierarchical graph known as a Hasse diagram. Low-dimensional elements of this diagram are simple models that describe the dominant behavioral modes, what we call emergent model classes. Observations that preserve the Hasse diagram are diffeomorphically related and form a group, the collection of which form a partially ordered set. All possible observations have a semi-group structure. For hierarchical models, we consider how the topology of simple models is embedded in that of larger models. When emergent model classes are unstable to the introduction of new parameters, we classify the new parameters as relevant. Conversely, the emergent model classes are stable to the introduction of irrelevant parameters. In this way, information topology provides a general language for exploring representations of physical systems and their relationships to observations.
Submitted 13 July, 2016; v1 submitted 22 September, 2014;
originally announced September 2014.
-
Response to comment on theoretical RF field limits of multilayer coating structures of superconducting resonator cavities
Authors:
Sam Posen,
Gianluigi Catelani,
Matthias U. Liepe,
James P. Sethna,
Mark K. Transtrum
Abstract:
A comment to the authors' SRF Conference pre-print [1] was submitted by A. Gurevich to the arXiv [2]. In this response, we show that the arguments used in the comment are not valid.
[1] arXiv:1309.3239 [2] arXiv:1309.5626
Submitted 16 October, 2013;
originally announced October 2013.
-
Theoretical Field Limits for Multi-Layer Superconductors
Authors:
Sam Posen,
Gianluigi Catelani,
Matthias U. Liepe,
James P. Sethna,
Mark K. Transtrum
Abstract:
The SIS structure---a thin superconducting film on a bulk superconductor separated by a thin insulating film---was proposed as a method to protect alternative SRF materials from flux penetration by enhancing the first critical field $B_{c1}$. In this work, we show that in fact $B_{c1}$ = 0 for a SIS structure. We calculate the superheating field $B_{sh}$, and we show that it can be enhanced slightly using the SIS structure, but only for a small range of film thicknesses and only if the film and the bulk are different materials. We also show that using a multilayer instead of a single thick layer is detrimental, as this decreases $B_{sh}$ of the film. We calculate the dissipation due to vortex penetration above the $B_{sh}$ of the film, and find that it is unmanageable for SRF applications. However, we find that if a gradient in the phase of the order parameter is introduced, SIS structures may be able to shield large DC and low frequency fields. We argue that the SIS structure is not beneficial for SRF cavities, but due to recent experiments showing low-surface-resistance performance above $B_{c1}$ in cavities made of superconductors with small coherence lengths, we argue that enhancement of $B_{c1}$ is not necessary, and that bulk films of alternative materials show great promise.
Submitted 16 September, 2013; v1 submitted 12 September, 2013;
originally announced September 2013.
-
Parameter Space Compression Underlies Emergent Theories and Predictive Models
Authors:
Benjamin B. Machta,
Ricky Chachra,
Mark K. Transtrum,
James P. Sethna
Abstract:
We report a similarity between the microscopic parameter dependence of emergent theories in physics and that of multiparameter models common in other areas of science. In both cases, predictions are possible despite large uncertainties in the microscopic parameters because these details are compressed into just a few governing parameters that are sufficient to describe relevant observables. We make this commonality explicit by examining parameter sensitivity in a hopping model of diffusion and a generalized Ising model of ferromagnetism. We trace the emergence of a smaller effective model to the development of a hierarchy of parameter importance quantified by the eigenvalues of the Fisher Information Matrix. Strikingly, the same hierarchy appears ubiquitously in models taken from diverse areas of science. We conclude that the emergence of effective continuum and universal theories in physics is due to the same parameter space hierarchy that underlies predictive modeling in other areas of science.
Submitted 27 March, 2013;
originally announced March 2013.
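The eigenvalue hierarchy described in this abstract can be illustrated with a small numerical sketch. The model below, a sum of decaying exponentials with unit Gaussian noise, is an assumption chosen for brevity and is not one of the models studied in the paper; the rates, the time grid, and the names `rates`, `jacobian`, and `spread` are likewise hypothetical. Under that noise assumption the Fisher Information Matrix is J^T J, so its eigenvalues are the squared singular values of the Jacobian J.

```python
import numpy as np

# Hypothetical toy model (not from the paper): y(t) = sum_i exp(-rate_i * t),
# observed on a fixed time grid with unit-variance Gaussian noise.
rates = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
t = np.linspace(0.0, 5.0, 100)

def jacobian(rates, t):
    """Sensitivity of each prediction y(t_m) to each rate: -t_m * exp(-rate_i * t_m)."""
    return -t[:, None] * np.exp(-np.outer(t, rates))

J = jacobian(rates, t)

# With unit noise the FIM is J^T J; its eigenvalues are the squared singular
# values of J. Using the SVD avoids small negative eigenvalues from round-off.
singular_values = np.linalg.svd(J, compute_uv=False)
fim_eigenvalues = singular_values ** 2  # sorted from stiffest to sloppiest

spread = fim_eigenvalues[0] / fim_eigenvalues[-1]
print("log10 FIM eigenvalues:", np.log10(fim_eigenvalues))
print("stiffest / sloppiest eigenvalue ratio:", spread)
```

The printed eigenvalues should fall off over several decades, which is the kind of hierarchy of parameter importance the abstract refers to. The exact values depend on the assumed rates and time grid.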
-
Geodesic acceleration and the small-curvature approximation for nonlinear least squares
Authors:
Mark K. Transtrum,
James P. Sethna
Abstract:
It has been shown numerically that the performance of the Levenberg-Marquardt algorithm can be improved by including a second order correction known as the geodesic acceleration. In this paper we give the method a more sound theoretical foundation by deriving the geodesic acceleration correction without using differential geometry and showing that the traditional convergence proofs can be adapted to incorporate geodesic acceleration. Unlike other methods which include second derivative information, the geodesic acceleration does not attempt to improve the Gauss-Newton approximate Hessian, but rather is an extension of the small-residual approximation to cubic order. In deriving geodesic acceleration, we note that the small-residual approximation is complemented by a small-curvature approximation. This latter approximation provides a much broader justification for the Gauss-Newton approximate Hessian and Levenberg-Marquardt algorithm. In particular, it is justifiable even if the best fit residuals are large, is dependent only on the model and not on the data being fit, and is applicable for the entire course of the algorithm and not just the region near the minimum.
Submitted 20 July, 2012;
originally announced July 2012.
-
Improvements to the Levenberg-Marquardt algorithm for nonlinear least-squares minimization
Authors:
Mark K. Transtrum,
James P. Sethna
Abstract:
When minimizing a nonlinear least-squares function, the Levenberg-Marquardt algorithm can suffer from slow convergence, particularly when it must navigate a narrow canyon en route to a best fit. On the other hand, when the least-squares function is very flat, the algorithm may easily become lost in parameter space. We introduce several improvements to the Levenberg-Marquardt algorithm in order to improve both its convergence speed and robustness to initial parameter guesses. We update the usual step to include a geodesic acceleration correction term, explore a systematic way, due to Umrigar and Nightingale, of accepting uphill steps that may increase the residual sum of squares, and employ the Broyden method to update the Jacobian matrix. We test these changes by comparing their performance on a number of test problems with standard implementations of the algorithm. We suggest that these two particular challenges, slow convergence and robustness to initial guesses, are complementary problems. Schemes that improve convergence speed often make the algorithm less robust to the initial guess, and vice versa. We provide an open source implementation of our improvements that allows the user to adjust the algorithm parameters to suit particular needs.
Submitted 27 January, 2012;
originally announced January 2012.
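A minimal sketch of a Levenberg-Marquardt step with a geodesic acceleration correction is given below. It is not the authors' open source implementation. The exponential test model, the finite-difference step `h`, the acceptance threshold `alpha`, the factor-of-ten damping updates, and all function names are assumptions made for illustration. It also omits the uphill-step acceptance and Broyden updates mentioned in the abstract and accepts only steps that lower the cost.

```python
import numpy as np

def residuals(theta, t, y):
    """Residuals of the illustrative model A * exp(-k * t) against data y."""
    amplitude, rate = theta
    return amplitude * np.exp(-rate * t) - y

def jacobian(theta, t):
    """Analytic Jacobian of the residuals with respect to (A, k)."""
    amplitude, rate = theta
    decay = np.exp(-rate * t)
    return np.column_stack([decay, -amplitude * t * decay])

def lm_geodesic(theta0, t, y, lam=1e-2, h=0.1, alpha=0.75, n_iter=200):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        r = residuals(theta, t, y)
        J = jacobian(theta, t)
        damped = J.T @ J + lam * np.eye(theta.size)
        # Usual Levenberg-Marquardt step (the "velocity").
        v = -np.linalg.solve(damped, J.T @ r)
        # Finite-difference estimate of the second directional derivative of r along v.
        r_vv = (2.0 / h) * ((residuals(theta + h * v, t, y) - r) / h - J @ v)
        # Geodesic acceleration: second-order correction to the step.
        a = -np.linalg.solve(damped, J.T @ r_vv)
        trial = theta + v + 0.5 * a
        # Accept only if the correction is small relative to v and the cost decreases.
        small_accel = 2.0 * np.linalg.norm(a) <= alpha * np.linalg.norm(v)
        downhill = np.sum(residuals(trial, t, y) ** 2) < np.sum(r ** 2)
        if small_accel and downhill:
            theta, lam = trial, lam / 10.0
        else:
            lam *= 10.0
    return theta

# Noise-free synthetic data from A = 2.0, k = 0.7 (hypothetical values).
t = np.linspace(0.0, 4.0, 20)
y = 2.0 * np.exp(-0.7 * t)
fit = lm_geodesic([1.0, 1.0], t, y)
print("fitted (A, k):", fit)
```

The correction costs one extra residual evaluation per iteration and reuses the damped matrix already built for the first-order step.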
-
Structural Susceptibility and Separation of Time Scales in the van der Pol Oscillator
Authors:
Ricky Chachra,
Mark K. Transtrum,
James P. Sethna
Abstract:
We use an extension of the van der Pol oscillator as an example of a system with multiple time scales to study the susceptibility of its trajectory to polynomial perturbations in the dynamics. A striking feature of many nonlinear, multi-parameter models is an apparently inherent insensitivity to large magnitude variations in certain linear combinations of parameters. This phenomenon of "sloppiness" is quantified by calculating the eigenvalues of the Hessian matrix of the least-squares cost function which typically span many orders of magnitude. The van der Pol system is no exception: Perturbations in its dynamics show that most directions in parameter space weakly affect the limit cycle, whereas only a few directions are stiff. With this study we show that separating the time scales in the van der Pol system leads to a further separation of eigenvalues. Parameter combinations which perturb the slow manifold are stiffer and those which solely affect the transients in the dynamics are sloppier.
Submitted 1 August, 2012; v1 submitted 22 December, 2011;
originally announced December 2011.
-
The geometry of nonlinear least squares with applications to sloppy models and optimization
Authors:
Mark K. Transtrum,
Benjamin B. Machta,
James P. Sethna
Abstract:
Parameter estimation by nonlinear least squares minimization is a common problem with an elegant geometric interpretation: the possible parameter values of a model induce a manifold in the space of data predictions. The minimization problem is then to find the point on the manifold closest to the data. We show that the model manifolds of a large class of models, known as sloppy models, have many universal features; they are characterized by a geometric series of widths, extrinsic curvatures, and parameter-effects curvatures. A number of common difficulties in optimizing least squares problems are due to this common structure. First, algorithms tend to run into the boundaries of the model manifold, causing parameters to diverge or become unphysical. We introduce the model graph as an extension of the model manifold to remedy this problem. We argue that appropriate priors can remove the boundaries and improve convergence rates. We show that typical fits will have many evaporated parameters. Second, bare model parameters are usually ill-suited to describing model behavior; cost contours in parameter space tend to form hierarchies of plateaus and canyons. Geometrically, we understand this inconvenient parametrization as an extremely skewed coordinate basis and show that it induces a large parameter-effects curvature on the manifold. Using coordinates based on geodesic motion, these narrow canyons are transformed in many cases into a single quadratic, isotropic basin. We interpret the modified Gauss-Newton and Levenberg-Marquardt fitting algorithms as an Euler approximation to geodesic motion in these natural coordinates on the model manifold and the model graph respectively. By adding a geodesic acceleration adjustment to these algorithms, we alleviate the difficulties from parameter-effects curvature, improving both efficiency and success rates at finding good fits.
Submitted 7 October, 2010;
originally announced October 2010.
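A rough sketch of a Levenberg-Marquardt step with a geodesic acceleration correction, as described in the abstract above. Several parts are assumptions made for illustration and are not taken from the paper: the two-exponential toy model, the finite-difference step `h`, the damping update factors, the threshold `alpha`, and the choice to fall back to the plain step when the acceleration is large.

```python
import numpy as np

def residuals(theta, t, y):
    # Hypothetical toy model: y(t) = exp(-theta0 t) + exp(-theta1 t)
    return np.exp(-theta[0] * t) + np.exp(-theta[1] * t) - y

def jacobian(theta, t):
    # Analytic derivatives of the toy model with respect to theta0, theta1
    return np.column_stack([-t * np.exp(-theta[0] * t),
                            -t * np.exp(-theta[1] * t)])

def lm_geodesic(theta, t, y, lam=1e-2, h=0.1, alpha=0.75, iters=200):
    theta = np.asarray(theta, dtype=float)
    cost = 0.5 * np.sum(residuals(theta, t, y) ** 2)
    for _ in range(iters):
        r = residuals(theta, t, y)
        J = jacobian(theta, t)
        A = J.T @ J + lam * np.eye(theta.size)
        # Usual Levenberg-Marquardt step (the geodesic "velocity")
        v = -np.linalg.solve(A, J.T @ r)
        # Second directional derivative of the residuals along v,
        # estimated by a central finite difference
        r_vv = (residuals(theta + h * v, t, y) - 2.0 * r
                + residuals(theta - h * v, t, y)) / h ** 2
        # Geodesic acceleration, solved with the same damped matrix
        a = -np.linalg.solve(A, J.T @ r_vv)
        step = v + 0.5 * a
        # Assumed safeguard: ignore the correction if it is too large
        if 2.0 * np.linalg.norm(a) > alpha * np.linalg.norm(v):
            step = v
        new_cost = 0.5 * np.sum(residuals(theta + step, t, y) ** 2)
        if new_cost < cost:
            theta, cost = theta + step, new_cost
            lam /= 3.0   # assumed damping decrease on success
        else:
            lam *= 2.0   # assumed damping increase on failure
    return theta, cost

t = np.linspace(0.0, 3.0, 30)
true_theta = np.array([1.0, 3.0])
y = np.exp(-true_theta[0] * t) + np.exp(-true_theta[1] * t)  # noise-free data
start = np.array([0.5, 5.0])
initial_cost = 0.5 * np.sum(residuals(start, t, y) ** 2)
theta_fit, final_cost = lm_geodesic(start, t, y)
print(theta_fit, final_cost)
```

The acceleration reuses the matrix already formed for the ordinary step, so the extra cost per iteration is mainly the two additional residual evaluations.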
-
Superheating field of superconductors within Ginzburg-Landau theory
Authors:
Mark K. Transtrum,
Gianluigi Catelani,
James P. Sethna
Abstract:
We study the superheating field of a bulk superconductor within Ginzburg-Landau theory, which is valid near the critical temperature. We calculate, as functions of the Ginzburg-Landau parameter $κ$, the superheating field $H_{\mathrm{sh}}$ and the critical momentum $k_c$ characterizing the wavelength of the instability of the Meissner state to flux penetration. By mapping the two-dimensional linear stability theory into a one-dimensional eigenfunction problem for an ordinary differential equation, we solve the problem numerically. We demonstrate agreement between the numerics and analytics, and show convergence to the known results at both small and large $κ$. We discuss the implications of the results for superconducting RF cavities used in particle accelerators.
Submitted 26 August, 2010;
originally announced August 2010.
-
Why are nonlinear fits so challenging?
Authors:
M. K. Transtrum,
B. B. Machta,
J. P. Sethna
Abstract:
Fitting model parameters to experimental data is a common yet often challenging task, especially if the model contains many parameters. Typically, algorithms get lost in regions of parameter space in which the model is unresponsive to changes in parameters, and one is left to make adjustments by hand. We explain this difficulty by interpreting the fitting process as a generalized interpolation procedure. By considering the manifold of all model predictions in data space, we find that cross sections have a hierarchy of widths and are typically very narrow. Algorithms become stuck as they move near the boundaries. We observe that the model manifold, in addition to being tightly bounded, has low extrinsic curvature, leading to the use of geodesics in the fitting process. We improve the convergence of the Levenberg-Marquardt algorithm by adding the geodesic acceleration to the usual Levenberg-Marquardt step.
Submitted 16 December, 2009; v1 submitted 21 September, 2009;
originally announced September 2009.