-
Valid Conformal Prediction for Dynamic GNNs
Authors:
Ed Davis,
Ian Gallagher,
Daniel John Lawson,
Patrick Rubin-Delanchy
Abstract:
Graph neural networks (GNNs) are powerful black-box models which have shown impressive empirical performance. However, without any form of uncertainty quantification, it can be difficult to trust such models in high-risk scenarios. Conformal prediction aims to address this problem; however, it requires an assumption of exchangeability for its validity, which has limited its applicability to static graphs and transductive regimes. We propose to use unfolding, which allows any existing static GNN to output a dynamic graph embedding with exchangeability properties. Using this, we extend the validity of conformal prediction to dynamic GNNs in both transductive and semi-inductive regimes. We provide a theoretical guarantee of valid conformal prediction in these cases and demonstrate the empirical validity, as well as the performance gains, of unfolded GNNs against standard GNN architectures on both simulated and real datasets.
Submitted 29 May, 2024;
originally announced May 2024.
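The conformal prediction idea in the entry above can be illustrated with a generic split-conformal sketch for classification. This is not the authors' unfolded-GNN pipeline: it only shows how, under exchangeability of calibration and test points, a score threshold yields prediction sets with roughly 1 - alpha coverage. The function names and the choice of non-conformity score are assumptions made for illustration.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold for split conformal prediction (illustrative helper)."""
    n = len(cal_labels)
    # Non-conformity score: one minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank, ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: every class must be included.
        return np.inf
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, threshold):
    """Boolean mask: entry [i, c] is True when class c is in the set for point i."""
    return (1.0 - test_probs) <= threshold
```

A class enters the prediction set whenever its score does not exceed the calibrated threshold, so a less confident model produces larger sets rather than silently wrong point predictions.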
-
SB-ETAS: using simulation based inference for scalable, likelihood-free inference for the ETAS model of earthquake occurrences
Authors:
Samuel Stockman,
Daniel J. Lawson,
Maximilian J. Werner
Abstract:
Performing Bayesian inference for the Epidemic-Type Aftershock Sequence (ETAS) model of earthquakes typically requires MCMC sampling using the likelihood function or estimating the latent branching structure. These tasks have computational complexity $O(n^2)$ with the number of earthquakes and therefore do not scale well with new enhanced catalogs, which can now contain an order of $10^6$ events. On the other hand, simulation from the ETAS model can be done more quickly, in $O(n \log n)$ time. We present SB-ETAS: simulation-based inference for the ETAS model. This is an approximate Bayesian method which uses Sequential Neural Posterior Estimation (SNPE), a machine learning based algorithm for learning posterior distributions from simulations. SB-ETAS can successfully approximate ETAS posterior distributions on shorter catalogues where it is computationally feasible to compare with MCMC sampling. Furthermore, the scaling of SB-ETAS makes it feasible to fit to very large earthquake catalogs, such as one for Southern California dating back to 1932. SB-ETAS can find Bayesian estimates of ETAS parameters for this catalog in less than 10 hours on a standard laptop, which would have taken over 2 weeks using MCMC. Looking beyond the standard ETAS model, this simulation based framework would allow earthquake modellers to define and infer parameters for much more complex models that have intractable likelihood functions.
Submitted 28 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
The Directed Van Kampen Theorem in Lean
Authors:
Henning Basold,
Peter Bruin,
Dominique Lawson
Abstract:
Directed topology is an area of mathematics with applications in concurrency. It extends the concept of a topological space by adding a notion of directedness, which restricts how paths can evolve through a space and thereby enables a faithful representation of computation with their direction. In this paper, we present a Lean formalisation of directed spaces and a Van Kampen theorem for them. This theorem allows the calculation of the homotopy type of a space by combining local knowledge of the homotopy type of subspaces. With this theorem, the reasoning about spaces can be reduced to subspaces and, by representing concurrent systems as directed spaces, we can reduce the deduction of properties of a composed system to that of subsystems. The formalisation in Lean can serve to support computer-assisted reasoning about the behaviour of concurrent systems.
Submitted 11 December, 2023;
originally announced December 2023.
-
A Simple and Powerful Framework for Stable Dynamic Network Embedding
Authors:
Ed Davis,
Ian Gallagher,
Daniel John Lawson,
Patrick Rubin-Delanchy
Abstract:
In this paper, we address the problem of dynamic network embedding, that is, representing the nodes of a dynamic network as evolving vectors within a low-dimensional space. While the field of static network embedding is wide and established, the field of dynamic network embedding is comparatively in its infancy. We propose that a wide class of established static network embedding methods can be used to produce interpretable and powerful dynamic network embeddings when they are applied to the dilated unfolded adjacency matrix. We provide a theoretical guarantee that, regardless of embedding dimension, these unfolded methods will produce stable embeddings, meaning that nodes with identical latent behaviour will be exchangeable, regardless of their position in time or space. We additionally define a hypothesis testing framework which can be used to evaluate the quality of a dynamic network embedding by testing for planted structure in simulated networks. Using this, we demonstrate that, even in trivial cases, unstable methods are often either conservative or encode incorrect structure. In contrast, we demonstrate that our suite of stable unfolded methods are not only more interpretable but also more powerful in comparison to their unstable counterparts.
Submitted 14 November, 2023;
originally announced November 2023.
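The "dilated unfolded adjacency matrix" named in the entry above can be sketched as follows, assuming the usual construction: the T snapshot adjacency matrices (each n by n) are concatenated column-wise into an n by nT matrix, which is then placed in the off-diagonal blocks of a symmetric (n + nT) by (n + nT) matrix. The helper names are hypothetical and this is an illustration of the matrix only, not the authors' code.

```python
import numpy as np

def unfold(adjacency_list):
    """Column-concatenate T snapshots of shape (n, n) into one (n, n*T) matrix."""
    return np.hstack(adjacency_list)

def dilate(unfolded):
    """Symmetric dilation [[0, A], [A^T, 0]] of an (n, m) matrix A."""
    n, m = unfolded.shape
    return np.block([
        [np.zeros((n, n)), unfolded],
        [unfolded.T, np.zeros((m, m))],
    ])
```

By construction, row n + t*n + j of the dilated matrix corresponds to node j at snapshot t (counting from zero), so a static embedding method applied to this single matrix returns one vector per node-time pair alongside one vector per node.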
-
Optical switching beyond a million cycles of low-loss phase change material Sb$_2$Se$_3$
Authors:
Daniel Lawson,
Sophie Blundell,
Martin Ebert,
Otto L. Muskens,
Ioannis Zeimpekis
Abstract:
The development of the next generation of optical phase change technologies for integrated photonic and free-space platforms relies on the availability of materials that can be switched repeatedly over large volumes and with low optical losses. In recent years, the antimony-based chalcogenide phase-change material Sb$_2$Se$_3$ has been identified as particularly promising for a number of applications owing to good optical transparency in the near-infrared part of the spectrum and a high refractive index close to silicon. The crystallization temperature of Sb$_2$Se$_3$ of around 460 K allows switching to be achieved at moderate energies using optical or electrical control signals while providing sufficient data retention time for non-volatile storage. Here, we investigate the parameter space for optical switching of films of Sb$_2$Se$_3$ for a range of film thicknesses relevant for optical applications. By identifying optimal switching conditions, we demonstrate endurance of up to 10$^7$ cycles at reversible switching rates of 20 kHz. Our work demonstrates that the combination of intrinsic film parameters with pumping conditions is particularly critical for achieving high endurance in optical phase change applications.
Submitted 16 October, 2023;
originally announced October 2023.
-
NAS-X: Neural Adaptive Smoothing via Twisting
Authors:
Dieterich Lawson,
Michael Li,
Scott Linderman
Abstract:
Sequential latent variable models (SLVMs) are essential tools in statistics and machine learning, with applications ranging from healthcare to neuroscience. As their flexibility increases, analytic inference and model learning can become challenging, necessitating approximate methods. Here we introduce neural adaptive smoothing via twisting (NAS-X), a method that extends reweighted wake-sleep (RWS) to the sequential setting by using smoothing sequential Monte Carlo (SMC) to estimate intractable posterior expectations. Combining RWS and smoothing SMC allows NAS-X to provide low-bias and low-variance gradient estimates, and fit both discrete and continuous latent variable models. We illustrate the theoretical advantages of NAS-X over previous methods and explore these advantages empirically in a variety of tasks, including a challenging application to mechanistic models of neuronal dynamics. These experiments show that NAS-X substantially outperforms previous VI- and RWS-based methods in inference and model learning, achieving lower parameter error and tighter likelihood bounds.
Submitted 30 October, 2023; v1 submitted 28 August, 2023;
originally announced August 2023.
-
Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies
Authors:
Daniel Lawson,
Ahmed H. Qureshi
Abstract:
Recent work has shown the promise of creating generalist, transformer-based, models for language, vision, and sequential decision-making problems. To create such models, we generally require centralized training objectives, data, and compute. It is of interest if we can more flexibly create generalist policies by merging together multiple, task-specific, individually trained policies. In this work, we take a preliminary step in this direction through merging, or averaging, subsets of Decision Transformers in parameter space trained on different MuJoCo locomotion problems, forming multi-task models without centralized training. We also demonstrate the importance of various methodological choices when merging policies, such as utilizing common pre-trained initializations, increasing model capacity, and utilizing Fisher information for weighting parameter importance. In general, we believe research in this direction could help democratize and distribute the process that forms multi-task robotics policies. Our implementation is available at https://github.com/daniellawson9999/merging-decision-transformers.
Submitted 22 September, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
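The parameter-space merging described in the entry above amounts, in its simplest form, to averaging matching parameter tensors across separately trained models. The sketch below shows that basic operation on plain dictionaries of NumPy arrays with one scalar weight per model. The function name and dictionary layout are assumptions for illustration; the Fisher-information weighting mentioned in the abstract works per parameter and is not shown here, and the authors' actual implementation is the one at the repository linked in the abstract.

```python
import numpy as np

def merge_state_dicts(state_dicts, weights=None):
    """Weighted average of parameter dictionaries with identical keys and shapes."""
    if weights is None:
        weights = [1.0] * len(state_dicts)
    if len(weights) != len(state_dicts):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    keys = state_dicts[0].keys()
    if any(sd.keys() != keys for sd in state_dicts):
        raise ValueError("all models must share the same parameter names")
    # Normalise the weights so the result stays on the scale of the inputs.
    return {
        key: sum((w / total) * np.asarray(sd[key], dtype=float)
                 for w, sd in zip(weights, state_dicts))
        for key in keys
    }
```

Averaging only makes sense when parameters line up across models, which is one reason the abstract stresses a common pre-trained initialization.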
-
Co-learning Planning and Control Policies Constrained by Differentiable Logic Specifications
Authors:
Zikang Xiong,
Daniel Lawson,
Joe Eappen,
Ahmed H. Qureshi,
Suresh Jagannathan
Abstract:
Synthesizing planning and control policies in robotics is a fundamental task, further complicated by factors such as complex logic specifications and high-dimensional robot dynamics. This paper presents a novel reinforcement learning approach to solving high-dimensional robot navigation tasks with complex logic specifications by co-learning planning and control policies. Notably, this approach significantly reduces the sample complexity in training, allowing us to train high-quality policies with much fewer samples compared to existing reinforcement learning algorithms. In addition, our methodology streamlines complex specification extraction from map images and enables the efficient generation of long-horizon robot motion paths across different map layouts. Moreover, our approach also demonstrates capabilities for high-dimensional control and avoiding suboptimal policies via policy alignment. The efficacy of our approach is demonstrated through experiments involving simulated high-dimensional quadruped robot dynamics and a real-world differential drive robot (TurtleBot3) under different types of task specifications.
Submitted 1 October, 2023; v1 submitted 2 March, 2023;
originally announced March 2023.
-
Forecasting the 2016-2017 Central Apennines Earthquake Sequence with a Neural Point Process
Authors:
Samuel Stockman,
Daniel J. Lawson,
Maximilian J. Werner
Abstract:
Point processes have been dominant in modeling the evolution of seismicity for decades, with the Epidemic Type Aftershock Sequence (ETAS) model being most popular. Recent advances in machine learning have constructed highly flexible point process models using neural networks to improve upon existing parametric models. We investigate whether these flexible point process models can be applied to short-term seismicity forecasting by extending an existing temporal neural model to the magnitude domain and we show how this model can forecast earthquakes above a target magnitude threshold. We first demonstrate that the neural model can fit synthetic ETAS data while requiring less computational time, because it is not dependent on the full history of the sequence. By artificially emulating short-term aftershock incompleteness in the synthetic dataset, we find that the neural model outperforms ETAS. Using a new enhanced catalog from the 2016-2017 Central Apennines earthquake sequence, we investigate the predictive skill of ETAS and the neural model with respect to the lowest input magnitude. Constructing multiple forecasting experiments using the Visso, Norcia and Campotosto earthquakes to partition training and testing data, we target M3+ events. We find both models perform similarly at previously explored thresholds (e.g., above M3), but lowering the threshold to M1.2 reduces the performance of ETAS, unlike the neural model. We argue that some of these gains are due to the neural model's ability to handle incomplete data. The robustness to missing data and speed to train the neural model present it as an encouraging competitor in earthquake forecasting.
Submitted 2 October, 2023; v1 submitted 24 January, 2023;
originally announced January 2023.
-
Control Transformer: Robot Navigation in Unknown Environments through PRM-Guided Return-Conditioned Sequence Modeling
Authors:
Daniel Lawson,
Ahmed H. Qureshi
Abstract:
Learning long-horizon tasks such as navigation has presented difficult challenges for successfully applying reinforcement learning to robotics. From another perspective, under known environments, sampling-based planning can robustly find collision-free paths in environments without learning. In this work, we propose Control Transformer that models return-conditioned sequences from low-level policies guided by a sampling-based Probabilistic Roadmap (PRM) planner. We demonstrate that our framework can solve long-horizon navigation tasks using only local information. We evaluate our approach on partially-observed maze navigation with MuJoCo robots, including Ant, Point, and Humanoid. We show that Control Transformer can successfully navigate through mazes and transfer to unknown environments. Additionally, we apply our method to a differential drive robot (Turtlebot3) and show zero-shot sim2real transfer under noisy observations.
Submitted 13 July, 2023; v1 submitted 11 November, 2022;
originally announced November 2022.
-
SIXO: Smoothing Inference with Twisted Objectives
Authors:
Dieterich Lawson,
Allan Raventós,
Andrew Warrington,
Scott Linderman
Abstract:
Sequential Monte Carlo (SMC) is an inference algorithm for state space models that approximates the posterior by sampling from a sequence of target distributions. The target distributions are often chosen to be the filtering distributions, but these ignore information from future observations, leading to practical and theoretical limitations in inference and model learning. We introduce SIXO, a method that instead learns targets that approximate the smoothing distributions, incorporating information from all observations. The key idea is to use density ratio estimation to fit functions that warp the filtering distributions into the smoothing distributions. We then use SMC with these learned targets to define a variational objective for model and proposal learning. SIXO yields provably tighter log marginal lower bounds and offers significantly more accurate posterior inferences and parameter estimates in a variety of domains.
Submitted 20 June, 2022; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Special Relativity, Einstein Velocity Addition, and Gyrogroups: An Introduction
Authors:
Jimmie D. Lawson
Abstract:
In these notes we give an introductory unified treatment to the topics of special relativity, Lorentz transformations and the Lorentz group, Einstein velocity addition, and gyrogroups and gyrovector spaces. An effort has been made to present the material in a manner that is accessible to non-specialists and graduate students, and may even serve as the basis for a graduate course or seminar.
Submitted 5 February, 2022;
originally announced February 2022.
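For readers unfamiliar with the Einstein velocity addition named in the entry above, the standard formula is reproduced here for orientation. The notation ($\oplus$ for the addition, $\gamma_{\mathbf{u}}$ for the Lorentz factor, $c$ for the speed of light) is the conventional one and is assumed rather than taken from the notes themselves.

```latex
\[
\mathbf{u} \oplus \mathbf{v}
  = \frac{1}{1 + \dfrac{\mathbf{u}\cdot\mathbf{v}}{c^{2}}}
    \left\{ \mathbf{u} + \frac{1}{\gamma_{\mathbf{u}}}\,\mathbf{v}
      + \frac{1}{c^{2}}\,\frac{\gamma_{\mathbf{u}}}{1+\gamma_{\mathbf{u}}}\,
        (\mathbf{u}\cdot\mathbf{v})\,\mathbf{u} \right\},
\qquad
\gamma_{\mathbf{u}} = \frac{1}{\sqrt{1 - \lVert\mathbf{u}\rVert^{2}/c^{2}}}.
\]
```

For parallel velocities this reduces to the familiar $(u + v)/(1 + uv/c^2)$. In general the operation is neither commutative nor associative, which is what motivates the gyrogroup structure.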
-
Time-resolved reversible optical switching of the ultralow-loss phase change material Sb2Se3
Authors:
Daniel Lawson,
Daniel W. Hewak,
Otto L. Muskens,
Ioannis Zeimpekis
Abstract:
The antimony-based chalcogenide Sb2Se3 is a rapidly emerging material for photonic phase change applications owing to its ultra-low optical losses at telecommunication wavelengths in both crystalline and amorphous phases. Here, we investigate the dynamical response of these materials from nanoseconds to milliseconds under optical pumping conditions. We apply bichromatic pump-probe transient reflectance spectroscopy, which is a widely used method to study the optical performance of optical phase change materials. Amorphous regions of several hundreds of nanometers in diameter are induced by pulsed excitation of the material using a wavelength of 488 nm above the absorption edge, while the transient reflectance is probed using a continuous wave 980 nm laser, well below the absorption edge of the material. We find vitrification dynamics in the nanosecond range and observe crystallization on millisecond time scales. These results show a large, five orders of magnitude difference in time scales between crystallization and vitrification dynamics in this material. The insights provided in this work are fundamental for the optimisation of the material family and its employment in photonic applications.
Submitted 25 November, 2021;
originally announced November 2021.
-
UV Spectropolarimetry with Polstar: Protoplanetary Disks
Authors:
John P. Wisniewski,
Andrei V. Berdyugin,
Svetlana V. Berdyugina,
William C. Danchi,
Ruobing Dong,
Rene D. Oudmaijer,
Vladimir S. Airapetian,
Sean D. Brittain,
Ken Gayley,
Richard Ignace,
Maud Langlois,
Kellen D. Lawson,
Jamie R. Lomax,
Motohide Tamura,
Jorick S. Vink,
Paul A. Scowen
Abstract:
Polstar is a proposed NASA MIDEX mission that would feature a high resolution UV spectropolarimeter capable of measuring all four Stokes parameters onboard a 60 cm telescope. The mission would pioneer the field of time-domain UV spectropolarimetry. Time-domain UV spectropolarimetry offers the best resource to determine the geometry and physical conditions of protoplanetary disks from the stellar surface to <5 AU. We detail two key objectives that a dedicated time-domain UV spectropolarimetry survey, such as that enabled by Polstar, could achieve: 1) Test the hypothesis that magneto-accretion operating in young planet-forming disks around lower-mass stars transitions to boundary layer accretion in planet-forming disks around higher mass stars; and 2) Discriminate whether transient events in the innermost regions of planet-forming disks of intermediate mass stars are caused by inner disk mis-alignments or by stellar or disk emissions.
Submitted 9 December, 2021; v1 submitted 12 November, 2021;
originally announced November 2021.
-
The Expanding Universe of the Geometric Mean
Authors:
Jimmie D. Lawson,
Yongdo Lim
Abstract:
In this paper the authors seek to trace in an accessible fashion the rapid recent development of the theory of the matrix geometric mean in the cone of positive definite matrices up through the closely related operator geometric mean in the positive cone of a unital $C^*$-algebra. The story begins with the two-variable matrix geometric mean, moves to the breakthrough developments in the multivariable matrix setting, the main focus of the paper, and then on to the extension to the positive cone of the $C^*$-algebra of operators on a Hilbert space, even to general unital $C^*$-algebras, and finally to the consideration of barycentric maps that grow out of the geometric mean on the space of integrable probability measures on the positive cone. Besides expected tools from linear algebra and operator theory, one observes a surprisingly substantial interplay with geometrical notions in metric spaces, particularly the notion of nonpositive curvature. Added features include a glance at the probabilistic theory of random variables with values in a metric space of nonpositive curvature, and the appearance of related means such as the inductive and power means.
Submitted 25 October, 2021;
originally announced October 2021.
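The two-variable matrix geometric mean with which the entry above begins has a standard closed form, $A \# B = A^{1/2}\,(A^{-1/2} B A^{-1/2})^{1/2}\,A^{1/2}$ for positive definite $A$ and $B$. The sketch below evaluates it numerically; the helper names are hypothetical, and the eigendecomposition-based matrix power is one convenient choice rather than anything prescribed by the paper.

```python
import numpy as np

def spd_power(matrix, exponent):
    """Real power of a symmetric positive definite matrix via eigendecomposition."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    # Scale each eigenvector column by its eigenvalue raised to the exponent.
    return (eigenvectors * eigenvalues**exponent) @ eigenvectors.T

def geometric_mean(a, b):
    """Two-variable geometric mean A # B of positive definite matrices."""
    a_half = spd_power(a, 0.5)
    a_neg_half = spd_power(a, -0.5)
    inner = a_neg_half @ b @ a_neg_half
    inner = (inner + inner.T) / 2.0  # remove round-off asymmetry before eigh
    return a_half @ spd_power(inner, 0.5) @ a_half
```

For commuting matrices this reduces to the entrywise intuition of a square root of the product, and in general the result G is the positive definite solution of G A^{-1} G = B.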
-
The Neural Testbed: Evaluating Joint Predictions
Authors:
Ian Osband,
Zheng Wen,
Seyed Mohammad Asghari,
Vikranth Dwaracherla,
Botao Hao,
Morteza Ibrahimi,
Dieterich Lawson,
Xiuyuan Lu,
Brendan O'Donoghue,
Benjamin Van Roy
Abstract:
Predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed: an open-source benchmark for controlled and principled evaluation of agents that generate such predictions. Crucially, the testbed assesses agents not only on the quality of their marginal predictions per input, but also on their joint predictions across many inputs. We evaluate a range of agents using a simple neural network data generating process. Our results indicate that some popular Bayesian deep learning agents do not fare well with joint predictions, even when they can produce accurate marginal predictions. We also show that the quality of joint predictions drives performance in downstream decision tasks. We find these results are robust across a wide range of generative models, and highlight the practical importance of joint predictions to the community.
Submitted 1 November, 2022; v1 submitted 9 October, 2021;
originally announced October 2021.
-
SCExAO/CHARIS Direct Imaging Discovery of a 20 au Separation, Low-Mass Ratio Brown Dwarf Companion to an Accelerating Sun-like Star
Authors:
Thayne Currie,
Timothy D. Brandt,
Masayuki Kuzuhara,
Jeffery Chilcote,
Olivier Guyon,
Christian Marois,
Tyler Groff,
Julien Lozi,
Sebastien Vievard,
Ananya Sahoo,
Vincent Deo,
Nemanja Jovanovic,
Frantz Martinache,
Kevin Wagner,
Trent J. Dupuy,
Matthew Wahl,
Michael Letawsky,
Yiting Li,
Yunlin Zeng,
G. Mirek Brandt,
Daniel Michalik,
Carol Grady,
Markus Janson,
Gillian R. Knapp,
Jungmi Kwon, et al. (5 additional authors not shown)
Abstract:
We present the direct imaging discovery of a substellar companion to the nearby Sun-like star, HD 33632 Aa, at a projected separation of $\sim$ 20 au, obtained with SCExAO/CHARIS integral field spectroscopy complemented by Keck/NIRC2 thermal infrared imaging. The companion, HD 33632 Ab, induces a 10.5$σ$ astrometric acceleration on the star as detected with the $Gaia$ and $Hipparcos$ satellites. SCExAO/CHARIS $JHK$ (1.1--2.4 $μ$m) spectra and Keck/NIRC2 $L_{\rm p}$ (3.78 $μ$m) photometry are best matched by a field L/T transition object: an older, higher gravity, and less dusty counterpart to HR 8799 cde. Combining our astrometry with $Gaia/Hipparcos$ data and archival Lick Observatory radial-velocities, we measure a dynamical mass of 46.4 $\pm$ 8 $M_{\rm J}$ and an eccentricity of $e$ $<$0.46 at 95\% confidence. HD 33632 Ab's mass and mass ratio (4.0\% $\pm$ 0.7\%) are comparable to the low-mass brown dwarf GJ 758 B and intermediate between the more massive brown dwarf HD 19467 B and the (near-)planet mass companions to HR 2562 and GJ 504. Using $Gaia$ to select for direct imaging observations with the newest extreme adaptive optics systems can reveal substellar or even planet-mass companions on solar system-like scales at an increased frequency compared to blind surveys.
Submitted 17 November, 2020;
originally announced November 2020.
-
Discovery of a Nearby Young Brown Dwarf Disk
Authors:
M. C. Schutte,
K. D. Lawson,
J. P. Wisniewski,
M. J. Kuchner,
S. M. Silverberg,
J. K. Faherty,
D. C. Bardalez Gagliuffi,
R. Kiman,
J. Gagné,
A. Meisner,
A. C. Schneider,
A. S. Bans,
J. H. Debes,
N. Kovacevic,
M. K. D. Bosch,
H. A. Durantini Luca,
J. Holden,
M. Hyogo
Abstract:
We report the discovery of the youngest brown dwarf with a disk at 102 pc from the Sun, WISEA~J120037.79-784508.3 (W1200-7845), via the Disk Detective citizen science project. We establish that W1200-7845 is located in the 3.7$\substack{+4.6 \\ -1.4}$ Myr-old $\varepsilon$~Cha association. Its spectral energy distribution (SED) exhibits clear evidence of an infrared (IR) excess, indicative of the presence of a warm circumstellar disk. Modeling this warm disk, we find the data are best fit using a power-law description with a slope $α= -0.94$, which suggests it is a young, Class II type disk. Using a single blackbody disk fit, we find $T_{eff, disk} = 521 K$ and $L_{IR}/L_{*} = 0.14$. The near-infrared spectrum of W1200-7845 matches a spectral type of M6.0$γ\pm 0.5$, which corresponds to a low surface gravity object, and lacks distinctive signatures of strong Pa$β$ or Br$γ$ accretion. Both our SED fitting and spectral analysis indicate the source is cool ($T_{eff} = $2784-2850 K), with a mass of 42-58 $M_{Jup}$, well within the brown dwarf regime. The proximity of this young brown dwarf disk makes the system an ideal benchmark for investigating the formation and early evolution of brown dwarfs.
Submitted 3 August, 2020; v1 submitted 30 July, 2020;
originally announced July 2020.
-
CLARITY -- Comparing heterogeneous data using dissimiLARITY
Authors:
Daniel J. Lawson,
Vinesh Solanki,
Igor Yanovich,
Johannes Dellert,
Damian Ruck,
Phillip Endicott
Abstract:
Integrating datasets from different disciplines is hard because the data are often qualitatively different in meaning, scale, and reliability. When two datasets describe the same entities, many scientific questions can be phrased around whether the (dis)similarities between entities are conserved across such different data. Our method, CLARITY, quantifies consistency across datasets, identifies where inconsistencies arise, and aids in their interpretation. We illustrate this using three diverse comparisons: gene methylation vs expression, evolution of language sounds vs word use, and country-level economic metrics vs cultural beliefs. The non-parametric approach is robust to noise and differences in scaling, and makes only weak assumptions about how the data were generated. It operates by decomposing similarities into two components: a `structural' component analogous to a clustering, and an underlying `relationship' between those structures. This allows a `structural comparison' between two similarity matrices using their predictability from `structure'. Significance is assessed with the help of re-sampling appropriate for each dataset. The software, CLARITY, is available as an R package from https://github.com/danjlawson/CLARITY.
Submitted 2 December, 2021; v1 submitted 29 May, 2020;
originally announced June 2020.
-
Peter Pan Disks: Long-lived Accretion Disks Around Young M Stars
Authors:
Steven M. Silverberg,
John P. Wisniewski,
Marc J. Kuchner,
Kellen D. Lawson,
Alissa S. Bans,
John H. Debes,
Joseph R. Biggs,
Milton K. D. Bosch,
Katharina Doll,
Hugo A. Durantini Luca,
Alexandru Enachioaie,
Joshua Hamilton,
Jonathan Holden,
Michiharu Hyogo,
the Disk Detective Collaboration
Abstract:
WISEA J080822.18-644357.3, an M star in the Carina association, exhibits extreme infrared excess and accretion activity at an age greater than the expected accretion disk lifetime. We consider J0808 as the prototypical example of a class of M star accretion disks at ages $\gtrsim 20$ Myr, which we call ``Peter Pan'' disks, since they apparently refuse to grow up. We present four new Peter Pan disk candidates identified via the Disk Detective citizen science project, coupled with \textit{Gaia} astrometry. We find that WISEA J044634.16-262756.1 and WISEA J094900.65-713803.1 both exhibit significant infrared excess after accounting for nearby stars within the 2MASS beams. The J0446 system has $>95\%$ likelihood of Columba membership. The J0949 system shows $>95\%$ likelihood of Carina membership. We present new GMOS optical spectra of all four objects, showing possible accretion signatures on all four stars. We present ground-based and \textit{TESS} lightcurves of J0808 and 2MASS J0501-4337, including a large flare and aperiodic dipping activity on J0808, and strong periodicity on J0501. We find Pa$β$ and Br$γ$ emission indicating ongoing accretion in near-IR spectroscopy of J0808. Using observed characteristics of these systems, we discuss mechanisms that lead to accretion disks at ages $\gtrsim20$ Myr, and find that these objects most plausibly represent long-lived CO-poor primordial disks, or ``hybrid'' disks, exhibiting both debris- and primordial-disk features. The question remains: why have gas-rich disks persisted so long around these particular stars?
Submitted 14 January, 2020;
originally announced January 2020.
-
Energy-Inspired Models: Learning with Sampler-Induced Distributions
Authors:
Dieterich Lawson,
George Tucker,
Bo Dai,
Rajesh Ranganath
Abstract:
Energy-based models (EBMs) are powerful probabilistic models, but suffer from intractable sampling and density evaluation due to the partition function. As a result, inference in EBMs relies on approximate sampling algorithms, leading to a mismatch between the model and inference. Motivated by this, we consider the sampler-induced distribution as the model of interest and maximize the likelihood of this model. This yields a class of energy-inspired models (EIMs) that incorporate learned energy functions while still providing exact samples and tractable log-likelihood lower bounds. We describe and evaluate three instantiations of such models based on truncated rejection sampling, self-normalized importance sampling, and Hamiltonian importance sampling. These models outperform or perform comparably to the recently proposed Learned Accept/Reject Sampling algorithm and provide new insights on ranking Noise Contrastive Estimation and Contrastive Predictive Coding. Moreover, EIMs allow us to generalize a recent connection between multi-sample variational lower bounds and auxiliary variable variational inference. We show how recent variational bounds can be unified with EIMs as the variational family.
Submitted 9 January, 2020; v1 submitted 31 October, 2019;
originally announced October 2019.
-
Radial Velocity Discovery of an Eccentric Jovian World Orbiting at 18 au
Authors:
Sarah Blunt,
Michael Endl,
Lauren M. Weiss,
William D. Cochran,
Andrew W. Howard,
Phillip J. MacQueen,
Benjamin J. Fulton,
Gregory W. Henry,
Marshall C. Johnson,
Molly R. Kosiarek,
Kellen D. Lawson,
Bruce Macintosh,
Sean M. Mills,
Eric L. Nielsen,
Erik A. Petigura,
Glenn Schneider,
Andrew Vanderburg,
John P. Wisniewski,
Robert A. Wittenmyer,
Erik Brugamyer,
Caroline Caldwell,
Anita L. Cochran,
Artie P. Hatzes,
Lea A. Hirsch,
Howard Isaacson
, et al. (3 additional authors not shown)
Abstract:
Based on two decades of radial velocity (RV) observations using Keck/HIRES and McDonald/Tull, and more recent observations using the Automated Planet Finder, we found that the nearby star HR 5183 (HD 120066) hosts a 3$M_J$ minimum mass planet with an orbital period of $74^{+43}_{-22}$ years. The orbit is highly eccentric (e$\simeq$0.84), shuttling the planet from within the orbit of Jupiter to beyond the orbit of Neptune. Our careful survey design enabled high cadence observations before, during, and after the planet's periastron passage, yielding precise orbital parameter constraints. We searched for stellar or planetary companions that could have excited the planet's eccentricity, but found no candidates, potentially implying that the perturber was ejected from the system. We did identify a bound stellar companion more than 15,000 au from the primary, but reasoned that it is currently too widely separated to have an appreciable effect on HR 5183 b. Because HR 5183 b's wide orbit takes it more than 30 au (1") from its star, we also explored the potential of complementary studies with direct imaging or stellar astrometry. We found that a Gaia detection is very likely, and that imaging at 10 $μ$m is a promising avenue. This discovery highlights the value of long-baseline RV surveys for discovering and characterizing long-period, eccentric Jovian planets. This population may offer important insights into the dynamical evolution of planetary systems containing multiple massive planets.
Submitted 2 September, 2019; v1 submitted 26 August, 2019;
originally announced August 2019.
-
High Fidelity Imaging of the Inner AU Mic Debris Disk: Evidence of Differential Wind Sculpting?
Authors:
John P. Wisniewski,
Adam F. Kowalski,
James R. A. Davenport,
Glenn Schneider,
Carol A. Grady,
Leslie Hebb,
Kellen D. Lawson,
Jean-Charles Augereau,
Anthony Boccaletti,
Alexander Brown,
John H. Debes,
Andras Gaspar,
Thomas K. Henning,
Dean C. Hines,
Marc J. Kuchner,
Anne-Marie Lagrange,
Julien Milli,
Elie Sezestre,
Christopher C. Stark,
Christian Thalmann
Abstract:
We present new high fidelity optical coronagraphic imagery of the inner $\sim$50 au of AU Mic's edge-on debris disk using the BAR5 occulter of the Hubble Space Telescope Imaging Spectrograph (HST/STIS) obtained on 26-27 July 2018. This new imagery reveals that "feature A", residing at a projected stellocentric separation of 14.2 au on SE-side of the disk, exhibits an apparent "loop-like" morphology at the time of our observations. The loop has a projected width of 1.5 au and rises 2.3 au above the disk midplane. We also explored TESS photometric observations of AU Mic that are consistent with evidence of two starspot complexes in the system. The likely co-alignment of the stellar and disk rotational axes breaks degeneracies in detailed spot modeling, indicating that AU Mic's projected magnetic field axis is offset from its rotational axis. We speculate that small grains in AU Mic's disk could be sculpted by a time-dependent wind that is influenced by this offset magnetic field axis, analogous to co-rotating Solar interaction regions that sculpt and influence the inner and outer regions of our own Heliosphere. Alternatively, if the observed spot modulation is indicative of a significant mis-alignment of the stellar and disk rotational axes, we suggest the disk could still be sculpted by the differential equatorial versus polar wind that it sees with every stellar rotation.
Submitted 2 September, 2019; v1 submitted 23 July, 2019;
originally announced July 2019.
-
Identification of Stellar Flares Using Differential Evolution Template Optimization
Authors:
Kellen D. Lawson,
John P. Wisniewski,
Eric C. Bellm,
Adam F. Kowalski,
David L. Shupe
Abstract:
We explore methods for the identification of stellar flare events in irregularly sampled data of ground-based time domain surveys. In particular, we describe a new technique for identifying flaring stars, which we have implemented in a publicly available Python module called "PyVAN". The approach uses the Differential Evolution algorithm to optimize parameters of empirically derived light-curve templates for different types of stars to fit a candidate light-curve. The difference of the likelihoods that these best-fit templates produced the observed data is then used to delineate targets that are well explained by a flare template but simultaneously poorly explained by templates of common contaminants. By testing on light-curves of known identity and morphology, we show that our technique is capable of recovering flaring status in $69\%$ of all light-curves containing a flare event above thresholds drawn to include $\lt1\%$ of any contaminant population. By applying to Palomar Transient Factory data, we show consistency with prior samples of flaring stars, and identify a small selection of candidate flaring G-type stars for possible follow-up.
Submitted 22 July, 2019; v1 submitted 7 March, 2019;
originally announced March 2019.
-
An Inverse Function Theorem Converse
Authors:
Jimmie D. Lawson
Abstract:
We establish the following converse of the well-known inverse function theorem. Let $g:U\to V$ and $f:V\to U$ be inverse homeomorphisms between open subsets of Banach spaces. If $g$ is differentiable of class $C^p$ and $f$ is locally Lipschitz, then the Fréchet derivative of $g$ at each point of $U$ is invertible and $f$ must be differentiable of class $C^p$.
Submitted 9 December, 2018;
originally announced December 2018.
-
Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives
Authors:
George Tucker,
Dieterich Lawson,
Shixiang Gu,
Chris J. Maddison
Abstract:
Deep latent variable models have become a popular model choice due to the scalable learning algorithms introduced by (Kingma & Welling, 2013; Rezende et al., 2014). These approaches maximize a variational lower bound on the intractable log likelihood of the observed data. Burda et al. (2015) introduced a multi-sample variational bound, IWAE, that is at least as tight as the standard variational lower bound and becomes increasingly tight as the number of samples increases. Counterintuitively, the typical inference network gradient estimator for the IWAE bound performs poorly as the number of samples increases (Rainforth et al., 2018; Le et al., 2018). Roeder et al. (2017) propose an improved gradient estimator, however, are unable to show it is unbiased. We show that it is in fact biased and that the bias can be estimated efficiently with a second application of the reparameterization trick. The doubly reparameterized gradient (DReG) estimator does not suffer as the number of samples increases, resolving the previously raised issues. The same idea can be used to improve many recently introduced training techniques for latent variable models. In particular, we show that this estimator reduces the variance of the IWAE gradient, the reweighted wake-sleep update (RWS) (Bornschein & Bengio, 2014), and the jackknife variational inference (JVI) gradient (Nowozin, 2018). Finally, we show that this computationally efficient, unbiased drop-in gradient estimator translates to improved performance for all three objectives on several modeling tasks.
Submitted 19 November, 2018; v1 submitted 9 October, 2018;
originally announced October 2018.
-
An online sequence-to-sequence model for noisy speech recognition
Authors:
Chung-Cheng Chiu,
Dieterich Lawson,
Yuping Luo,
George Tucker,
Kevin Swersky,
Ilya Sutskever,
Navdeep Jaitly
Abstract:
Generative models have long been the dominant approach for speech recognition. The success of these models however relies on the use of sophisticated recipes and complicated machinery that is not easily accessible to non-practitioners. Recent innovations in Deep Learning have given rise to an alternative - discriminative models called Sequence-to-Sequence models, that can almost match the accuracy of state of the art generative models. While these models are easy to train as they can be trained end-to-end in a single step, they have a practical limitation that they can only be used for offline recognition. This is because the models require that the entirety of the input sequence be available at the beginning of inference, an assumption that is not valid for instantaneous speech recognition. To address this problem, online sequence-to-sequence models were recently introduced. These models are able to start producing outputs as data arrives, and the model feels confident enough to output partial transcripts. These models, like sequence-to-sequence are causal - the output produced by the model until any time, $t$, affects the features that are computed subsequently. This makes the model inherently more powerful than generative models that are unable to change features that are computed from the data. This paper highlights two main contributions - an improvement to online sequence-to-sequence model training, and its application to noisy settings with mixed speech from two speakers.
Submitted 16 June, 2017;
originally announced June 2017.
-
Filtering Variational Objectives
Authors:
Chris J. Maddison,
Dieterich Lawson,
George Tucker,
Nicolas Heess,
Mohammad Norouzi,
Andriy Mnih,
Arnaud Doucet,
Yee Whye Teh
Abstract:
When used as a surrogate objective for maximum likelihood estimation in latent variable models, the evidence lower bound (ELBO) produces state-of-the-art results. Inspired by this, we consider the extension of the ELBO to a family of lower bounds defined by a particle filter's estimator of the marginal likelihood, the filtering variational objectives (FIVOs). FIVOs take the same arguments as the ELBO, but can exploit a model's sequential structure to form tighter bounds. We present results that relate the tightness of FIVO's bound to the variance of the particle filter's estimator by considering the generic case of bounds defined as log-transformed likelihood estimators. Experimentally, we show that training with FIVO results in substantial improvements over training the same model architecture with the ELBO on sequential data.
Submitted 12 November, 2017; v1 submitted 25 May, 2017;
originally announced May 2017.
-
Learning Hard Alignments with Variational Inference
Authors:
Dieterich Lawson,
Chung-Cheng Chiu,
George Tucker,
Colin Raffel,
Kevin Swersky,
Navdeep Jaitly
Abstract:
There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition. Hard attention can offer benefits over soft attention such as decreased computational cost, but training hard attention models can be difficult because of the discrete latent variables they introduce. Previous work used REINFORCE and Q-learning to approach these issues, but those methods can provide high-variance gradient estimates and be slow to train. In this paper, we tackle the problem of learning hard attention for a sequential task using variational inference methods, specifically the recently introduced VIMCO and NVIL. Furthermore, we propose a novel baseline that adapts VIMCO to this setting. We demonstrate our method on a phoneme recognition task in clean and noisy environments and show that our method outperforms REINFORCE, with the difference being greater for a more complicated task.
Submitted 1 November, 2017; v1 submitted 16 May, 2017;
originally announced May 2017.
-
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
Authors:
George Tucker,
Andriy Mnih,
Chris J. Maddison,
Dieterich Lawson,
Jascha Sohl-Dickstein
Abstract:
Learning in models with discrete latent variables is challenging due to high variance gradient estimators. Generally, approaches have relied on control variates to reduce the variance of the REINFORCE estimator. Recent work (Jang et al. 2016, Maddison et al. 2016) has taken a different approach, introducing a continuous relaxation of discrete variables to produce low-variance, but biased, gradient estimates. In this work, we combine the two approaches through a novel control variate that produces low-variance, \emph{unbiased} gradient estimates. Then, we introduce a modification to the continuous relaxation and show that the tightness of the relaxation can be adapted online, removing it as a hyperparameter. We show state-of-the-art variance reduction on several benchmark generative modeling tasks, generally leading to faster convergence to a better final log-likelihood.
Submitted 6 November, 2017; v1 submitted 21 March, 2017;
originally announced March 2017.
-
Particle Value Functions
Authors:
Chris J. Maddison,
Dieterich Lawson,
George Tucker,
Nicolas Heess,
Arnaud Doucet,
Andriy Mnih,
Yee Whye Teh
Abstract:
The policy gradients of the expected return objective can react slowly to rare rewards. Yet, in some cases agents may wish to emphasize the low or high returns regardless of their probability. Borrowing from the economics and control literature, we review the risk-sensitive value function that arises from an exponential utility and illustrate its effects on an example. This risk-sensitive value function is not always applicable to reinforcement learning problems, so we introduce the particle value function defined by a particle filter over the distributions of an agent's experience, which bounds the risk-sensitive one. We illustrate the benefit of the policy gradients of this objective in Cliffworld.
Submitted 16 March, 2017;
originally announced March 2017.
-
Changing Model Behavior at Test-Time Using Reinforcement Learning
Authors:
Augustus Odena,
Dieterich Lawson,
Christopher Olah
Abstract:
Machine learning models are often used at test-time subject to constraints and trade-offs not present at training-time. For example, a computer vision model operating on an embedded device may need to perform real-time inference, or a translation model operating on a cell phone may wish to bound its average compute time in order to be power-efficient. In this work we describe a mixture-of-experts model and show how to change its test-time resource-usage on a per-input basis using reinforcement learning. We test our method on a small MNIST-based example.
Submitted 24 February, 2017;
originally announced February 2017.
-
Training a Subsampling Mechanism in Expectation
Authors:
Colin Raffel,
Dieterich Lawson
Abstract:
We describe a mechanism for subsampling sequences and show how to compute its expected output so that it can be trained with standard backpropagation. We test this approach on a simple toy problem and discuss its shortcomings.
Submitted 7 April, 2017; v1 submitted 22 February, 2017;
originally announced February 2017.
-
Electron impact excitation of N IV: calculations with the DARC code and a comparison with ICFT results
Authors:
K. M. Aggarwal,
F. P. Keenan,
K. D. Lawson
Abstract:
There have been discussions in the recent literature regarding the accuracy of the available electron impact excitation rates (equivalently effective collision strengths $Υ$) for transitions in Be-like ions. In the present paper we demonstrate, once again, that earlier results for $Υ$ are indeed overestimated (by up to four orders of magnitude), for over 40\% of transitions and over a wide range of temperatures. To do this we have performed two sets of calculations for N~IV, with two different model sizes consisting of 166 and 238 fine-structure energy levels. As in our previous work, for the determination of atomic structure the GRASP (General-purpose Relativistic Atomic Structure Package) is adopted and for the scattering calculations (the standard and parallelised versions of) the Dirac Atomic R-matrix Code ({\sc darc}) are employed. Calculations for collision strengths and effective collision strengths have been performed over a wide range of energy (up to 45~Ryd) and temperature (up to 2.0$\times$10$^6$~K), useful for applications in a variety of plasmas. Corresponding results for energy levels, lifetimes and A-values for all E1, E2, M1 and M2 transitions among 238 levels of N~IV are also reported.
Submitted 3 June, 2016;
originally announced August 2016.
-
Meta-analysis of mid-p-values: some new results based on the convex order
Authors:
Patrick Rubin-Delanchy,
Nicholas A. Heard,
Daniel John Lawson
Abstract:
The mid-p-value is a proposed improvement on the ordinary p-value for the case where the test statistic is partially or completely discrete. In this case, the ordinary p-value is conservative, meaning that its null distribution is larger than a uniform distribution on the unit interval, in the usual stochastic order. The mid-p-value is not conservative. However, its null distribution is dominated by the uniform distribution in a different stochastic order, called the convex order. The property leads us to discover some new finite-sample and asymptotic bounds on functions of mid-p-values, which can be used to combine results from different hypothesis tests conservatively, yet more powerfully, using mid-p-values rather than p-values. Our methodology is demonstrated on real data from a cyber-security application.
Submitted 31 May, 2017; v1 submitted 19 May, 2015;
originally announced May 2015.
-
Posterior predictive p-values and the convex order
Authors:
Patrick Rubin-Delanchy,
Daniel John Lawson
Abstract:
Posterior predictive p-values are a common approach to Bayesian model-checking. This article analyses their frequency behaviour, that is, their distribution when the parameters and the data are drawn from the prior and the model respectively. We show that the family of possible distributions is exactly described as the distributions that are less variable than uniform on [0,1], in the convex order. In general, p-values with such a property are not conservative, and we illustrate how the theoretical worst-case error rate for false rejection can occur in practice. We describe how to correct the p-values to recover conservatism in several common scenarios, for example, when interpreting a single p-value or when combining multiple p-values into an overall score of significance. We also handle the case where the p-value is estimated from posterior samples obtained from techniques such as Markov Chain or Sequential Monte Carlo. Our results place posterior predictive p-values in a much clearer theoretical framework, allowing them to be used with more assurance.
Submitted 29 March, 2015; v1 submitted 10 December, 2014;
originally announced December 2014.
-
Contrasting H-mode behaviour with deuterium fuelling and nitrogen seeding in the all-carbon and metallic versions of JET
Authors:
G. P. Maddison,
C. Giroud,
B. Alper,
G. Arnoux,
I. Balboa,
M. N. A. Beurskens,
A. Boboc,
S. Brezinsek,
M. Brix,
M. Clever,
R. Coelho,
J. W. Coenen,
I. Coffey,
P. C. da Silva Aresta Belo,
S. Devaux,
P. Devynck,
T. Eich,
R. C. Felton,
J. Flanagan,
L. Frassinetti,
L. Garzotti,
M. Groth,
S. Jachmich,
A. Järvinen,
E. Joffrin
, et al. (26 additional authors not shown)
Abstract:
Submitted 11 June, 2014;
originally announced June 2014.
-
A general decision framework for structuring computation using Data Directional Scaling to process massive similarity matrices
Authors:
Daniel John Lawson,
Niall M Adams
Abstract:
As datasets grow it becomes infeasible to process them completely with a desired model. For giant datasets, we frame the order in which computation is performed as a decision problem. The order is designed so that partial computations are of value and early stopping yields useful results. Our approach comprises two related tools: a decision framework to choose the order to perform computations, and an emulation framework to enable estimation of the unevaluated computations. The approach is applied to the problem of computing similarity matrices, for which the cost of computation grows quadratically with the number of objects. Reasoning about similarities before they are observed introduces difficulties as there is no natural space and hence comparisons are difficult. We solve this by introducing a computationally convenient form of multidimensional scaling we call `data directional scaling'. High quality estimation is possible with massively reduced computation from the naive approach, and can be scaled to very large matrices. The approach is applied to the practical problem of assessing genetic similarity in population genetics. The use of statistical reasoning in decision making for large scale problems promises to be an important tool in applying statistical methodology to Big Data.
Submitted 17 March, 2014;
originally announced March 2014.
-
Apparent strength conceals instability in a model for the collapse of historical states
Authors:
Daniel John Lawson,
Neeraj Oak
Abstract:
An explanation for the political processes leading to the sudden collapse of empires and states would be useful for understanding both historical and contemporary political events. We seek a general description of state collapse spanning eras and cultures, from small kingdoms to continental empires, drawing on a suitably diverse range of historical sources. Our aim is to provide an accessible verbal hypothesis that bridges the gap between mathematical and social methodology. We use game theory to determine whether factions within a state will accept the political status quo, or wish to better their circumstances through costly rebellion. In lieu of precise data we verify our model using sensitivity analysis. We find that a small amount of dissatisfaction is typically harmless, but can trigger sudden collapse when there is a sufficient buildup of political inequality. Contrary to intuition, a state is predicted to be least stable when its leadership is at the height of its political power and thus most able to exert its influence through external warfare, lavish expense or autocratic decree.
Submitted 10 July, 2013;
originally announced July 2013.
-
Populations in statistical genetic modelling and inference
Authors:
Daniel John Lawson
Abstract:
What is a population? This review considers how a population may be defined in terms of understanding the structure of the underlying genetics of the individuals involved. The main approach is to consider statistically identifiable groups of randomly mating individuals, which is well defined in theory for any type of (sexual) organism. We discuss generative models using drift, admixture and spatial structure, and the ancestral recombination graph. These are contrasted with statistical models for inference, principal component analysis and other `non-parametric' methods. The relationships between these approaches are explored with both simulated and real-data examples. The state-of-the-art practical software tools are discussed and contrasted. We conclude that populations are a useful theoretical construct that can be well defined in theory and often approximately exist in practice.
Submitted 4 June, 2013;
originally announced June 2013.
-
The effect of ionization on the populations of excited levels of C IV and C V in tokamak edge plasmas
Authors:
K D Lawson,
I H Coffey,
K M Aggarwal,
F P Keenan,
JET-EFDA Contributors
Abstract:
The main populating and depopulating mechanisms of the excited energy levels of ions in plasmas with densities $<10^{23}$-$10^{24}$ m$^{-3}$ are electron collisional excitation from the ion's ground state and radiative decay, respectively, with the majority of the electron population being in the ground state of the ionization stage. Electron collisional ionization is predominately expected to take place from one ground state to that of the next higher ionization stage. However, the question arises as to whether, in some cases, ionization can also affect the excited level populations. This would apply particularly to those cases involving transient events such as impurity influxes in a laboratory plasma. An analysis of the importance of ionization in populating the excited levels of ions in plasmas typical of those found in the edge of tokamaks is undertaken for the C IV and C V ionization stages. The emphasis is on those energy levels giving rise to transitions of most use for diagnostic purposes. Carbon is chosen since it is an important contaminant of JET plasmas; it was the dominant low Z impurity before the installation of the ITER-like wall and is still present in the plasma after its installation. Direct electron collisional ionization both from and to excited levels is considered. Distorted-wave Flexible Atomic Code calculations are performed to generate the required ionization cross sections, due to a lack of atomic data in the literature.
Submitted 13 May, 2013;
originally announced May 2013.
-
Planck pre-launch status: calibration of the Low Frequency Instrument flight model radiometers
Authors:
F. Villa,
L. Terenzi,
M. Sandri,
P. Meinhold,
T. Poutanen,
P. Battaglia,
C. Franceschet,
N. Hughes,
M. Laaninen,
P. Lapolla,
M. Bersanelli,
R. C. Butler,
F. Cuttaia,
O. D'Arcangelo,
M. Frailis,
E. Franceschi,
S. Galeotta,
A. Gregorio,
R. Leonardi,
S. R. Lowe,
N. Mandolesi,
M. Maris,
L. Mendes,
A. Mennella,
G. Morgante
, et al. (49 additional authors not shown)
Abstract:
The Low Frequency Instrument (LFI) on-board the ESA Planck satellite carries eleven radiometer subsystems, called Radiometer Chain Assemblies (RCAs), each composed of a pair of pseudo-correlation receivers. We describe the on-ground calibration campaign performed to qualify the flight model RCAs and to measure their pre-launch performances. Each RCA was calibrated in a dedicated flight-like cryogenic environment with the radiometer front-end cooled to 20K and the back-end at 300K, and with an external input load cooled to 4K. A matched load simulating a blackbody at different temperatures was placed in front of the sky horn to derive basic radiometer properties such as noise temperature, gain, and noise performance, e.g. 1/f noise. The spectral response of each detector was measured as was their susceptibility to thermal variation. All eleven LFI RCAs were calibrated. Instrumental parameters measured in these tests, such as noise temperature, bandwidth, radiometer isolation, and linearity, provide essential inputs to the Planck-LFI data analysis.
Submitted 14 May, 2010;
originally announced May 2010.
-
Design, development and verification of the 30 and 44 GHz front-end modules for the Planck Low Frequency Instrument
Authors:
R. J. Davis,
A. Wilkinson,
R. D. Davies,
W. F. Winder,
N. Roddis,
E. J. Blackhurst,
D. Lawson,
S. R. Lowe,
C. Baines,
M. Butlin,
A. Galtress,
D. Shepherd,
B. Aja,
E. Artal,
M. Bersanelli,
R. C. Butler,
C. Castelli,
F. Cuttaia,
O. D'Arcangelo,
T. Gaier,
R. Hoyland,
D. Kettle,
R. Leonardi,
N. Mandolesi,
A. Mennella
, et al. (6 additional authors not shown)
Abstract:
We give a description of the design, construction and testing of the 30 and 44 GHz Front End Modules (FEMs) for the Low Frequency Instrument (LFI) of the Planck mission to be launched in 2009. The scientific requirements of the mission determine the performance parameters to be met by the FEMs, including their linear polarization characteristics.
The FEM design is that of a differential pseudo-correlation radiometer in which the signal from the sky is compared with a 4-K blackbody load. The Low Noise Amplifier (LNA) at the heart of the FEM is based on indium phosphide High Electron Mobility Transistors (HEMTs). The radiometer incorporates a novel phase-switch design which gives excellent amplitude and phase match across the band.
The noise temperature requirements are met within the measurement errors at the two frequencies. For the most sensitive LNAs, the noise temperature at the band centre is 3 and 5 times the quantum limit at 30 and 44 GHz respectively. For some of the FEMs, the noise temperature is still falling as the ambient temperature is reduced to 20 K. Stability tests of the FEMs, including a measurement of the 1/f knee frequency, also meet mission requirements.
The 30 and 44 GHz FEMs have met or bettered the mission requirements in all critical aspects. The most sensitive LNAs have reached new limits of noise temperature for HEMTs at their band centres. The FEMs have well-defined linear polarization characteristics.
Submitted 26 January, 2010;
originally announced January 2010.
-
Planck pre-launch status: Low Frequency Instrument calibration and expected scientific performance
Authors:
A. Mennella,
M. Bersanelli,
R. C. Butler,
F. Cuttaia,
O. D'Arcangelo,
R. J. Davis,
M. Frailis,
S. Galeotta,
A. Gregorio,
C. R. Lawrence,
R. Leonardi,
S. R. Lowe,
N. Mandolesi,
M. Maris,
P. Meinhold,
L. Mendes,
G. Morgante,
M. Sandri,
L. Stringhetti,
L. Terenzi,
M. Tomasi,
L. Valenziano,
F. Villa,
A. Zacchei,
A. Zonca
, et al. (61 additional authors not shown)
Abstract:
We give the calibration and scientific performance parameters of the Planck Low Frequency Instrument (LFI) measured during the ground cryogenic test campaign. These parameters characterise the instrument response and constitute our best pre-launch knowledge of the LFI scientific performance. The LFI shows excellent $1/f$ stability and rejection of instrumental systematic effects; measured noise performance shows that LFI is the most sensitive instrument of its kind. The set of measured calibration parameters will be updated during flight operations through the end of the mission.
Submitted 25 January, 2010;
originally announced January 2010.
-
Understanding clustering in type space using field theoretic techniques
Authors:
Daniel John Lawson,
Henrik Jeldtoft Jensen
Abstract:
The birth/death process with mutation describes the evolution of a population, and displays rich dynamics including clustering and fluctuations. We discuss an analytical `field-theoretical' approach to the birth/death process, using a simple dimensional analysis argument to describe evolution as a `Super-Brownian Motion' in the infinite population limit. The field theory technique provides corrections to this for large but finite population, and an exact description at arbitrary population size. This allows a characterisation of the difference between the evolution of a phenotype, for which strong local clustering is observed, and a genotype for which distributions are more dispersed. We describe the approach with sufficient detail for non-specialists.
Submitted 19 November, 2007;
originally announced November 2007.
-
Neutral Evolution as Diffusion in phenotype space: reproduction with mutation but without selection
Authors:
Daniel John Lawson,
Henrik Jeldtoft Jensen
Abstract:
The process of `Evolutionary Diffusion', i.e. reproduction with local mutation but without selection in a biological population, resembles standard Diffusion in many ways. However, Evolutionary Diffusion allows the formation of local peaks with a characteristic width that undergo drift, even in the infinite population limit. We analytically calculate the mean peak width and the effective random walk step size, and obtain the distribution of the peak width which has a power law tail. We find that independent local mutations act as a diffusion of interacting particles with increased stepsize.
Submitted 2 March, 2007; v1 submitted 7 September, 2006;
originally announced September 2006.
-
Diversity as a product of interspecial interactions
Authors:
Daniel Lawson,
Henrik Jeldtoft Jensen,
Kunihiko Kaneko
Abstract:
We demonstrate diversification rather than optimisation for highly interacting organisms in a well mixed biological system by means of a simple model and reference to experiment, and find the cause to be the complex network of interactions formed, allowing species less well adapted to an environment to flourish by co-interaction over the `best' species. This diversification can be considered as the construction of many co-evolutionary niches by the network of interactions between species. Evidence for this comes from work with the bacteria Escherichia coli, which may coexist with their own mutants under certain conditions. Diversification only occurs above a certain threshold interaction strength, below which competitive exclusion occurs.
Submitted 10 May, 2005;
originally announced May 2005.
-
The species-area relationship and evolution
Authors:
Daniel Lawson,
Henrik Jeldtoft Jensen
Abstract:
Models relating to the Species-Area curve are usually defined at the species level, and concerned only with ecological timescales. We examine an individual-based model of co-evolution on a spatial lattice based on the Tangled Nature model, and show that reproduction, mutation and dispersion by diffusion in an interacting system produces power-law Species-Area Relations as observed in ecological measurements at medium scales. We find that co-evolutionary habitats form, allowing high diversity levels in a spatially homogeneous system, and these are maintained for exponentially increasing time when increasing system size.
Submitted 10 September, 2006; v1 submitted 13 December, 2004;
originally announced December 2004.