-
A Data-Driven Search For Mid-Infrared Excesses Among Five Million Main-Sequence FGK Stars
Authors:
Gabriella Contardo,
David W. Hogg
Abstract:
Stellar infrared excesses can indicate various phenomena of interest, from protoplanetary disks to debris disks, or (more speculatively) techno-signatures along the lines of Dyson spheres. In this paper, we conduct a large search for such excesses, designed as a data-driven contextual anomaly detection pipeline. We focus our search on FGK stars close to the main sequence to favour non-young host stars. We look for excess in the mid-infrared, unlocking a large sample to search in while favouring extreme IR excess akin to the ones produced by Extreme Debris Disks (EDD). We combine observations from ESA Gaia DR3, 2MASS, and the unWISE reprocessing of NASA WISE data, and create a catalogue of 4,898,812 stars with $G < 16$ mag. We consider a star to have an excess if it is substantially brighter in the $W1$ and $W2$ bands than what is predicted from an ensemble of machine-learning models trained on the data, taking optical and near-infrared information as input features. We apply a set of additional cuts (derived from the ML models and the objects' astronomical features) to avoid false positives and identify a set of 53 objects (a rate of $1.1\times 10^{-5}$), including one previously identified EDD candidate. Typical infrared-excess fractional luminosities we find are in the range 0.005 to 0.1, consistent with known EDDs.
Submitted 27 March, 2024;
originally announced March 2024.
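The detection rate quoted in the abstract above follows directly from the two counts it gives. A quick check, with variable names chosen here purely for illustration:

```python
# Counts quoted in the abstract above.
n_candidates = 53          # objects surviving all cuts
n_catalogue = 4_898_812    # stars in the combined catalogue

# Fraction of catalogue stars flagged as infrared-excess candidates.
rate = n_candidates / n_catalogue
print(f"{rate:.1e}")
```

The ratio is about 1.08e-5, which rounds to the quoted $1.1\times 10^{-5}$.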
-
Prospects for Detecting Gaps in Globular Cluster Stellar Streams in External Galaxies with the Nancy Grace Roman Space Telescope
Authors:
Christian Aganze,
Sarah Pearson,
Tjitske Starkenburg,
Gabriella Contardo,
Kathryn V. Johnston,
Kiyan Tavangar,
Adrian M. Price-Whelan,
Adam J. Burgasser
Abstract:
Stellar streams form through the tidal disruption of satellite galaxies or globular clusters orbiting a host galaxy. Globular cluster streams are exciting since they are thin (dynamically cold) and therefore sensitive to perturbations from low-mass subhalos, which can open gaps in them. Since the subhalo mass function differs depending on the dark matter composition, these gaps can provide unique constraints on dark matter models. However, current samples are limited to the Milky Way. With its large field of view, deep imaging sensitivity, and high angular resolution, the upcoming Nancy Grace Roman Space Telescope (Roman) presents a unique opportunity to increase the number of observed streams and gaps significantly. This paper presents a first exploration of the prospects for detecting gaps in streams in M31 and other nearby galaxies with resolved stars. We simulate the formation of gaps in a Palomar-5-like stream and generate mock observations of these gaps with background stars in M31 and the foreground Milky Way stellar fields. We assess Roman's ability to detect gaps out to 10 Mpc through visual inspection and with the gap-finding tool ${\texttt{FindTheGap}}$. We conclude that gaps of $\approx 1.5$ kpc in streams that are created from subhalos of masses $\geq5 \times 10^6$M$_{\odot}$ are detectable within a 2-3 Mpc volume in exposures of 1000s to 1 hour. This volume contains $\approx 150$ galaxies, including $\approx 8$ galaxies with luminosities $>10^{9}~$L$_{\odot}$. Large samples of stream gaps in external galaxies will open up a new era of statistical analyses of gap characteristics in stellar streams and help constrain dark matter models.
Submitted 15 February, 2024; v1 submitted 19 May, 2023;
originally announced May 2023.
-
First Impressions: Early-Time Classification of Supernovae using Host Galaxy Information and Shallow Learning
Authors:
Alexander Gagliano,
Gabriella Contardo,
Daniel Foreman-Mackey,
Alex I. Malz,
Patrick D. Aleo
Abstract:
Substantial effort has been devoted to the characterization of transient phenomena from photometric information. Automated approaches to this problem have taken advantage of complete phase-coverage of an event, limiting their use for triggering rapid follow-up of ongoing phenomena. In this work, we introduce a neural network with a single recurrent layer designed explicitly for early photometric classification of supernovae. Our algorithm leverages transfer learning to account for model misspecification, host galaxy photometry to solve the data scarcity problem soon after discovery, and a custom weighted loss to prioritize accurate early classification. We first train our algorithm using state-of-the-art transient and host galaxy simulations, then adapt its weights and validate it on the spectroscopically-confirmed SNe Ia, SNe II, and SNe Ib/c from the Zwicky Transient Facility Bright Transient Survey. On observed data, our method achieves an overall accuracy of $82 \pm 2$% within 3 days of an event's discovery, and an accuracy of $87 \pm 5$% within 30 days of discovery. At both early and late phases, our method achieves comparable or superior results to the leading classification algorithms with a simpler network architecture. These results help pave the way for rapid photometric and spectroscopic follow-up of scientifically-valuable transients discovered in massive synoptic surveys.
Submitted 3 July, 2023; v1 submitted 15 May, 2023;
originally announced May 2023.
-
Emulating radiation transport on cosmological scale using a denoising Unet
Authors:
Mosima P. Masipa,
Sultan Hassan,
Mario G. Santos,
Gabriella Contardo,
Kyunghyun Cho
Abstract:
Semi-numerical simulations are the leading candidates for evolving reionization on cosmological scales. These semi-numerical models are efficient in generating large-scale maps of the 21cm signal, but they are too slow to enable inference at the field level. We present different strategies to train a U-Net to accelerate these simulations. We derive the ionization field directly from the initial density field without using the ionizing sources' location, thereby emulating the radiative transfer process. We find that the U-Net achieves higher accuracy in reconstructing the ionization field if the input includes either white noise or a noisy version of the ionization map alongside the density field during training. Our model reconstructs the power spectrum over all scales perfectly well. This work represents a step towards generating large-scale ionization maps with a minimal cost and hence enabling rapid parameter inference at the field level.
Submitted 21 March, 2023;
originally announced March 2023.
-
The Eighteenth Data Release of the Sloan Digital Sky Surveys: Targeting and First Spectra from SDSS-V
Authors:
Andrés Almeida,
Scott F. Anderson,
Maria Argudo-Fernández,
Carles Badenes,
Kat Barger,
Jorge K. Barrera-Ballesteros,
Chad F. Bender,
Erika Benitez,
Felipe Besser,
Dmitry Bizyaev,
Michael R. Blanton,
John Bochanski,
Jo Bovy,
William Nielsen Brandt,
Joel R. Brownstein,
Johannes Buchner,
Esra Bulbul,
Joseph N. Burchett,
Mariana Cano Díaz,
Joleen K. Carlberg,
Andrew R. Casey,
Vedant Chandra,
Brian Cherinka,
Cristina Chiappini,
Abigail A. Coker
, et al. (129 additional authors not shown)
Abstract:
The eighteenth data release of the Sloan Digital Sky Surveys (SDSS) is the first one for SDSS-V, the fifth generation of the survey. SDSS-V comprises three primary scientific programs, or "Mappers": Milky Way Mapper (MWM), Black Hole Mapper (BHM), and Local Volume Mapper (LVM). This data release contains extensive targeting information for the two multi-object spectroscopy programs (MWM and BHM), including input catalogs and selection functions for their numerous scientific objectives. We describe the production of the targeting databases and their calibration- and scientifically-focused components. DR18 also includes ~25,000 new SDSS spectra and supplemental information for X-ray sources identified by eROSITA in its eFEDS field. We present updates to some of the SDSS software pipelines and preview changes anticipated for DR19. We also describe three value-added catalogs (VACs) based on SDSS-IV data that have been published since DR17, and one VAC based on the SDSS-V data in the eFEDS field.
Submitted 6 July, 2023; v1 submitted 18 January, 2023;
originally announced January 2023.
-
Further Evidence of Modified Spin-down in Sun-like Stars: Pileups in the Temperature-Period Distribution
Authors:
Trevor J. David,
Ruth Angus,
Jason L. Curtis,
Jennifer L. van Saders,
Isabel L. Colman,
Gabriella Contardo,
Yuxi Lu,
Joel C. Zinn
Abstract:
We combine stellar surface rotation periods determined from NASA's Kepler mission with spectroscopic temperatures to demonstrate the existence of pileups at the long-period and short-period edges of the temperature-period distribution for main-sequence stars with temperatures exceeding $\sim 5500$K. The long-period pileup is well-described by a curve of constant Rossby number, with a critical value of $\mathrm{Ro_{crit}} \lesssim 2$. The long-period pileup was predicted by van Saders et al. (2019) as a consequence of weakened magnetic braking, in which wind-driven angular momentum losses cease once stars reach a critical Rossby number. Stars in the long-period pileup are found to have a wide range of ages ($\sim 2-6$Gyr), meaning that, along the pileup, rotation period is strongly predictive of a star's surface temperature but weakly predictive of its age. The short-period pileup, which is also well-described by a curve of constant Rossby number, is not a prediction of the weakened magnetic braking hypothesis but may instead be related to a phase of slowed surface spin-down due to core-envelope coupling. The same mechanism was proposed by Curtis et al. (2020) to explain the overlapping rotation sequences of low-mass members of differently aged open clusters. The relative dearth of stars with intermediate rotation periods between the short- and long-period pileups is also well-described by a curve of constant Rossby number, which aligns with the period gap initially discovered by McQuillan et al. (2013a) in M-type stars. These observations provide further support for the hypothesis that the period gap is due to stellar astrophysics, rather than a non-uniform star-formation history in the Kepler field.
Submitted 10 May, 2022; v1 submitted 16 March, 2022;
originally announced March 2022.
-
The emptiness inside: Finding gaps, valleys, and lacunae with geometric data analysis
Authors:
Gabriella Contardo,
David W. Hogg,
Jason A. S. Hunt,
Joshua E. G. Peek,
Yen-Chi Chen
Abstract:
Discoveries of gaps in data have been important in astrophysics. For example, there are kinematic gaps opened by resonances in dynamical systems, or exoplanets of a certain radius that are empirically rare. A gap in a data set is a kind of anomaly, but in an unusual sense: Instead of being a single outlier data point, situated far from other data points, it is a region of the space, or a set of points, that is anomalous compared to its surroundings. Gaps are both interesting and hard to find and characterize, especially when they have non-trivial shapes. We present in this paper a statistic that can be used to estimate the (local) "gappiness" of a point in the data space. It uses the gradient and Hessian of the density estimate (and thus requires a twice-differentiable density estimator). This statistic can be computed at (almost) any point in the space and does not rely on optimization; it can highlight under-dense regions of any dimensionality and shape in a general and efficient way. We illustrate our method on the velocity distribution of nearby stars in the Milky Way disk plane, which exhibits gaps that could originate from different processes. Identifying and characterizing those gaps could help determine their origins. In an Appendix, we provide implementation notes and additional considerations for finding under-densities in data, using critical points and the properties of the Hessian of the density.
Submitted 5 September, 2022; v1 submitted 25 January, 2022;
originally announced January 2022.
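The abstract above says the gap statistic is built from the gradient and Hessian of a twice-differentiable density estimate, but does not give its exact form. As a rough illustration of the idea only, the sketch below uses a one-dimensional Gaussian kernel density estimate, where the Hessian reduces to the second derivative: a point with near-zero slope and positive curvature sits in a local under-density. The function name, sample values, and bandwidth are assumptions made for this example, and this is not the paper's statistic.

```python
import math

def kde_derivatives(x, samples, h):
    """Density, slope, and curvature of a 1D Gaussian KDE at x (illustrative only)."""
    n = len(samples)
    p = dp = d2p = 0.0
    for xi in samples:
        u = (x - xi) / h
        phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        p += phi                      # kernel value
        dp += -u * phi                # first derivative of the kernel in u
        d2p += (u * u - 1.0) * phi    # second derivative of the kernel in u
    # Each derivative in x brings one extra factor of 1/h.
    return p / (n * h), dp / (n * h**2), d2p / (n * h**3)

# Hypothetical toy data: two clumps separated by an empty region around x = 0.
samples = [-2.2, -2.1, -2.0, -1.9, -1.8, 1.8, 1.9, 2.0, 2.1, 2.2]
h = 0.5  # bandwidth, chosen by hand for this example

for x in (-2.0, 0.0, 2.0):
    p, dp, d2p = kde_derivatives(x, samples, h)
    label = "gap-like" if d2p > 0 else "peak-like"
    print(f"x={x:+.1f} density={p:.4f} slope={dp:+.4f} curvature={d2p:+.4f} {label}")
```

In more than one dimension the sign of a single second derivative no longer suffices, which is presumably why the paper works with the full Hessian; that generalisation is not attempted here.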
-
The CAMELS Multifield Dataset: Learning the Universe's Fundamental Parameters with Artificial Intelligence
Authors:
Francisco Villaescusa-Navarro,
Shy Genel,
Daniel Angles-Alcazar,
Leander Thiele,
Romeel Dave,
Desika Narayanan,
Andrina Nicola,
Yin Li,
Pablo Villanueva-Domingo,
Benjamin Wandelt,
David N. Spergel,
Rachel S. Somerville,
Jose Manuel Zorrilla Matilla,
Faizan G. Mohammad,
Sultan Hassan,
Helen Shao,
Digvijay Wadekar,
Michael Eickenberg,
Kaze W. K. Wong,
Gabriella Contardo,
Yongseok Jo,
Emily Moser,
Erwin T. Lau,
Luis Fernando Machado Poletti Valle,
Lucia A. Perez
, et al. (3 additional authors not shown)
Abstract:
We present the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) Multifield Dataset, CMD, a collection of hundreds of thousands of 2D maps and 3D grids containing many different properties of cosmic gas, dark matter, and stars from 2,000 distinct simulated universes at several cosmic times. The 2D maps and 3D grids represent cosmic regions that span $\sim$100 million light years and have been generated from thousands of state-of-the-art hydrodynamic and gravity-only N-body simulations from the CAMELS project. Designed to train machine learning models, CMD is the largest dataset of its kind containing more than 70 Terabytes of data. In this paper we describe CMD in detail and outline a few of its applications. We focus our attention on one such task, parameter inference, formulating the problems we face as a challenge to the community. We release all data and provide further technical details at https://camels-multifield-dataset.readthedocs.io.
Submitted 22 September, 2021;
originally announced September 2021.
-
Finding universal relations in subhalo properties with artificial intelligence
Authors:
Helen Shao,
Francisco Villaescusa-Navarro,
Shy Genel,
David N. Spergel,
Daniel Angles-Alcazar,
Lars Hernquist,
Romeel Dave,
Desika Narayanan,
Gabriella Contardo,
Mark Vogelsberger
Abstract:
We use a generic formalism designed to search for relations in high-dimensional spaces to determine if the total mass of a subhalo can be predicted from other internal properties such as velocity dispersion, radius, or star-formation rate. We train neural networks using data from the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project and show that the model can predict the total mass of a subhalo with high accuracy: more than 99% of the subhalos have a predicted mass within 0.2 dex of their true value. The networks exhibit surprising extrapolation properties, being able to accurately predict the total mass of any type of subhalo containing any kind of galaxy at any redshift from simulations with different cosmologies, astrophysics models, subgrid physics, volumes, and resolutions, indicating that the network may have found a universal relation. We then use different methods to find equations that approximate the relation found by the networks and derive new analytic expressions that predict the total mass of a subhalo from its radius, velocity dispersion, and maximum circular velocity. We show that in some regimes, the analytic expressions are more accurate than the neural networks. We interpret the relation found by the neural network and approximated by the analytic equation as being connected to the virial theorem.
Submitted 9 September, 2021;
originally announced September 2021.
-
A Deep Learning Approach for Active Anomaly Detection of Extragalactic Transients
Authors:
V. Ashley Villar,
Miles Cranmer,
Edo Berger,
Gabriella Contardo,
Shirley Ho,
Griffin Hosseinzadeh,
Joshua Yao-Yu Lin
Abstract:
There is a shortage of multi-wavelength and spectroscopic followup capabilities given the number of transient and variable astrophysical events discovered through wide-field, optical surveys such as the upcoming Vera C. Rubin Observatory. From the haystack of potential science targets, astronomers must allocate scarce resources to study a selection of needles in real time. Here we present a variational recurrent autoencoder neural network to encode simulated Rubin Observatory extragalactic transient events using 1% of the PLAsTiCC dataset to train the autoencoder. Our unsupervised method uniquely works with unlabeled, real time, multivariate and aperiodic data. We rank 1,129,184 events based on an anomaly score estimated using an isolation forest. We find that our pipeline successfully ranks rarer classes of transients as more anomalous. Using simple cuts in anomaly score and uncertainty, we identify a pure (~95% pure) sample of rare transients (i.e., transients other than Type Ia, Type II and Type Ibc supernovae) including superluminous and pair-instability supernovae. Finally, our algorithm is able to identify these transients as anomalous well before peak, enabling real-time follow up studies in the era of the Rubin Observatory.
Submitted 22 March, 2021;
originally announced March 2021.
-
The Influence of Age on the Relative Frequency of Super-Earths and Sub-Neptunes
Authors:
Angeli Sandoval,
Gabriella Contardo,
Trevor J. David
Abstract:
There is growing evidence that the population of close-in planets discovered by the Kepler mission was sculpted by atmospheric loss, though the typical timescale for this evolution is not well-constrained. Among a highly complete sample of planet hosts of varying ages, the age-dependence of the relative fraction of super-Earth and sub-Neptune detections can be used to constrain the rate at which some small planets lose their atmospheres. Using the California-Kepler Survey (CKS) sample, we find evidence that the ratio of super-Earth to sub-Neptune detections rises monotonically from 1 to 10 Gyr. Our results are in good agreement with an independent study focused on stars hotter than the Sun, as well as with forward modeling simulations incorporating the effects of photoevaporation and a CKS-like selection function. We find the observed trend persists even after accounting for the effects of completeness or correlations between age and other fundamental parameters.
Submitted 19 February, 2021; v1 submitted 16 December, 2020;
originally announced December 2020.
-
Evolution of the Exoplanet Size Distribution: Forming Large Super-Earths Over Billions of Years
Authors:
Trevor J. David,
Gabriella Contardo,
Angeli Sandoval,
Ruth Angus,
Yuxi Lu,
Megan Bedell,
Jason L. Curtis,
Daniel Foreman-Mackey,
Benjamin J. Fulton,
Samuel K. Grunblatt,
Erik A. Petigura
Abstract:
The radius valley, a bifurcation in the size distribution of small, close-in exoplanets, is hypothesized to be a signature of planetary atmospheric loss. Such an evolutionary phenomenon should depend on the age of the star-planet system. In this work, we study the temporal evolution of the radius valley using two independent determinations of host star ages among the California-Kepler Survey (CKS) sample. We find evidence for a wide and nearly empty void of planets in the period-radius diagram at the youngest system ages ($\lesssim$2-3 Gyr) represented in the CKS sample. We show that the orbital period dependence of the radius valley among the younger CKS planets is consistent with that found among those planets with asteroseismically determined host star radii. Relative to previous studies of preferentially older planets, the radius valley determined among the younger planetary sample is shifted to smaller radii. This result is compatible with an atmospheric loss timescale on the order of gigayears for progenitors of the largest observed super-Earths. In support of this interpretation, we show that the planet sizes which appear to be unrepresented at ages $\lesssim$2-3 Gyr are likely to correspond to planets with rocky compositions. Our results suggest the size distribution of close-in exoplanets, and the precise location of the radius valley, evolves over gigayears.
Submitted 23 March, 2021; v1 submitted 19 November, 2020;
originally announced November 2020.
-
Anomaly Detection for Multivariate Time Series of Exotic Supernovae
Authors:
V. Ashley Villar,
Miles Cranmer,
Gabriella Contardo,
Shirley Ho,
Joshua Yao-Yu Lin
Abstract:
Supernovae mark the explosive deaths of stars and enrich the cosmos with heavy elements. Future telescopes will discover thousands of new supernovae nightly, creating a need to flag astrophysically interesting events rapidly for followup study. Ideally, such an anomaly detection pipeline would be independent of our current knowledge and be sensitive to unexpected phenomena. Here we present an unsupervised method to search for anomalous time series in real time for transient, multivariate, and aperiodic signals. We use an RNN-based variational autoencoder to encode supernova time series and an isolation forest to search for anomalous events in the learned encoded space. We apply this method to a simulated dataset of 12,159 supernovae, successfully discovering anomalous supernovae and objects with catastrophically incorrect redshift measurements. This work is the first anomaly detection pipeline for supernovae which works with online datastreams.
Submitted 21 October, 2020;
originally announced October 2020.
-
The CAMELS project: Cosmology and Astrophysics with MachinE Learning Simulations
Authors:
Francisco Villaescusa-Navarro,
Daniel Anglés-Alcázar,
Shy Genel,
David N. Spergel,
Rachel S. Somerville,
Romeel Dave,
Annalisa Pillepich,
Lars Hernquist,
Dylan Nelson,
Paul Torrey,
Desika Narayanan,
Yin Li,
Oliver Philcox,
Valentina La Torre,
Ana Maria Delgado,
Shirley Ho,
Sultan Hassan,
Blakesley Burkhart,
Digvijay Wadekar,
Nicholas Battaglia,
Gabriella Contardo,
Greg L. Bryan
Abstract:
We present the Cosmology and Astrophysics with MachinE Learning Simulations --CAMELS-- project. CAMELS is a suite of 4,233 cosmological simulations of $(25~h^{-1}{\rm Mpc})^3$ volume each: 2,184 state-of-the-art (magneto-)hydrodynamic simulations run with the AREPO and GIZMO codes, employing the same baryonic subgrid physics as the IllustrisTNG and SIMBA simulations, and 2,049 N-body simulations. The goal of the CAMELS project is to provide theory predictions for different observables as a function of cosmology and astrophysics, and it is the largest suite of cosmological (magneto-)hydrodynamic simulations designed to train machine learning algorithms. CAMELS contains thousands of different cosmological and astrophysical models by way of varying $Ω_m$, $σ_8$, and four parameters controlling stellar and AGN feedback, following the evolution of more than 100 billion particles and fluid elements over a combined volume of $(400~h^{-1}{\rm Mpc})^3$. We describe the simulations in detail and characterize the large range of conditions represented in terms of the matter power spectrum, cosmic star formation rate density, galaxy stellar mass function, halo baryon fractions, and several galaxy scaling relations. We show that the IllustrisTNG and SIMBA suites produce roughly similar distributions of galaxy properties over the full parameter space but significantly different halo baryon fractions and baryonic effects on the matter power spectrum. This emphasizes the need for marginalizing over baryonic effects to extract the maximum amount of information from cosmological surveys. We illustrate the unique potential of CAMELS using several machine learning applications, including non-linear interpolation, parameter estimation, symbolic regression, data generation with Generative Adversarial Networks (GANs), dimensionality reduction, and anomaly detection.
Submitted 15 August, 2021; v1 submitted 1 October, 2020;
originally announced October 2020.
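The simulation counts and volumes quoted in the CAMELS abstract above can be cross-checked with a few lines of arithmetic. The variable names below are illustrative; all numbers come from the abstract.

```python
# Simulation counts quoted in the abstract above.
n_hydro = 2_184   # (magneto-)hydrodynamic runs
n_nbody = 2_049   # N-body runs
n_total = n_hydro + n_nbody

# Each box is (25 h^-1 Mpc)^3; sum the volumes and convert back to a side length.
box_side = 25.0                            # h^-1 Mpc
combined_volume = n_total * box_side**3    # (h^-1 Mpc)^3
equivalent_side = combined_volume ** (1.0 / 3.0)

print(n_total, round(equivalent_side, 1))
```

The total matches the quoted 4,233 simulations, and the equivalent cube side comes out at roughly 404 $h^{-1}$ Mpc, consistent with the rounded combined volume of $(400~h^{-1}{\rm Mpc})^3$.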
-
Meta-Learning for One-Class Classification with Few Examples using Order-Equivariant Network
Authors:
Ademola Oladosu,
Tony Xu,
Philip Ekfeldt,
Brian A. Kelly,
Miles Cranmer,
Shirley Ho,
Adrian M. Price-Whelan,
Gabriella Contardo
Abstract:
This paper presents a meta-learning framework for few-shot One-Class Classification (OCC) at test-time, a setting where labeled examples are only available for the positive class, and no supervision is given for the negative class. We consider that we have a set of `one-class classification' objective-tasks with only a small set of positive examples available for each task, and a set of training tasks with full supervision (i.e. highly imbalanced classification). We propose an approach using order-equivariant networks to learn a 'meta' binary-classifier. The model will take as input an example to classify from a given task, as well as the corresponding supervised set of positive examples for this OCC task. Thus, the output of the model will be 'conditioned' on the available positive examples of a given task, allowing it to predict on new tasks and new examples without labeled negative examples. In this paper, we are motivated by an astronomy application. Our goal is to identify if stars belong to a specific stellar group (the 'one-class' for a given task), called \textit{stellar streams}, where each stellar stream is a different OCC-task. We show that our method transfers well on unseen (test) synthetic streams, and outperforms the baselines even though it is not retrained and accesses a much smaller part of the data per task to predict (only positive supervision). We see however that it doesn't transfer as well on the real stream GD-1. This could come from intrinsic differences between the synthetic and real streams, highlighting the need for consistency in the 'nature' of the task for this method. However, light fine-tuning improves performance and outperforms our baselines. Our experiments show encouraging results to further explore meta-learning methods for OCC tasks.
Submitted 21 May, 2021; v1 submitted 8 July, 2020;
originally announced July 2020.
-
Dalek -- a deep-learning emulator for TARDIS
Authors:
Wolfgang E. Kerzendorf,
Christian Vogl,
Johannes Buchner,
Gabriella Contardo,
Marc Williamson,
Patrick van der Smagt
Abstract:
Supernova spectral time series contain a wealth of information about the progenitor and explosion process of these energetic events. The modeling of these data requires the exploration of very high dimensional posterior probabilities with expensive radiative transfer codes. Even modest parametrizations of supernovae contain more than ten parameters and a detailed exploration demands at least several million function evaluations. Physically realistic models require at least tens of CPU minutes per evaluation, putting a detailed reconstruction of the explosion out of reach of traditional methodology. The advent of widely available libraries for the training of neural networks combined with their ability to approximate almost arbitrary functions with high precision allows for a new approach to this problem. Instead of evaluating the radiative transfer model itself, one can build a neural network proxy trained on the simulations but evaluating orders of magnitude faster. Such a framework is called an emulator or surrogate model. In this work, we present an emulator for the TARDIS supernova radiative transfer code applied to Type Ia supernova spectra. We show that we can train an emulator for this problem given a modest training set of a hundred thousand spectra (easily calculable on modern supercomputers). The results show an accuracy at the percent level (with errors dominated by the Monte Carlo nature of TARDIS and not the emulator) with a speedup of several orders of magnitude. This method has a much broader set of applications and is not limited to the presented problem.
Submitted 3 July, 2020;
originally announced July 2020.
-
Gravitational wave population inference with deep flow-based generative network
Authors:
Kaze W. K. Wong,
Gabriella Contardo,
Shirley Ho
Abstract:
We combine hierarchical Bayesian modeling with a flow-based deep generative network, in order to demonstrate that one can efficiently constrain numerical gravitational wave (GW) population models at a previously intractable complexity. Existing techniques for comparing data to simulation, such as discrete model selection and Gaussian process regression, can only be applied efficiently to moderate-dimension data. This limits the number of observables (e.g. chirp mass, spins) and hyper-parameters (e.g. common envelope efficiency) one can use in a population inference. In this study, we train a network to emulate a phenomenological model with 6 observables and 4 hyper-parameters, use it to infer the properties of a simulated catalogue and compare the results to the phenomenological model. We find that a 10-layer network can emulate the phenomenological model accurately and efficiently. Our machine enables simulation-based GW population inferences to take on data at a new complexity level.
Submitted 4 July, 2020; v1 submitted 21 February, 2020;
originally announced February 2020.
-
From Dark Matter to Galaxies with Convolutional Neural Networks
Authors:
Jacky H. T. Yip,
Xinyue Zhang,
Yanfang Wang,
Wei Zhang,
Yueqiu Sun,
Gabriella Contardo,
Francisco Villaescusa-Navarro,
Siyu He,
Shy Genel,
Shirley Ho
Abstract:
Cosmological simulations play an important role in the interpretation of astronomical data, in particular in comparing observed data to our theoretical expectations. However, to compare data with these simulations, the simulations in principle need to include gravity, magneto-hydrodynamics, radiative transfer, etc. These ideal large-volume simulations (gravo-magneto-hydrodynamical) are incredibly computationally expensive and can cost tens of millions of CPU hours to run. In this paper, we propose a deep learning approach to map from the dark-matter-only simulation (computationally cheaper) to the galaxy distribution (from the much costlier cosmological simulation). The main challenge of this task is the high sparsity in the target galaxy distribution: space is mainly empty. We propose a cascade architecture composed of a classification filter followed by a regression procedure. We show that our result outperforms a state-of-the-art model used in the astronomical community, and provides a good trade-off between computational cost and prediction accuracy.
Submitted 17 October, 2019;
originally announced October 2019.
-
The Quijote simulations
Authors:
Francisco Villaescusa-Navarro,
ChangHoon Hahn,
Elena Massara,
Arka Banerjee,
Ana Maria Delgado,
Doogesh Kodi Ramanah,
Tom Charnock,
Elena Giusarma,
Yin Li,
Erwan Allys,
Antoine Brochard,
Cora Uhlemann,
Chi-Ting Chiang,
Siyu He,
Alice Pisani,
Andrej Obuljen,
Yu Feng,
Emanuele Castorina,
Gabriella Contardo,
Christina D. Kreisch,
Andrina Nicola,
Justin Alsing,
Roman Scoccimarro,
Licia Verde,
Matteo Viel
, et al. (4 additional authors not shown)
Abstract:
The Quijote simulations are a set of 44,100 full N-body simulations spanning more than 7,000 cosmological models in the $\{Ω_{\rm m}, Ω_{\rm b}, h, n_s, σ_8, M_ν, w \}$ hyperplane. At a single redshift the simulations contain more than 8.5 trillion particles over a combined volume of 44,100 $(h^{-1}{\rm Gpc})^3$; each simulation follows the evolution of $256^3$, $512^3$ or $1024^3$ particles in a box of $1~h^{-1}{\rm Gpc}$ length. Billions of dark matter halos and cosmic voids have been identified in the simulations, whose runs required more than 35 million core hours. The Quijote simulations have been designed for two main purposes: 1) to quantify the information content on cosmological observables, and 2) to provide enough data to train machine learning algorithms. In this paper we describe the simulations and show a few of their applications. We also release the Petabyte of data generated, comprising hundreds of thousands of simulation snapshots at multiple redshifts, halo and void catalogs, together with millions of summary statistics such as power spectra, bispectra, correlation functions, marked power spectra, and estimated probability density functions.
Submitted 15 August, 2021; v1 submitted 11 September, 2019;
originally announced September 2019.
-
From Dark Matter to Galaxies with Convolutional Networks
Authors:
Xinyue Zhang,
Yanfang Wang,
Wei Zhang,
Yueqiu Sun,
Siyu He,
Gabriella Contardo,
Francisco Villaescusa-Navarro,
Shirley Ho
Abstract:
Cosmological surveys aim at answering fundamental questions about our Universe, including the nature of dark matter or the reason for the unexpected accelerated expansion of the Universe. In order to answer these questions, two important ingredients are needed: 1) data from observations and 2) a theoretical model that allows fast comparison between observation and theory. Most of the cosmological surveys observe galaxies, which are very difficult to model theoretically due to the complicated physics involved in their formation and evolution; modeling realistic galaxies over cosmological volumes requires running computationally expensive hydrodynamic simulations that can cost millions of CPU hours. In this paper, we propose to use deep learning to establish a mapping between the 3D galaxy distribution in hydrodynamic simulations and its underlying dark matter distribution. One of the major challenges in this pursuit is the very high sparsity in the predicted galaxy distribution. To this end, we develop a two-phase convolutional neural network architecture to generate fast galaxy catalogues, and compare our results against a standard cosmological technique. We find that our proposed approach either outperforms or is competitive with traditional cosmological techniques. Compared to the common methods used in cosmology, our approach also provides a nice trade-off between time-consumption (comparable to fastest benchmark in the literature) and the quality and accuracy of the predicted simulation. In combination with current and upcoming data from cosmological observations, our method has the potential to answer fundamental questions about our Universe with the highest accuracy.
Submitted 31 March, 2019; v1 submitted 15 February, 2019;
originally announced February 2019.
-
Constraints on the Progenitor Systems of Type Ia Supernovae
Authors:
Maximilian Stritzinger,
Bruno Leibundgut,
Stefanie Walch,
Gertrud Contardo
Abstract:
UVOIR bolometric light curves provide valuable insight into the nature of type Ia supernovae. We present an analysis of sixteen well-observed SNe Ia. Constraints are placed on several global parameters concerning the progenitor system, explosion mechanism and subsequent radiation transport. By fitting a radioactive decay energy deposition function to the quasi-exponential phase (50 to 100 days after maximum light), it is found that the ejected mass varies by at least a factor of two. This result suggests that a sub-Chandrasekhar mass model could be responsible for the progenitor system of some type Ia supernovae. We find that the range in the amount of synthesized (56)Ni indicates a significant variation in the burning mechanism. In order to explain a factor of ten range in the observed bolometric luminosity, more detailed modeling of the explosion mechanism is required.
Submitted 5 December, 2005; v1 submitted 17 June, 2005;
originally announced June 2005.
-
Epochs of Maximum Light and Bolometric Light Curves of Type Ia Supernovae
Authors:
G. Contardo,
B. Leibundgut,
W. D. Vacca
Abstract:
We present empirical fits to the UBVRI light curves of type Ia supernovae. These fits are used to objectively evaluate light curve parameters. We find that the relative times of maximum light in the filter passbands are very similar for most objects. Surprisingly, the maximum at longer wavelengths is reached earlier than in the B and V light curves. This clearly demonstrates the complicated nature of the supernova emission. Bolometric light curves for a small sample of well-observed SNe Ia are constructed by integration over the optical filters. In most objects a plateau or inflection is observed in the light curve about 20-40 days after bolometric maximum. The strength of this plateau varies considerably among the individual objects in the sample. Furthermore, the rise times show a range of several days for the few objects which have observations early enough for such an analysis. On the other hand, the decline rate between 50 and 80 days past maximum is remarkably similar for all objects, with the notable exception of SN 1991bg. The similar late decline rates for the supernovae indicate that the energy release at late times is very uniform; the differences at early times are likely due to the radiation diffusing out of the ejecta. With the exception of SN 1991bg, the range of absolute bolometric luminosities of SNe Ia is found to be at least a factor of 2.5. The nickel masses derived from this estimate range from 0.4 to 1.1 Msun. It seems impossible to explain such a mass range by a single explosion mechanism, especially since the rate of gamma-ray escape at late phases seems to be very uniform.
Submitted 25 May, 2000;
originally announced May 2000.
-
The High-Redshift Supernova Search -- Evidence for a Positive Cosmological Constant
Authors:
Bruno Leibundgut,
Gertrud Contardo,
Patrick Woudt,
Jason Spyromilio
Abstract:
A new component of the Universe which leads to an accelerated cosmic expansion is found from the measurements of distances to high-redshift type Ia supernovae. We describe the method and the results obtained from the observations of distant supernovae. The dependence on the understanding of the local type Ia supernovae is stressed. The lack of a good understanding of the stellar evolution leading to the explosion of the white dwarf, the exact explosion physics and the current difficulties in calculating the emission from the ejecta limit the theoretical support. Despite the current ignorance of some of the basic physics of the explosions, the cosmological result is robust. The empirical relations seem to hold for the distant supernovae the same way as for the local ones and the spectral appearance is identical. The distances to the high-redshift supernovae are larger than expected in a freely coasting, i.e. empty, Universe. A positive cosmological constant is inferred from these measurements.
Submitted 2 December, 1998;
originally announced December 1998.
-
Photometric Evolution of Galaxies in Cosmological Scenarios
Authors:
Gertrud Contardo,
Matthias Steinmetz,
Uta Fritze-von Alvensleben
Abstract:
The photometric evolution of galaxies in a hierarchically clustering universe is investigated. The study is based on high resolution numerical simulations which include the effects of gas dynamics, shock heating, radiative cooling and a heuristic star formation scheme. The outcome of the simulations is convolved with photometric models, which enables us to predict the appearance of galaxies in the broad band colors U, B, V, R, I and K. We demonstrate the effect of the mutual interplay of the hierarchical build-up of galaxies, photometric evolution, k-correction, and intervening absorption on the appearance of forming disk galaxies at redshift one to three. We also discuss to what extent the numerical resolution of current computer simulations is sufficient to make quantitative predictions on surface density profiles and color gradients.
Submitted 28 January, 1998;
originally announced January 1998.