Search | arXiv e-print repository

Probabilistic Forward Modeling of Galaxy Catalogs with Normalizing Flows

Authors: John Franklin Crenshaw, J. Bryce Kalmbach, Alexander Gagliano, Ziang Yan, Andrew J. Connolly, Alex I. Malz, Samuel J. Schmidt, The LSST Dark Energy Science Collaboration

Abstract: Evaluating the accuracy and calibration of the redshift posteriors produced by photometric redshift (photo-z) estimators is vital for enabling precision cosmology and extragalactic astrophysics with modern wide-field photometric surveys. Evaluating photo-z posteriors on a per-galaxy basis is difficult, however, as real galaxies have a true redshift but not a true redshift posterior. We introduce PZFlow, a Python package for the probabilistic forward modeling of galaxy catalogs with normalizing flows. For catalogs simulated with PZFlow, there is a natural notion of "true" redshift posteriors that can be used for photo-z validation. We use PZFlow to simulate a photometric galaxy catalog where each galaxy has a redshift, noisy photometry, shape information, and a true redshift posterior. We also demonstrate the use of an ensemble of normalizing flows for photo-z estimation. We discuss how PZFlow will be used to validate the photo-z estimation pipeline of the Dark Energy Science Collaboration (DESC), and the wider applicability of PZFlow for statistical modeling of any tabular data.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 19 pages, 13 figures, submitted to AJ

arXiv:2405.04522

Astrometric Redshifts of Supernovae

Authors: Jaemyoung Jason Lee, Masao Sako, Richard Kessler, Alex I. Malz, The LSST Dark Energy Science Collaboration

Abstract: Differential Chromatic Refraction (DCR) is caused by the wavelength dependence of our atmosphere's refractive index, which shifts the apparent positions of stars and galaxies and distorts their shapes depending on their spectral energy distributions (SEDs). While this effect is typically mitigated and corrected for in imaging observations, we investigate how DCR can instead be used to our advantage to infer the redshifts of supernovae from multi-band, time-series imaging data. We simulate Type Ia supernovae (SNe Ia) in the proposed Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) Deep Drilling Field (DDF), and evaluate astrometric redshifts. We find that the redshift accuracy improves dramatically with the statistical quality of the astrometric measurements as well as with the accuracy of the astrometric solution. For a conservative choice of a 5-mas systematic uncertainty floor, we find that our redshift estimation is accurate at $z < 0.6$. We then combine our astrometric redshifts with both host galaxy photometric redshifts and supernovae photometric (light-curve) redshifts and show that this considerably improves the overall redshift estimates. These astrometric redshifts will be valuable especially since Rubin will discover a vast number of supernovae for which we will not be able to obtain spectroscopic redshifts.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 28 pages, 24 figures, submitted to The Astrophysical Journal

arXiv:2403.07975

Superphot+: Realtime Fitting and Classification of Supernova Light Curves

Authors: Kaylee M. de Soto, Ashley Villar, Edo Berger, Sebastian Gomez, Griffin Hosseinzadeh, Doug Branton, Sandro Campos, Melissa DeLucchi, Jeremy Kubica, Olivia Lynn, Konstantin Malanchev, Alex I. Malz

Abstract: Photometric classifications of supernova (SN) light curves have become necessary to utilize the full potential of large samples of observations obtained from wide-field photometric surveys, such as the Zwicky Transient Facility (ZTF) and the Vera C. Rubin Observatory. Here, we present a photometric classifier for SN light curves that does not rely on redshift information and still maintains comparable accuracy to redshift-dependent classifiers. Our new package, Superphot+, uses a parametric model to extract meaningful features from multiband SN light curves. We train a gradient-boosted machine with fit parameters from 6,061 ZTF SNe that pass data quality cuts and are spectroscopically classified as one of five classes: SN Ia, SN II, SN Ib/c, SN IIn, and SLSN-I. Without redshift information, our classifier yields a class-averaged F1-score of 0.61 +/- 0.02 and a total accuracy of 0.83 +/- 0.01. Including redshift information improves these metrics to 0.71 +/- 0.02 and 0.88 +/- 0.01, respectively. We assign new class probabilities to 3,558 ZTF transients that show SN-like characteristics (based on the ALeRCE Broker light curve and stamp classifiers), but lack spectroscopic classifications. Finally, we compare our predicted SN labels with those generated by the ALeRCE light curve classifier, finding that the two classifiers agree on photometric labels for 82 +/- 2% of light curves with spectroscopic labels and 72% of light curves without spectroscopic labels. Superphot+ is currently classifying ZTF SNe in real time via the ANTARES Broker, and is designed for simple adaptation to six-band Rubin light curves in the future.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 37 pages, 25 figures. Submitted to AAS Journals

arXiv:2402.15551

doi 10.3847/2041-8213/ad4039

Improving Photometric Redshift Estimates with Training Sample Augmentation

Authors: Irene Moskowitz, Eric Gawiser, John Franklin Crenshaw, Brett H. Andrews, Alex I. Malz, Samuel Schmidt, The LSST Dark Energy Science Collaboration

Abstract: Large imaging surveys will rely on photometric redshifts (photo-z's), which are typically estimated through machine learning methods. Currently planned spectroscopic surveys will not be deep enough to produce a representative training sample for LSST, so we seek methods to improve the photo-z estimates that arise from non-representative training samples. Spectroscopic training samples for photo-z's are biased towards redder, brighter galaxies, which also tend to be at lower redshift than the typical galaxy observed by LSST, leading to poor photo-z estimates with outlier fractions nearly 4 times larger than for a representative training sample. In this paper, we apply the concept of training sample augmentation, where we augment simulated non-representative training samples with simulated galaxies possessing otherwise unrepresented features. When we select simulated galaxies with (g-z) color, i-band magnitude and redshift outside the range of the original training sample, we are able to reduce the outlier fraction of the photo-z estimates for simulated LSST data by nearly 50% and the normalized median absolute deviation (NMAD) by 56%. When compared to a fully representative training sample, augmentation can recover nearly 70% of the degradation in the outlier fraction and 80% of the degradation in NMAD. Training sample augmentation is a simple and effective way to improve training samples for photo-z's without requiring additional spectroscopic samples.

Submitted 14 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: 11 pages, 4 figures, published in ApJ Letters

Journal ref: ApJL 967 L6 (2024)
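
Note on the metrics named in the abstract above: the outlier fraction and the normalized median absolute deviation (NMAD) can be sketched as follows. This is a minimal illustration, not the paper's code. The abstract does not give the exact definitions, so the scaled residual (z_phot - z_true) / (1 + z_true), the 1.4826 scaling factor, and the 0.15 outlier cut are assumed conventions, and the function name `photoz_metrics` is hypothetical.

```python
import numpy as np

def photoz_metrics(z_phot, z_true, outlier_cut=0.15):
    """Return (NMAD, outlier fraction) for photo-z point estimates.

    Assumed conventions (not taken from the paper):
      dz      = (z_phot - z_true) / (1 + z_true)
      NMAD    = 1.4826 * median(|dz - median(dz)|)
      outlier = |dz| > outlier_cut
    """
    z_phot = np.asarray(z_phot, dtype=float)
    z_true = np.asarray(z_true, dtype=float)
    dz = (z_phot - z_true) / (1.0 + z_true)
    nmad = 1.4826 * np.median(np.abs(dz - np.median(dz)))
    outlier_fraction = np.mean(np.abs(dz) > outlier_cut)
    return nmad, outlier_fraction

# Toy sample: four well-estimated galaxies and one catastrophic outlier.
z_true = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
z_phot = np.array([1.0, 1.02, 0.98, 1.04, 2.0])
nmad, f_out = photoz_metrics(z_phot, z_true)
print(f"NMAD = {nmad:.6f}, outlier fraction = {f_out:.2f}")
```

Because NMAD is built from medians, the single catastrophic outlier in the toy sample barely moves it, which is why the two metrics are usually quoted together.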

arXiv:2305.14421

Are classification metrics good proxies for SN Ia cosmological constraining power?

Authors: Alex I. Malz, Mi Dai, Kara A. Ponder, Emille E. O. Ishida, Santiago Gonzalez-Gaitain, Rupesh Durgesh, Alberto Krone-Martins, Rafael S. de Souza, Noble Kennamer, Sreevarsha Sreejith, Lluis Galbany, The LSST Dark Energy Science Collaboration, The Cosmostatistics Initiative

Abstract: Context: When selecting a classifier to use for a supernova Ia (SN Ia) cosmological analysis, it is common to make decisions based on metrics of classification performance, i.e. contamination within the photometrically classified SN Ia sample, rather than a measure of cosmological constraining power. If the former is an appropriate proxy for the latter, this practice would save those designing an analysis pipeline from the computational expense of a full cosmology forecast. Aims: This study tests the assumption that classification metrics are an appropriate proxy for cosmology metrics. Methods: We emulate photometric SN Ia cosmology samples with controlled contamination rates of individual contaminant classes and evaluate each of them under a set of classification metrics. We then derive cosmological parameter constraints from all samples under two common analysis approaches and quantify the impact of contamination by each contaminant class on the resulting cosmological parameter estimates. Results: We observe that cosmology metrics are sensitive to both the contamination rate and the class of the contaminating population, whereas the classification metrics are insensitive to the latter. Conclusions: We therefore discourage exclusive reliance on classification-based metrics for cosmological analysis design decisions, e.g. classifier choice, and instead recommend optimizing using a metric of cosmological parameter constraining power.

Submitted 23 May, 2023; originally announced May 2023.

Comments: 9 pages, 6 figures; submitted to A&A

arXiv:2305.08894

First Impressions: Early-Time Classification of Supernovae using Host Galaxy Information and Shallow Learning

Authors: Alexander Gagliano, Gabriella Contardo, Daniel Foreman-Mackey, Alex I. Malz, Patrick D. Aleo

Abstract: Substantial effort has been devoted to the characterization of transient phenomena from photometric information. Automated approaches to this problem have taken advantage of complete phase-coverage of an event, limiting their use for triggering rapid follow-up of ongoing phenomena. In this work, we introduce a neural network with a single recurrent layer designed explicitly for early photometric classification of supernovae. Our algorithm leverages transfer learning to account for model misspecification, host galaxy photometry to solve the data scarcity problem soon after discovery, and a custom weighted loss to prioritize accurate early classification. We first train our algorithm using state-of-the-art transient and host galaxy simulations, then adapt its weights and validate it on the spectroscopically-confirmed SNe Ia, SNe II, and SNe Ib/c from the Zwicky Transient Facility Bright Transient Survey. On observed data, our method achieves an overall accuracy of $82 \pm 2$% within 3 days of an event's discovery, and an accuracy of $87 \pm 5$% within 30 days of discovery. At both early and late phases, our method achieves comparable or superior results to the leading classification algorithms with a simpler network architecture. These results help pave the way for rapid photometric and spectroscopic follow-up of scientifically-valuable transients discovered in massive synoptic surveys.

Submitted 3 July, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: 24 pages, 8 figures. Accepted to ApJ

arXiv:2208.02781

From Data to Software to Science with the Rubin Observatory LSST

Authors: Katelyn Breivik, Andrew J. Connolly, K. E. Saavik Ford, Mario Jurić, Rachel Mandelbaum, Adam A. Miller, Dara Norman, Knut Olsen, William O'Mullane, Adrian Price-Whelan, Timothy Sacco, J. L. Sokoloski, Ashley Villar, Viviana Acquaviva, Tomas Ahumada, Yusra AlSayyad, Catarina S. Alves, Igor Andreoni, Timo Anguita, Henry J. Best, Federica B. Bianco, Rosaria Bonito, Andrew Bradshaw, Colin J. Burke, Andresa Rodrigues de Campos, et al. (75 additional authors not shown)

Abstract: The Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) dataset will dramatically alter our understanding of the Universe, from the origins of the Solar System to the nature of dark matter and dark energy. Much of this research will depend on the existence of robust, tested, and scalable algorithms, software, and services. Identifying and developing such tools ahead of time has the potential to significantly accelerate the delivery of early science from LSST. Developing these collaboratively, and making them broadly available, can enable more inclusive and equitable collaboration on LSST science. To facilitate such opportunities, a community workshop entitled "From Data to Software to Science with the Rubin Observatory LSST" was organized by the LSST Interdisciplinary Network for Collaboration and Computing (LINCC) and partners, and held at the Flatiron Institute in New York, March 28-30th 2022. The workshop included over 50 in-person attendees invited from over 300 applications. It identified seven key software areas of need: (i) scalable cross-matching and distributed joining of catalogs, (ii) robust photometric redshift determination, (iii) software for determination of selection functions, (iv) frameworks for scalable time-series analyses, (v) services for image access and reprocessing at scale, (vi) object image access (cutouts) and analysis at scale, and (vii) scalable job execution systems. This white paper summarizes the discussions of this workshop. It considers the motivating science use cases, identified cross-cutting algorithms, software, and services, their high-level technical specifications, and the principles of inclusive collaborations needed to develop them. We provide it as a useful roadmap of needs, as well as to spur action and collaboration between groups and individuals looking to develop reusable software for early LSST science.

Submitted 4 August, 2022; originally announced August 2022.

Comments: White paper from "From Data to Software to Science with the Rubin Observatory LSST" workshop

arXiv:2206.02815

doi 10.1093/mnras/stad302

The Simulated Catalogue of Optical Transients and Correlated Hosts (SCOTCH)

Authors: Martine Lokken, Alexander Gagliano, Gautham Narayan, Renée Hložek, Richard Kessler, John Franklin Crenshaw, Laura Salo, Catarina S. Alves, Deep Chatterjee, Maria Vincenzi, Alex I. Malz, The LSST Dark Energy Science Collaboration

Abstract: As we observe a rapidly growing number of astrophysical transients, we learn more about the diverse host galaxy environments in which they occur. Host galaxy information can be used to purify samples of cosmological Type Ia supernovae, uncover the progenitor systems of individual classes, and facilitate low-latency follow-up of rare and peculiar explosions. In this work, we develop a novel data-driven methodology to simulate the time-domain sky that includes detailed modeling of the probability density function for multiple transient classes conditioned on host galaxy magnitudes, colours, star formation rates, and masses. We have designed these simulations to optimize photometric classification and analysis in upcoming large synoptic surveys. We integrate host galaxy information into the SNANA simulation framework to construct the Simulated Catalogue of Optical Transients and Correlated Hosts (SCOTCH), a publicly-available catalogue of 5 million idealized transient light curves in LSST passbands and their host galaxy properties over the redshift range $0<z<3$. This catalogue includes supernovae, tidal disruption events, kilonovae, and active galactic nuclei. Each light curve consists of true top-of-the-galaxy magnitudes sampled with high ($\lesssim$2 day) cadence. In conjunction with SCOTCH, we also release an associated set of tutorials and the transient-specific libraries to enable simulations of arbitrary space- and ground-based surveys. Our methodology is being used to test critical science infrastructure in advance of surveys by the Vera C. Rubin Observatory and the Nancy G. Roman Space Telescope.

Submitted 27 February, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: 30 pages, 21 figures. Updated to reflect version published in MNRAS Vol. 520, Issue 2. This version improves the treatment of transient-host offsets and AGN host correlations. Associated SCOTCH files are available through Zenodo at https://doi.org/10.5281/zenodo.7563623

arXiv:2202.12775

doi 10.1088/1538-3873/ac59bf

The sensitivity of GPz estimates of photo-z posterior PDFs to realistically complex training set imperfections

Authors: Natalia Stylianou, Alex I. Malz, Peter Hatfield, John Franklin Crenshaw, Julia Gschwend

Abstract: The accurate estimation of photometric redshifts is crucial to many upcoming galaxy surveys, for example the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). Almost all Rubin extragalactic and cosmological science requires accurate and precise calculation of photometric redshifts; many diverse approaches to this problem are currently in the process of being developed, validated, and tested. In this work, we use the photometric redshift code GPz to examine two realistically complex training set imperfections scenarios for machine learning based photometric redshift calculation: i) where the spectroscopic training set has a very different distribution in colour-magnitude space to the test set, and ii) where the effect of emission line confusion causes a fraction of the training spectroscopic sample to not have the true redshift. By evaluating the sensitivity of GPz to a range of increasingly severe imperfections, with a range of metrics (both of photo-z point estimates as well as posterior probability distribution functions, PDFs), we quantify the degree to which predictions get worse with higher degrees of degradation. In particular we find that there is a substantial drop-off in photo-z quality when line-confusion goes above ~1%, and sample incompleteness below a redshift of 1.5, for an experimental setup using data from the Buzzard Flock synthetic sky catalogues.

Submitted 25 February, 2022; originally announced February 2022.

Comments: 12 pages, 8 figures, accepted in PASP

arXiv:2110.15209

Re-calibrating Photometric Redshift Probability Distributions Using Feature-space Regression

Authors: Biprateep Dey, Jeffrey A. Newman, Brett H. Andrews, Rafael Izbicki, Ann B. Lee, David Zhao, Markus Michael Rau, Alex I. Malz

Abstract: Many astrophysical analyses depend on estimates of redshifts (a proxy for distance) determined from photometric (i.e., imaging) data alone. Inaccurate estimates of photometric redshift uncertainties can result in large systematic errors. However, probability distribution outputs from many photometric redshift methods do not follow the frequentist definition of a Probability Density Function (PDF) for redshift -- i.e., the fraction of times the true redshift falls between two limits $z_{1}$ and $z_{2}$ should be equal to the integral of the PDF between these limits. Previous works have used the global distribution of Probability Integral Transform (PIT) values to re-calibrate PDFs, but offsetting inaccuracies in different regions of feature space can conspire to limit the efficacy of the method. We leverage a recently developed regression technique that characterizes the local PIT distribution at any location in feature space to perform a local re-calibration of photometric redshift PDFs. Though we focus on an example from astrophysics, our method can produce PDFs which are calibrated at all locations in feature space for any use case.

Submitted 27 January, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

Comments: Fourth Workshop on Machine Learning and the Physical Sciences (NeurIPS 2021)
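
Note on the Probability Integral Transform (PIT) mentioned in the abstract above: the PIT of a galaxy is the CDF of its estimated redshift PDF evaluated at the true redshift, and for calibrated PDFs the PIT values are uniform on [0, 1]. The sketch below shows only the global PIT check on toy Gaussian PDFs. It is not the paper's local re-calibration method, and the helper names, the simulated sample, and the width model 0.05 * (1 + mu) are assumptions made for the example.

```python
import math
import numpy as np

def gaussian_pit(z_true, mu, sigma):
    """PIT value per galaxy: Gaussian CDF of the estimated PDF at the true redshift."""
    t = (np.asarray(z_true) - np.asarray(mu)) / (np.asarray(sigma) * math.sqrt(2.0))
    return np.array([0.5 * (1.0 + math.erf(v)) for v in t])

def ks_distance_from_uniform(pit):
    """Largest gap between the empirical CDF of the PIT values and Uniform(0, 1)."""
    x = np.sort(pit)
    n = x.size
    upper = np.arange(1, n + 1) / n - x
    lower = x - np.arange(0, n) / n
    return max(upper.max(), lower.max())

rng = np.random.default_rng(0)
n = 5000
mu = rng.uniform(0.2, 1.5, size=n)      # toy PDF centres
sigma_true = 0.05 * (1.0 + mu)          # toy PDF widths (assumed model)
z_true = rng.normal(mu, sigma_true)     # truths drawn from the stated PDFs

# Calibrated case: the reported width matches the width used to draw the truth.
pit_calibrated = gaussian_pit(z_true, mu, sigma_true)
# Overconfident case: the reported width is half the real scatter.
pit_overconfident = gaussian_pit(z_true, mu, 0.5 * sigma_true)

ks_cal = ks_distance_from_uniform(pit_calibrated)
ks_over = ks_distance_from_uniform(pit_overconfident)
print(f"KS distance, calibrated: {ks_cal:.3f}; overconfident: {ks_over:.3f}")
```

The overconfident PDFs pile PIT values up near 0 and 1, so their distance from uniformity is much larger. A global check like this cannot reveal miscalibrations that cancel between regions of feature space, which is the limitation the abstract describes.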

arXiv:2108.13418

doi 10.21105/astro.2108.13418

The LSST-DESC 3x2pt Tomography Optimization Challenge

Authors: Joe Zuntz, François Lanusse, Alex I. Malz, Angus H. Wright, Anže Slosar, Bela Abolfathi, David Alonso, Abby Bault, Clécio R. Bom, Massimo Brescia, Adam Broussard, Jean-Eric Campagne, Stefano Cavuoti, Eduardo S. Cypriano, Bernardo M. O. Fraga, Eric Gawiser, Elizabeth J. Gonzalez, Dylan Green, Peter Hatfield, Kartheik Iyer, David Kirkby, Andrina Nicola, Erfan Nourbakhsh, Andy Park, Gabriel Teixeira, et al. (3 additional authors not shown)

Abstract: This paper presents the results of the Rubin Observatory Dark Energy Science Collaboration (DESC) 3x2pt tomography challenge, which served as a first step toward optimizing the tomographic binning strategy for the main DESC analysis. The task of choosing an optimal tomographic binning scheme for a photometric survey is made particularly delicate in the context of a metacalibrated lensing catalogue, as only the photometry from the bands included in the metacalibration process (usually riz and potentially g) can be used in sample definition. The goal of the challenge was to collect and compare bin assignment strategies under various metrics of a standard 3x2pt cosmology analysis in a highly idealized setting to establish a baseline for realistically complex follow-up studies; in this preliminary study, we used two sets of cosmological simulations of galaxy redshifts and photometry under a simple noise model neglecting photometric outliers and variation in observing conditions, and contributed algorithms were provided with a representative and complete training set. We review and evaluate the entries to the challenge, finding that even from this limited photometry information, multiple algorithms can separate tomographic bins reasonably well, reaching figures-of-merit scores close to the attainable maximum. We further find that adding the g band to riz photometry improves metric performance by ~15% and that the optimal bin assignment strategy depends strongly on the science case: which figure-of-merit is to be optimized, and which observables (clustering, lensing, or both) are included.

Submitted 15 October, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

Comments: 30 pages (incl. 12 in appendix), 12 figures. Version accepted for publication in the Open Journal of Astrophysics

Report number: DESC-PUB-00054

arXiv:2107.10857

doi 10.1093/mnras/stab2764

CLMM: a LSST-DESC Cluster weak Lensing Mass Modeling library for cosmology

Authors: M. Aguena, C. Avestruz, C. Combet, S. Fu, R. Herbonnet, A. I. Malz, M. Penna-Lima, M. Ricci, S. D. P. Vitenti, L. Baumont, H. Fan, M. Fong, M. Ho, M. Kirby, C. Payerne, D. Boutigny, B. Lee, B. Liu, T. McClintock, H. Miyatake, C. Sifón, A. von der Linden, H. Wu, M. Yoon, The LSST Dark Energy Science Collaboration

Abstract: We present the v1.0 release of CLMM, an open source Python library for the estimation of the weak lensing masses of clusters of galaxies. CLMM is designed as a standalone toolkit of building blocks to enable end-to-end analysis pipeline validation for upcoming cluster cosmology analyses such as the ones that will be performed by the LSST-DESC. Its purpose is to serve as a flexible, easy-to-install and easy-to-use interface for both weak lensing simulators and observers and can be applied to real and mock data to study the systematics affecting weak lensing mass reconstruction. At the core of CLMM are routines to model the weak lensing shear signal given the underlying mass distribution of galaxy clusters and a set of data operations to prepare the corresponding data vectors. The theoretical predictions rely on existing software, used as backends in the code, that have been thoroughly tested and cross-checked. Combined, theoretical predictions and data can be used to constrain the mass distribution of galaxy clusters as demonstrated in a suite of example Jupyter Notebooks shipped with the software and also available in the extensive online documentation.

Submitted 5 October, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

Comments: 21 pages, 6 figures, accepted for publication by MNRAS

arXiv:2104.08229

An information-based metric for observing strategy optimization, demonstrated in the context of photometric redshifts with applications to cosmology

Authors: Alex I. Malz, François Lanusse, John Franklin Crenshaw, Melissa L. Graham

Abstract: The observing strategy of a galaxy survey influences the degree to which its resulting data can be used to accomplish any science goal. LSST is thus seeking metrics of observing strategies for multiple science cases in order to optimally choose a cadence. Photometric redshifts are essential for many extragalactic science applications of LSST's data, including but not limited to cosmology, but there are few metrics available, and they are not straightforwardly integrated with metrics of other cadence-dependent quantities that may influence any given use case. We propose a metric for observing strategy optimization based on the potentially recoverable mutual information about redshift from a photometric sample under the constraints of a realistic observing strategy. We demonstrate a tractable estimation of a variational lower bound of this mutual information implemented in a public code using conditional normalizing flows. By comparing the recoverable redshift information across observing strategies, we can distinguish between those that preclude robust redshift constraints and those whose data will preserve more redshift information, to be generically utilized in a downstream analysis. We recommend the use of this versatile metric to observing strategy optimization for redshift-dependent extragalactic use cases, including but not limited to cosmology, as well as any other science applications for which photometry may be modeled from true parameter values beyond redshift.

Submitted 16 April, 2021; originally announced April 2021.

Comments: 8 pages, 5 figures, to be submitted to MNRAS

arXiv:2101.04675

doi 10.1103/PhysRevD.103.083502

How not to obtain the redshift distribution from probabilistic redshift estimates: Under what conditions is it not inappropriate to estimate the redshift distribution N(z) by stacking photo-z PDFs?

Authors: Alex I. Malz

Abstract: The scientific impact of current and upcoming photometric galaxy surveys is contingent on our ability to obtain redshift estimates for large numbers of faint galaxies. In the absence of spectroscopically confirmed redshifts, broad-band photometric redshift point estimates (photo-$z$s) have been superseded by photo-$z$ probability density functions (PDFs) that encapsulate their nontrivial uncertainties. Initial applications of photo-$z$ PDFs in weak gravitational lensing studies of cosmology have obtained the redshift distribution function $\mathcal{N}(z)$ by employing computationally straightforward stacking methodologies that violate the laws of probability. In response, mathematically self-consistent models of varying complexity have been proposed in an effort to answer the question, "What is the right way to obtain the redshift distribution function $\mathcal{N}(z)$ from a catalog of photo-$z$ PDFs?" This letter aims to motivate adoption of such principled methods by addressing the contrapositive of the more common presentation of such models, answering the question, "Under what conditions do traditional stacking methods successfully recover the true redshift distribution function $\mathcal{N}(z)$?" By placing stacking in a rigorous mathematical environment, we identify two such conditions: those of perfectly informative data and perfectly informative prior information. Stacking has maintained its foothold in the astronomical community for so long because the conditions in question were only weakly violated in the past. These conditions, however, will be strongly violated by future galaxy surveys. We therefore conclude that stacking must be abandoned in favor of mathematically supported methods in order to advance observational cosmology.

Submitted 12 January, 2021; originally announced January 2021.

Comments: accepted to Phys Rev D

Journal ref: Phys. Rev. D 103, 083502 (2021)
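
The stacking estimator discussed in the abstract above can be illustrated with a toy calculation. The following is a minimal sketch, not the paper's analysis: it assumes a Gaussian true redshift distribution, Gaussian per-galaxy noise, and a flat prior, and every numerical value and name (`true_sigma`, `noise_sigma`, `grid_variance`) is an illustrative assumption. Under these assumptions each posterior is a Gaussian centred on the noisy observation, so averaging the posteriors yields the true distribution convolved with the noise kernel, which is broader than the truth unless the data are very informative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: all values below are illustrative assumptions.
n_gal = 2000
true_mean, true_sigma = 1.5, 0.2   # Gaussian "true" redshift distribution
noise_sigma = 0.3                  # per-galaxy redshift noise

z_true = rng.normal(true_mean, true_sigma, n_gal)
z_obs = z_true + rng.normal(0.0, noise_sigma, n_gal)

z_grid = np.linspace(0.0, 3.0, 601)
dz = z_grid[1] - z_grid[0]


def gaussian(z, mu, sigma):
    """Gaussian density evaluated on z."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# With a flat prior, each posterior is a Gaussian centred on the noisy observation.
posteriors = gaussian(z_grid[None, :], z_obs[:, None], noise_sigma)
posteriors /= posteriors.sum(axis=1, keepdims=True) * dz  # normalise on the grid

# Stacking: average the per-galaxy posteriors.
stacked = posteriors.mean(axis=0)


def grid_variance(pdf):
    """Variance of a density tabulated on z_grid."""
    mean = np.sum(z_grid * pdf) * dz
    return np.sum((z_grid - mean) ** 2 * pdf) * dz


true_var = grid_variance(gaussian(z_grid, true_mean, true_sigma))
stacked_var = grid_variance(stacked)
print(f"true variance: {true_var:.3f}, stacked variance: {stacked_var:.3f}")
```

In this toy, shrinking `noise_sigma` toward zero (very informative data) brings the stacked variance back toward the true variance, in line with the first condition named in the abstract.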

arXiv:2012.12392 [pdf, other]

Results of the Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC)

Authors: R. Hložek, K. A. Ponder, A. I. Malz, M. Dai, G. Narayan, E. E. O. Ishida, T. Allam Jr, A. Bahmanyar, R. Biswas, L. Galbany, S. W. Jha, D. O. Jones, R. Kessler, M. Lochner, A. A. Mahabal, K. S. Mandel, J. R. Martínez-Galarza, J. D. McEwen, D. Muthukrishna, H. V. Peiris, C. M. Peters, C. N. Setzer

Abstract: Next-generation surveys like the Legacy Survey of Space and Time (LSST) on the Vera C. Rubin Observatory will generate orders of magnitude more discoveries of transients and variable stars than previous surveys. To prepare for this data deluge, we developed the Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC), a competition which aimed to catalyze the development of robust classifiers under LSST-like conditions of a non-representative training set for a large photometric test set of imbalanced classes. Over 1,000 teams participated in PLAsTiCC, which was hosted on the Kaggle data science competition platform between Sep 28, 2018 and Dec 17, 2018, ultimately identifying three winners in February 2019. Participants produced classifiers employing a diverse set of machine learning techniques including hybrid combinations and ensemble averages of a range of approaches, among them boosted decision trees, neural networks, and multi-layer perceptrons. The strong performance of the top three classifiers on Type Ia supernovae and kilonovae represents a major improvement over the current state-of-the-art within astronomy. This paper summarizes the most promising methods and evaluates their results in detail, highlighting future directions both for classifier development and simulation needs for a next generation PLAsTiCC data set.

Submitted 22 December, 2020; originally announced December 2020.

Comments: 20 pages, 14 figures

arXiv:2010.05941 [pdf, other]

Active learning with RESSPECT: Resource allocation for extragalactic astronomical transients

Authors: Noble Kennamer, Emille E. O. Ishida, Santiago Gonzalez-Gaitan, Rafael S. de Souza, Alexander Ihler, Kara Ponder, Ricardo Vilalta, Anais Moller, David O. Jones, Mi Dai, Alberto Krone-Martins, Bruno Quint, Sreevarsha Sreejith, Alex I. Malz, Lluis Galbany

Abstract: The recent increase in volume and complexity of available astronomical data has led to a wide use of supervised machine learning techniques. Active learning strategies have been proposed as an alternative to optimize the distribution of scarce labeling resources. However, due to the specific conditions in which labels can be acquired, fundamental assumptions, such as sample representativeness and labeling cost stability cannot be fulfilled. The Recommendation System for Spectroscopic follow-up (RESSPECT) project aims to enable the construction of optimized training samples for the Rubin Observatory Legacy Survey of Space and Time (LSST), taking into account a realistic description of the astronomical data environment. In this work, we test the robustness of active learning techniques in a realistic simulated astronomical data scenario. Our experiment takes into account the evolution of training and pool samples, different costs per object, and two different sources of budget. Results show that traditional active learning strategies significantly outperform random sampling. Nevertheless, more complex batch strategies are not able to significantly overcome simple uncertainty sampling techniques.
Our findings illustrate three important points: 1) active learning strategies are a powerful tool to optimize the label-acquisition task in astronomy, 2) for upcoming large surveys like LSST, such techniques allow us to tailor the construction of the training sample for the first day of the survey, and 3) the peculiar data environment related to the detection of astronomical transients is a fertile ground that calls for the development of tailored machine learning algorithms.

Submitted 26 October, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

Comments: Accepted to the 2020 IEEE Symposium Series on Computational Intelligence

arXiv:2007.12178 [pdf, other]

How to obtain the redshift distribution from probabilistic redshift estimates

Authors: Alex I. Malz, David W. Hogg

Abstract: A trustworthy estimate of the redshift distribution $n(z)$ is crucial for using weak gravitational lensing and large-scale structure of galaxy catalogs to study cosmology. Spectroscopic redshifts for the dim and numerous galaxies of next-generation weak-lensing surveys are expected to be unavailable, making photometric redshift (photo-$z$) probability density functions (PDFs) the next-best alternative for comprehensively encapsulating the nontrivial systematics affecting photo-$z$ point estimation. The established stacked estimator of $n(z)$ avoids reducing photo-$z$ PDFs to point estimates but yields a systematically biased estimate of $n(z)$ that worsens with decreasing signal-to-noise, the very regime where photo-$z$ PDFs are most necessary. We introduce Cosmological Hierarchical Inference with Probabilistic Photometric Redshifts (CHIPPR), a statistically rigorous probabilistic graphical model of redshift-dependent photometry, which correctly propagates the redshift uncertainty information beyond the best-fit estimator of $n(z)$ produced by traditional procedures and is provably the only self-consistent way to recover $n(z)$ from photo-$z$ PDFs. We present the $\texttt{chippr}$ prototype code, noting that the mathematically justifiable approach incurs computational expense.
The CHIPPR approach is applicable to any one-point statistic of any random variable, provided the prior probability density used to produce the posteriors is explicitly known; if the prior is implicit, as may be the case for popular photo-$z$ techniques, then the resulting posterior PDFs cannot be used for scientific inference. We therefore recommend that the photo-$z$ community focus on developing methodologies that enable the recovery of photo-$z$ likelihoods with support over all redshifts, either directly or via a known prior probability density.

Submitted 23 July, 2020; originally announced July 2020.

Comments: submitted to ApJ

arXiv:2005.08583 [pdf, ps, other]

doi 10.1093/mnras/staa3204

Ridges in the Dark Energy Survey for cosmic trough identification

Authors: Ben Moews, Morgan A. Schmitz, Andrew J. Lawler, Joe Zuntz, Alex I. Malz, Rafael S. de Souza, Ricardo Vilalta, Alberto Krone-Martins, Emille E. O. Ishida

Abstract: Cosmic voids and their corresponding redshift-projected mass densities, known as troughs, play an important role in our attempt to model the large-scale structure of the Universe. Understanding these structures enables us to compare the standard model with alternative cosmologies, constrain the dark energy equation of state, and distinguish between different gravitational theories. In this paper, we extend the subspace-constrained mean shift algorithm, a recently introduced method to estimate density ridges, and apply it to 2D weak lensing mass density maps from the Dark Energy Survey Y1 data release to identify curvilinear filamentary structures. We compare the obtained ridges with previous approaches to extract trough structure in the same data, and apply curvelets as an alternative wavelet-based method to constrain densities. We then invoke the Wasserstein distance between noisy and noiseless simulations to validate the denoising capabilities of our method. Our results demonstrate the viability of ridge estimation as a precursor for denoising weak lensing observables to recover the large-scale structure, paving the way for a more versatile and effective search for troughs.

Submitted 14 November, 2022; v1 submitted 18 May, 2020; originally announced May 2020.

Comments: 12 pages, 5 figures, accepted for publication in MNRAS

MSC Class: 85A40; 62G07; 62P35; 85A35

arXiv:2001.03621 [pdf, other]

doi 10.1093/mnras/staa2799

Evaluation of probabilistic photometric redshift estimation approaches for The Rubin Observatory Legacy Survey of Space and Time (LSST)

Authors: S. J. Schmidt, A. I. Malz, J. Y. H. Soo, I. A. Almosallam, M. Brescia, S. Cavuoti, J. Cohen-Tanugi, A. J. Connolly, J. DeRose, P. E. Freeman, M. L. Graham, K. G. Iyer, M. J. Jarvis, J. B. Kalmbach, E. Kovacs, A. B. Lee, G. Longo, C. B. Morrison, J. A. Newman, E. Nourbakhsh, E. Nuss, T. Pospisil, H. Tranin, R. H. Wechsler, R. Zhou, et al. (2 additional authors not shown)

Abstract: Many scientific investigations of photometric galaxy surveys require redshift estimates, whose uncertainty properties are best encapsulated by photometric redshift (photo-z) posterior probability density functions (PDFs). A plethora of photo-z PDF estimation methodologies abound, producing discrepant results with no consensus on a preferred approach. We present the results of a comprehensive experiment comparing twelve photo-z algorithms applied to mock data produced for The Rubin Observatory Legacy Survey of Space and Time (LSST) Dark Energy Science Collaboration (DESC). By supplying perfect prior information, in the form of the complete template library and a representative training set as inputs to each code, we demonstrate the impact of the assumptions underlying each technique on the output photo-z PDFs. In the absence of a notion of true, unbiased photo-z PDFs, we evaluate and interpret multiple metrics of the ensemble properties of the derived photo-z PDFs as well as traditional reductions to photo-z point estimates. We report systematic biases and overall over/under-breadth of the photo-z PDFs of many popular codes, which may indicate avenues for improvement in the algorithms or implementations.
Furthermore, we raise attention to the limitations of established metrics for assessing photo-z PDF accuracy; though we identify the conditional density estimate (CDE) loss as a promising metric of photo-z PDF performance in the case where true redshifts are available but true photo-z PDFs are not, we emphasize the need for science-specific performance metrics.

Submitted 31 July, 2021; v1 submitted 10 January, 2020; originally announced January 2020.

Journal ref: MNRAS 499 2 1587 (2020)

arXiv:1908.11523 [pdf, other]

doi 10.1016/j.ascom.2019.100362

Conditional Density Estimation Tools in Python and R with Applications to Photometric Redshifts and Likelihood-Free Cosmological Inference

Authors: Niccolò Dalmasso, Taylor Pospisil, Ann B. Lee, Rafael Izbicki, Peter E. Freeman, Alex I. Malz

Abstract: It is well known in astronomy that propagating non-Gaussian prediction uncertainty in photometric redshift estimates is key to reducing bias in downstream cosmological analyses. Similarly, likelihood-free inference approaches, which are beginning to emerge as a tool for cosmological analysis, require a characterization of the full uncertainty landscape of the parameters of interest given observed data. However, most machine learning (ML) or training-based methods with open-source software target point prediction or classification, and hence fall short in quantifying uncertainty in complex regression and parameter inference settings. As an alternative to methods that focus on predicting the response (or parameters) $\mathbf{y}$ from features $\mathbf{x}$, we provide nonparametric conditional density estimation (CDE) tools for approximating and validating the entire probability density function (PDF) $\mathrm{p}(\mathbf{y}|\mathbf{x})$ of $\mathbf{y}$ given (i.e., conditional on) $\mathbf{x}$. As there is no one-size-fits-all CDE method, the goal of this work is to provide a comprehensive range of statistical tools and open-source software for nonparametric CDE and method assessment which can accommodate different types of settings and be easily fit to the problem at hand. Specifically, we introduce four CDE software packages in $\texttt{Python}$ and $\texttt{R}$ based on ML prediction methods adapted and optimized for CDE: $\texttt{NNKCDE}$, $\texttt{RFCDE}$, $\texttt{FlexCode}$, and $\texttt{DeepCDE}$.
Furthermore, we present the $\texttt{cdetools}$ package, which includes functions for computing a CDE loss function for tuning and assessing the quality of individual PDFs, along with diagnostic functions. We provide sample code in $\texttt{Python}$ and $\texttt{R}$ as well as examples of applications to photometric redshift estimation and likelihood-free cosmological inference via CDE.

Submitted 20 December, 2019; v1 submitted 29 August, 2019; originally announced August 2019.

Comments: 27 pages, 7 figures, 4 tables
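
The CDE loss mentioned in the abstract above can be estimated from tabulated densities and held-out responses. The sketch below is a plain NumPy illustration and does not reproduce the $\texttt{cdetools}$ interface; the function name `cde_loss`, its argument layout, and the toy data are assumptions made here. It uses the commonly quoted empirical form, the average of $\int \hat{p}(y|x_i)^2\,dy$ minus twice the average of $\hat{p}(y_i|x_i)$, which equals the integrated squared error up to a constant that does not depend on the estimate.

```python
import numpy as np


def cde_loss(cde, y_grid, y_true):
    """Empirical CDE loss for densities tabulated on a shared grid.

    cde    : array (n_obj, n_grid), estimated p_hat(y | x_i) on y_grid
    y_grid : array (n_grid,), increasing grid of response values
    y_true : array (n_obj,), observed responses
    (Hypothetical helper for illustration; not a library API.)
    """
    cde = np.asarray(cde, dtype=float)
    dy = np.diff(y_grid)
    sq = cde ** 2
    # Trapezoidal integral of p_hat^2 for each object.
    integral = np.sum(0.5 * (sq[:, :-1] + sq[:, 1:]) * dy, axis=1)
    # p_hat evaluated at the observed response, by linear interpolation.
    at_truth = np.array([np.interp(y, y_grid, row) for y, row in zip(y_true, cde)])
    return float(np.mean(integral) - 2.0 * np.mean(at_truth))


def gaussian(y, sigma):
    """Zero-mean Gaussian density."""
    return np.exp(-0.5 * (y / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Toy check: responses drawn from N(0, 1), independent of any features.
rng = np.random.default_rng(1)
y_true = rng.normal(0.0, 1.0, 1000)
y_grid = np.linspace(-12.0, 12.0, 601)

well_matched = np.tile(gaussian(y_grid, 1.0), (y_true.size, 1))
too_broad = np.tile(gaussian(y_grid, 3.0), (y_true.size, 1))

loss_matched = cde_loss(well_matched, y_grid, y_true)
loss_broad = cde_loss(too_broad, y_grid, y_true)
print(loss_matched, loss_broad)
```

Lower values indicate a better estimate, so in this toy the density matching the generating distribution should score below the over-broad one.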

arXiv:1812.09786 [pdf, ps, other]

doi 10.1103/PhysRevD.99.123529

Stress testing the dark energy equation of state imprint on supernova data

Authors: Ben Moews, Rafael S. de Souza, Emille E. O. Ishida, Alex I. Malz, Caroline Heneka, Ricardo Vilalta, Joe Zuntz

Abstract: This work determines the degree to which a standard Lambda-CDM analysis based on type Ia supernovae can identify deviations from a cosmological constant in the form of a redshift-dependent dark energy equation of state w(z). We introduce and apply a novel random curve generator to simulate instances of w(z) from constraint families with increasing distinction from a cosmological constant. After producing a series of mock catalogs of binned type Ia supernovae corresponding to each w(z) curve, we perform a standard Lambda-CDM analysis to estimate the corresponding posterior densities of the absolute magnitude of type Ia supernovae, the present-day matter density, and the equation of state parameter. Using the Kullback-Leibler divergence between posterior densities as a difference measure, we demonstrate that a standard type Ia supernova cosmology analysis has limited sensitivity to extensive redshift dependencies of the dark energy equation of state. In addition, we report that larger redshift-dependent departures from a cosmological constant do not necessarily manifest easier-detectable incompatibilities with the Lambda-CDM model. Our results suggest that physics beyond the standard model may simply be hidden in plain sight.

Submitted 5 July, 2019; v1 submitted 23 December, 2018; originally announced December 2018.

Comments: 14 pages, 9 figures

MSC Class: 85A40; 62P35; 68W20

Journal ref: Phys. Rev. D 99, 123529 (2019)
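
The Kullback-Leibler divergence used as a difference measure in the abstract above can be computed numerically for one-dimensional densities. The following is a minimal sketch under stated assumptions: the two densities are toy Gaussians chosen here (not posteriors from the paper), the helper name `kl_divergence` is hypothetical, and the closed-form Gaussian expression serves only as a cross-check on the grid integration.

```python
import numpy as np


def kl_divergence(p, q, grid):
    """D_KL(p || q) for densities tabulated on a shared grid (trapezoidal rule)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    integrand = np.zeros_like(p)
    mask = p > 0  # points where p vanishes contribute zero
    integrand[mask] = p[mask] * np.log(p[mask] / q[mask])
    return float(np.sum(0.5 * (integrand[:-1] + integrand[1:]) * np.diff(grid)))


def gaussian(x, mu, sigma):
    """Gaussian density evaluated on x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Two toy "posterior" densities (illustrative values only).
mu_p, sigma_p = 0.0, 1.0
mu_q, sigma_q = 1.0, 2.0
grid = np.linspace(-15.0, 15.0, 3001)
p = gaussian(grid, mu_p, sigma_p)
q = gaussian(grid, mu_q, sigma_q)

kl_numeric = kl_divergence(p, q, grid)

# Closed form for two univariate Gaussians, used as a cross-check.
kl_exact = (
    np.log(sigma_q / sigma_p)
    + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
    - 0.5
)
print(kl_numeric, kl_exact)
```

The divergence is not symmetric, so `kl_divergence(q, p, grid)` generally gives a different value; which ordering a given analysis adopts is a choice to state explicitly.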

arXiv:1810.05494 [pdf, other]

doi 10.1051/0004-6361/201834453

Gaia DR2 unravels incompleteness of nearby cluster population: New open clusters in the direction of Perseus

Authors: T. Cantat-Gaudin, A. Krone-Martins, N. Sedaghat, A. Farahi, R. S. de Souza, R. Skalidis, A. I. Malz, S. Macêdo, B. Moews, C. Jordi, A. Moitinho, A. Castro-Ginard, E. E. O. Ishida, C. Heneka, A. Boucaud, A. M. M. Trindade

Abstract: Open clusters (OCs) are popular tracers of the structure and evolutionary history of the Galactic disk. The OC population is often considered to be complete within 1.8 kpc of the Sun. The recent Gaia Data Release 2 (DR2) allows the latter claim to be challenged. We perform a systematic search for new OCs in the direction of Perseus using precise and accurate astrometry from Gaia DR2. We implement a coarse-to-fine search method. First, we exploit spatial proximity using a fast density-aware partitioning of the sky via a k-d tree in the spatial domain of Galactic coordinates, (l, b). Secondly, we employ a Gaussian mixture model in the proper motion space to quickly tag fields around OC candidates. Thirdly, we apply an unsupervised membership assignment method, UPMASK, to scrutinise the candidates. We visually inspect colour-magnitude diagrams to validate the detected objects. Finally, we perform a diagnostic to quantify the significance of each identified overdensity in proper motion and in parallax space. We report the discovery of 41 new stellar clusters. This represents an increment of at least 20% of the previously known OC population in this volume of the Milky Way. We also report on the clear identification of NGC 886, an object previously considered an asterism. This letter challenges the previous claim of a near-complete sample of open clusters up to 1.8 kpc. Our results reveal that this claim requires revision, and a complete census of nearby open clusters is yet to be found.

Submitted 21 March, 2019; v1 submitted 12 October, 2018; originally announced October 2018.

Comments: accepted for publication in A&A

Journal ref: A&A 624, A126 (2019)

arXiv:1810.00001 [pdf, other]

The Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC): Data set

Authors: The PLAsTiCC team, Tarek Allam Jr., Anita Bahmanyar, Rahul Biswas, Mi Dai, Lluís Galbany, Renée Hložek, Emille E. O. Ishida, Saurabh W. Jha, David O. Jones, Richard Kessler, Michelle Lochner, Ashish A. Mahabal, Alex I. Malz, Kaisey S. Mandel, Juan Rafael Martínez-Galarza, Jason D. McEwen, Daniel Muthukrishna, Gautham Narayan, Hiranya Peiris, Christina M. Peters, Kara Ponder, Christian N. Setzer, The LSST Dark Energy Science Collaboration, The LSST Transients, et al. (1 additional authors not shown)

Abstract: The Photometric LSST Astronomical Time Series Classification Challenge (PLAsTiCC) is an open data challenge to classify simulated astronomical time-series data in preparation for observations from the Large Synoptic Survey Telescope (LSST), which will achieve first light in 2019 and commence its 10-year main survey in 2022. LSST will revolutionize our understanding of the changing sky, discovering and measuring millions of time-varying objects. In this challenge, we pose the question: how well can we classify objects in the sky that vary in brightness from simulated LSST time-series data, with all its challenges of non-representativity? In this note we explain the need for a data challenge to help classify such astronomical sources and describe the PLAsTiCC data set and Kaggle data challenge, noting that while the references are provided for context, they are not needed to participate in the challenge.

Submitted 28 September, 2018; originally announced October 2018.

Comments: Research note to accompany the https://www.kaggle.com/c/PLAsTiCC-2018 challenge

arXiv:1809.11145 [pdf, other]

doi 10.3847/1538-3881/ab3a2f

The Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC): Selection of a performance metric for classification probabilities balancing diverse science goals

Authors: A. I. Malz, R. Hložek, T. Allam Jr, A. Bahmanyar, R. Biswas, M. Dai, L. Galbany, E. E. O. Ishida, S. W. Jha, D. O. Jones, R. Kessler, M. Lochner, A. A. Mahabal, K. S. Mandel, J. R. Martínez-Galarza, J. D. McEwen, D. Muthukrishna, G. Narayan, H. Peiris, C. M. Peters, K. A. Ponder, C. N. Setzer, The LSST Dark Energy Science Collaboration, The LSST Transients, Variable Stars Science Collaboration

Abstract: Classification of transient and variable light curves is an essential step in using astronomical observations to develop an understanding of their underlying physical processes. However, upcoming deep photometric surveys, including the Large Synoptic Survey Telescope (LSST), will produce a deluge of low signal-to-noise data for which traditional labeling procedures are inappropriate. Probabilistic classifications are more appropriate for the data but are incompatible with the traditional metrics used on deterministic classifications. Furthermore, large survey collaborations intend to use these classification probabilities for diverse science objectives, indicating a need for a metric that balances a variety of goals. We describe the process used to develop an optimal performance metric for an open classification challenge that seeks probabilistic classifications and must serve many scientific interests. The Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC) is an open competition aiming to identify promising techniques for obtaining classification probabilities of transient and variable objects by engaging a broader community both within and outside astronomy. Using mock classification probability submissions emulating archetypes of those anticipated of PLAsTiCC, we compare the sensitivity of metrics of classification probabilities under various weighting schemes, finding that they yield qualitatively consistent results.
We choose as a metric for PLAsTiCC a weighted modification of the cross-entropy because it can be meaningfully interpreted. Finally, we propose extensions of our methodology to ever more complex challenge goals and suggest some guiding principles for approaching the choice of a metric of probabilistic classifications.

Submitted 31 July, 2021; v1 submitted 28 September, 2018; originally announced September 2018.

Journal ref: AJ 158 5 171 (2019)
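
A weighted modification of the cross-entropy, as named in the abstract above, can be sketched as follows. The exact weighting adopted for PLAsTiCC is not given in this listing, so this example assumes one plausible form: the mean negative log-probability of the true class is computed per class, then combined with per-class weights and normalised by the total weight of the classes present. The function name `weighted_log_loss`, the clipping constant `eps`, and the toy inputs are all assumptions for illustration.

```python
import numpy as np


def weighted_log_loss(probs, labels, weights, eps=1e-15):
    """Class-weighted cross-entropy (assumed form, for illustration only).

    probs   : array (n_obj, n_class), predicted class probabilities
    labels  : array (n_obj,), integer true class of each object
    weights : array (n_class,), relative weight of each class
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    probs = probs / probs.sum(axis=1, keepdims=True)  # renormalise after clipping
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)

    per_class = np.zeros(weights.size)
    present = np.zeros(weights.size, dtype=bool)
    for m in range(weights.size):
        members = labels == m
        if members.any():
            # Average over class members so populous classes do not dominate.
            per_class[m] = -np.mean(np.log(probs[members, m]))
            present[m] = True
    return float(np.sum(weights[present] * per_class[present]) / np.sum(weights[present]))


# Toy inputs: four objects, three classes, the rare class up-weighted.
labels = np.array([0, 0, 1, 2])
weights = np.array([1.0, 1.0, 2.0])

uniform = np.full((4, 3), 1.0 / 3.0)   # uninformative classifier
perfect = np.eye(3)[labels]            # one-hot on the true class

loss_uniform = weighted_log_loss(uniform, labels, weights)
loss_perfect = weighted_log_loss(perfect, labels, weights)
print(loss_uniform, loss_perfect)
```

Averaging within each class before weighting is one way to keep an imbalanced test set from letting the most common class set the score; an uninformative classifier then scores the logarithm of the number of classes regardless of the weights.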

arXiv:1806.00014 [pdf, other]

doi 10.3847/1538-3881/aac6b5

Approximating photo-$z$ PDFs for large surveys

Authors: A. I. Malz, P. J. Marshall, S. J. Schmidt, M. L. Graham, J. DeRose, R. Wechsler

Abstract: Modern galaxy surveys produce redshift probability density functions (PDFs) in addition to traditional photometric redshift (photo-$z$) point estimates. However, the storage of photo-$z$ PDFs may present a challenge with increasingly large catalogs, as we face a trade-off between the accuracy of subsequent science measurements and the limitation of finite storage resources. This paper presents $\texttt{qp}$, a Python package for manipulating parametrizations of 1-dimensional PDFs, as suitable for photo-$z$ PDF compression. We use $\texttt{qp}$ to investigate the performance of three simple PDF storage formats (quantiles, samples, and step functions) as a function of the number of stored parameters on two realistic mock datasets, representative of upcoming surveys with different data qualities. We propose some best practices for choosing a photo-$z$ PDF approximation scheme and demonstrate the approach on a science case using performance metrics on both ensembles of individual photo-$z$ PDFs and an estimator of the overall redshift distribution function. We show that both the properties of the set of PDFs we wish to approximate and the chosen fidelity metric(s) affect the optimal parametrization. Additionally, we find that quantiles and samples outperform step functions, and we encourage further consideration of these formats for PDF approximation.

Submitted 31 July, 2021; v1 submitted 31 May, 2018; originally announced June 2018.

Journal ref: AJ 156 1 35 (2018)

arXiv:1510.07043 [pdf, other]

doi 10.3847/1538-4357/aa71af

Bayesian Redshift Classification of Emission-line Galaxies with Photometric Equivalent Widths

Authors: Andrew S. Leung, Viviana Acquaviva, Eric Gawiser, Robin Ciardullo, Eiichiro Komatsu, A. I. Malz, Gregory R. Zeimann, Joanna S. Bridge, Niv Drory, John J. Feldmeier, Steven L. Finkelstein, Karl Gebhardt, Caryl Gronwall, Alex Hagen, Gary J. Hill, Donald P. Schneider

Abstract: We present a Bayesian approach to the redshift classification of emission-line galaxies when only a single emission line is detected spectroscopically. We consider the case of surveys for high-redshift Lyman-alpha-emitting galaxies (LAEs), which have traditionally been classified via an inferred rest-frame equivalent width (EW) greater than 20 angstrom. Our Bayesian method relies on known prior probabilities in measured emission-line luminosity functions and equivalent width distributions for the galaxy populations, and returns the probability that an object in question is an LAE given the characteristics observed. This approach will be directly relevant for the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX), which seeks to classify ~10^6 emission-line galaxies into LAEs and low-redshift [O II] emitters. For a simulated HETDEX catalog with realistic measurement noise, our Bayesian method recovers 86% of LAEs missed by the traditional EW > 20 angstrom cutoff over 2 < z < 3, outperforming the EW cut in both contamination and incompleteness. This is due to the method's ability to trade off between the two types of binary classification error by adjusting the stringency of the probability requirement for classifying an observed object as an LAE. In our simulations of HETDEX, this method reduces the uncertainty in cosmological distance measurements by 14% with respect to the EW cut, equivalent to recovering 29% more cosmological information.
Rather than using binary object labels, this method enables the use of classification probabilities in large-scale structure analyses. It can be applied to narrowband emission-line surveys as well as upcoming large spectroscopic surveys including Euclid and WFIRST.

Submitted 21 April, 2016; v1 submitted 23 October, 2015; originally announced October 2015.

Comments: 16 pages, 7 figures, 5 tables, submitted to ApJ

arXiv:1411.6015 [pdf, ps, other]

doi 10.1088/0004-637X/799/2/205

Physical and Morphological Properties of [O II] Emitting Galaxies in the HETDEX Pilot Survey

Authors: Joanna S. Bridge, Caryl Gronwall, Robin Ciardullo, Alex Hagen, Greg Zeimann, A. I. Malz, Viviana Acquaviva, Donald P. Schneider, Niv Drory, Karl Gebhardt, Shardha Jogee

Abstract: The Hobby-Eberly Dark Energy Experiment pilot survey identified 284 [O II] 3727 emitting galaxies in a 169 square-arcminute field of sky in the redshift range 0 < z < 0.57. This line flux limited sample provides a bridge between studies in the local universe and higher-redshift [O II] surveys. We present an analysis of the star formation rates (SFRs) of these galaxies as a function of stellar mass as determined via spectral energy distribution fitting. The [O II] emitters fall on the "main sequence" of star-forming galaxies with SFR decreasing at lower masses and redshifts. However, the slope of our relation is flatter than that found for most other samples, a result of the metallicity dependence of the [O II] star formation rate indicator. The mass specific SFR is higher for lower mass objects, supporting the idea that massive galaxies formed more quickly and efficiently than their lower mass counterparts. This is confirmed by the fact that the equivalent widths of the [O II] emission lines trend smaller with larger stellar mass. Examination of the morphologies of the [O II] emitters reveals that their star formation is not a result of mergers, and the galaxies' half-light radii do not indicate evolution of physical sizes.

Submitted 21 November, 2014; originally announced November 2014.

Comments: 36 pages, 16 figures, 4 tables, accepted to ApJ

arXiv:1409.8304 [pdf, ps, other]

doi 10.1088/0004-637X/796/1/64

HST Emission Line Galaxies at z ~ 2: The Ly-alpha Escape Fraction

Authors: Robin Ciardullo, Gregory Zeimann, Caryl Gronwall, Henry Gebhardt, Donald P. Schneider, Alex Hagen, A. I. Malz, Guillermo A. Blanc, Gary J. Hill, Niv Drory, Eric Gawiser

Abstract: We compare the H-beta line strengths of 1.90 < z < 2.35 star-forming galaxies observed with the near-IR grism of the Hubble Space Telescope with ground-based measurements of Ly-alpha from the HETDEX Pilot Survey and narrow-band imaging. By examining the line ratios of 73 galaxies, we show that most star-forming systems at this epoch have a Ly-alpha escape fraction below ~6%. We confirm this result by using stellar reddening to estimate the effective logarithmic extinction of the H-beta emission line (c_Hbeta = 0.5) and measuring both the H-beta and Ly-alpha luminosity functions in a ~ 100,000 cubic Mpc volume of space. We show that in our redshift window, the volumetric Ly-alpha escape fraction is at most 4.4+/-2.1(1.2)%, with an additional systematic ~25% uncertainty associated with our estimate of extinction. Finally, we demonstrate that the bulk of the epoch's star-forming galaxies have Ly-alpha emission line optical depths that are significantly greater than that for the underlying UV continuum. In our predominantly [O III] 5007-selected sample of galaxies, resonant scattering must be important for the escape of Ly-alpha photons.

Submitted 29 September, 2014; originally announced September 2014.

Comments: 14 pages, 3 figures, Accepted to ApJ

arXiv:1403.4935 [pdf, other]

doi 10.1088/0004-637X/786/1/59

Spectral Energy Distribution Fitting of HETDEX Pilot Survey Lyman-alpha Emitters in COSMOS and GOODS-N

Authors: Alex Hagen, Robin Ciardullo, Caryl Gronwall, Viviana Acquaviva, Joanna Bridge, Gregory R. Zeimann, Guillermo A. Blanc, Nicholas A. Bond, Steven L. Finkelstein, Mimi Song, Eric Gawiser, Derek B. Fox, Henry Gebhardt, A. I. Malz, Donald P. Schneider, Niv Drory, Karl Gebhardt, Gary J. Hill

Abstract: We use broadband photometry extending from the rest-frame UV to the near-IR to fit the individual spectral energy distributions (SEDs) of 63 bright (L(Ly-alpha) > 10^43 ergs/s) Ly-alpha emitting galaxies (LAEs) in the redshift range 1.9 < z < 3.6. We find that these LAEs are quite heterogeneous, with stellar masses that span over three orders of magnitude, from 7.5 < log M < 10.5. Moreover, although most LAEs have small amounts of extinction, some high-mass objects have stellar reddenings as large as E(B-V) ~0.4. Interestingly, in dusty objects the optical depths for Ly-alpha and the UV continuum are always similar, indicating that Ly-alpha photons are not undergoing many scatters before escaping their galaxy. In contrast, the ratio of optical depths in low-reddening systems can vary widely, illustrating the diverse nature of the systems. Finally, we show that in the star formation rate (SFR)-log mass diagram, our LAEs fall above the "main-sequence" defined by z ~ 3 continuum selected star-forming galaxies. In this respect, they are similar to sub-mm-selected galaxies, although most LAEs have much lower mass.

Submitted 19 March, 2014; originally announced March 2014.

Comments: Accepted to the ApJ

Showing 1–29 of 29 results for author: Malz, A I