-
ELEPHANT: ExtragaLactic alErt Pipeline for Hostless AstroNomical Transients
Authors:
P. J. Pessi,
R. Durgesh,
L. Nakazono,
E. E. Hayes,
R. A. P. Oliveira,
E. E. O. Ishida,
A. Moitinho,
A. Krone-Martins,
B. Moews,
R. S. de Souza,
R. Beck,
M. A. Kuhn,
K. Nowak,
S. Vaughan
Abstract:
Context. Transient astronomical events that exhibit no discernible association with a host galaxy are commonly referred to as hostless. These rare phenomena are associated with extremely energetic events, and they can offer unique insights into the properties and evolution of stars and galaxies. However, the sheer number of transients captured by contemporary high-cadence astronomical surveys renders the manual identification of all potential hostless transients impractical. Therefore, creating a systematic identification tool is crucial for studying these elusive events. Aims. We present the ExtragaLactic alErt Pipeline for Hostless AstroNomical Transients (ELEPHANT), a framework for filtering hostless transients in astronomical data streams. Methods. We used Fink to access all the ZTF alerts produced between January 2022 and December 2023, selecting only those associated with extragalactic transients. We then processed the associated stamps using a sequence of image analysis techniques to retrieve hostless candidates. Results. We find that less than 2% of all analyzed transients are potentially hostless. Among them, approximately 10% have a spectroscopic class reported on TNS, with Type Ia supernovae being the most common class, followed by SLSNe. The hostless candidates retrieved by our pipeline include SN 2018ibb, which has been proposed as a PISN candidate, and SN 2022ann, one of only five known SNe Icn. When no class is reported on TNS, the dominant classes are QSO and SN candidates, the former obtained from SIMBAD and the latter inferred using the Fink ML classifier. Conclusions. ELEPHANT represents an effective strategy to filter hostless extragalactic events within large and complex astronomical alert streams. There are many applications for which this pipeline will be useful, ranging from transient selection for follow-up to studies of transient environments.
Submitted 28 April, 2024;
originally announced April 2024.
-
Transient Classifiers for Fink: Benchmarks for LSST
Authors:
B. M. O. Fraga,
C. R. Bom,
A. Santos,
E. Russeil,
M. Leoni,
J. Peloton,
E. E. O. Ishida,
A. Möller,
S. Blondin
Abstract:
The upcoming Legacy Survey of Space and Time (LSST) at the Vera Rubin Observatory is expected to detect a few million transients per night, which will generate a live alert stream during the entire 10 years of the survey. This will be distributed via community brokers whose task is to select subsets of the stream and direct them to scientific communities. Given the volume and complexity of data, machine learning (ML) algorithms will be paramount for this task. We present the infrastructure tests and classification methods developed within the Fink broker in preparation for LSST. This work aims to provide detailed information regarding the underlying assumptions and methods behind each classifier, enabling users to make informed follow-up decisions from Fink photometric classifications. Using simulated data from the Extended LSST Astronomical Time-series Classification Challenge (ELAsTiCC), we showcase the performance of binary and multi-class ML classifiers available in Fink. These include tree-based classifiers coupled with tailored feature extraction strategies, as well as deep learning algorithms. We introduce the CBPF Alert Transient Search (CATS), a deep learning architecture specifically designed for this task. Results show that Fink classifiers are able to handle the extra complexity expected from LSST data. CATS achieved 97% accuracy on multi-class classification, while our best-performing binary classifier achieved 99% when classifying the Periodic class. ELAsTiCC was an important milestone in preparing the Fink infrastructure to deal with LSST-like data. Our results demonstrate that Fink classifiers are well prepared for the arrival of the new stream; this experience also highlights that transitioning from current infrastructures to Rubin will require significant adaptation of currently available tools.
Submitted 12 April, 2024;
originally announced April 2024.
-
M-dwarf flares in the Zwicky Transient Facility data and what we can learn from them
Authors:
A. S. Voloshina,
A. D. Lavrukhina,
M. V. Pruzhinskaya,
K. L. Malanchev,
E. E. O. Ishida,
V. V. Krushinsky,
P. D. Aleo,
E. Gangler,
M. V. Kornilov,
V. S. Korolev,
E. Russeil,
T. A. Semenikhin,
S. Sreejith,
A. A. Volnova
Abstract:
In this paper, we explore the possibility of detecting M-dwarf flares using the Zwicky Transient Facility data releases (ZTF DRs). We employ two different approaches: the traditional method of parametric fit search and a machine learning algorithm originally developed for anomaly detection. We analyzed over 35 million ZTF light curves and visually scrutinized 1168 candidates suggested by the algorithms to filter out artifacts, occultations of a star by an asteroid, and known variable objects of other types. Our final sample comprises 134 flares with amplitudes ranging from 0.2 to 4.6 magnitudes, including repeated flares and complex flares with multiple components. Using Pan-STARRS DR2 colors, we also assigned a corresponding spectral subclass to each object in the sample. For 13 flares with well-sampled light curves, we estimated the bolometric energy. Our results show that ZTF's cadence strategy is suitable for identifying M-dwarf flares and other fast transients, allowing the extraction of significant astrophysical information from their light curves.
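The quoted amplitude range is easier to interpret as a flux ratio. A minimal sketch using only the standard magnitude definition (the function name is ours, not from the paper) shows that the faintest flares in the sample raise the flux by about 20% and the strongest by a factor of roughly 70:

```python
def flux_ratio(delta_mag):
    """Peak-to-quiescent flux ratio for a brightening of delta_mag magnitudes,
    from m1 - m2 = -2.5 * log10(F1 / F2)."""
    return 10.0 ** (0.4 * delta_mag)


for amplitude in (0.2, 4.6):
    print(f"{amplitude} mag -> flux x {flux_ratio(amplitude):.1f}")
```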
Submitted 11 April, 2024;
originally announced April 2024.
-
Multi-View Symbolic Regression
Authors:
Etienne Russeil,
Fabrício Olivetti de França,
Konstantin Malanchev,
Bogdan Burlacu,
Emille E. O. Ishida,
Marion Leroux,
Clément Michelin,
Guillaume Moinard,
Emmanuel Gangler
Abstract:
Symbolic regression (SR) searches for analytical expressions representing the relationship between a set of explanatory and response variables. Current SR methods assume a single dataset extracted from a single experiment. However, researchers are frequently confronted with multiple sets of results obtained from experiments conducted with different setups. Traditional SR methods may fail to find the underlying expression because the parameters of each experiment can differ. In this work we present Multi-View Symbolic Regression (MvSR), which takes multiple datasets into account simultaneously, mimicking experimental environments, and outputs a general parametric solution. This approach fits the evaluated expression to each independent dataset and returns a parametric family of functions f(x; θ) capable of accurately fitting all datasets simultaneously. We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry and economics, for which an a priori analytical expression is not available. Results show that MvSR obtains the correct expression more frequently and is robust to changes in hyperparameters. In real-world data, it is able to grasp the group behaviour, recovering known expressions from the literature as well as promising alternatives, thus enabling the use of SR in a wide range of experimental scenarios.
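The central idea, fitting one candidate expression separately to every dataset and judging it on all of them at once, can be sketched as follows. This is a minimal illustration, not the MvSR implementation: it assumes candidate expressions that are linear in their parameters (so each per-dataset fit is an ordinary least-squares solve), aggregates the per-dataset errors with the maximum (our choice), and omits the symbolic search that proposes the candidates.

```python
import numpy as np


def fit_view(design, y):
    """Least-squares parameters for one dataset (view) and the resulting MSE."""
    theta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return theta, float(np.mean((y - design @ theta) ** 2))


def multiview_score(basis, views):
    """Fit the same parametric form f(x; theta) to every view independently.

    basis(x) returns the design matrix of a candidate expression that is linear
    in theta. The worst per-view error is returned, so a good candidate must
    describe all datasets, each with its own theta.
    """
    results = [fit_view(basis(x), y) for x, y in views]
    thetas = [theta for theta, _ in results]
    return max(err for _, err in results), thetas


# Three "experiments" sharing the form a + b * exp(-x) with different parameters.
x = np.linspace(0.0, 3.0, 20)
views = [(x, a + b * np.exp(-x)) for a, b in [(1.0, 2.0), (0.5, -1.0), (3.0, 0.7)]]


def exp_form(x):
    return np.column_stack([np.ones_like(x), np.exp(-x)])


def line_form(x):
    return np.column_stack([np.ones_like(x), x])


score_exp, thetas_exp = multiview_score(exp_form, views)
score_line, _ = multiview_score(line_form, views)
print(score_exp, score_line)  # the shared exponential form fits every view far better
```

A candidate that fits one dataset well but another poorly is penalized, which is what lets a shared functional form emerge while each experiment keeps its own parameter values.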
Submitted 16 February, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
The 2022-2023 accretion outburst of the young star V1741 Sgr
Authors:
Michael A. Kuhn,
Lynne A. Hillenbrand,
Michael S. Connelley,
R. Michael Rich,
Bart Staels,
Adolfo S. Carvalho,
Philip W. Lucas,
Christoffer Fremling,
Viraj R. Karambelkar,
Ellen Lee,
Tomás Ahumada,
Emille E. O. Ishida,
Kishalay De,
Rafael S. de Souza,
Mansi Kasliwal
Abstract:
V1741 Sgr (= SPICY 71482/Gaia22dtk) is a Classical T Tauri star on the outskirts of the Lagoon Nebula. After at least a decade of stability, in mid-2022, the optical source brightened by ~3 mag over two months, remained bright until early 2023, then dimmed erratically over the next four months. This event was monitored with optical and infrared spectroscopy and photometry. Spectra from the peak (October 2022) indicate an EX Lup-type (EXor) accretion outburst, with strong emission from H I, He I, and Ca II lines and CO bands. At this stage, spectroscopic absorption features indicated a temperature of T ~ 4750 K with low-gravity lines (e.g., Ba II and Sr II). By April 2023, with the outburst beginning to dim, strong TiO absorption appeared, indicating a cooler T ~ 3600 K temperature. However, once the source had returned to its pre-outburst flux in August 2023, the TiO absorption and the CO emission disappeared. When the star went into outburst, the source's spectral energy distribution became flatter, leading to bluer colours at wavelengths shorter than ~1.6 microns and redder colours at longer wavelengths. The brightening requires a continuum emitting area larger than the stellar surface, likely from optically thick circumstellar gas with cooler surface layers producing the absorption features. Additional contributions to the outburst spectrum may include blue excess from hotspots on the stellar surface, emission lines from diffuse gas, and reprocessed emission from the dust disc. Cooling of the circumstellar gas would explain the appearance of TiO, which subsequently disappeared once this gas had faded and the stellar spectrum reemerged.
Submitted 17 January, 2024;
originally announced January 2024.
-
Bayesian multi-band fitting of alerts for kilonovae detection
Authors:
Biswajit Biswas,
Junpeng Lao,
Eric Aubourg,
Alexandre Boucaud,
Axel Guinot,
Emille E. O. Ishida,
Cécile Roucelle
Abstract:
In the era of multi-messenger astronomy, early classification of photometric alerts from wide-field and high-cadence surveys is a necessity to trigger spectroscopic follow-ups. These classifications are expected to play a key role in identifying potential candidates that might have a corresponding gravitational wave (GW) signature. Machine learning classifiers using features from parametric fitting of light curves are widely deployed by broker software to analyze millions of alerts, but most of these algorithms require at least as many points in each filter as there are fit parameters, which increases the chances of missing a short transient. Moreover, the classifiers are not able to account for the uncertainty in the fits when producing the final score. In this context, we present a novel classification strategy that incorporates data-driven priors for extracting a joint posterior distribution of fit parameters and hence obtaining a distribution of classification scores. We train and test a classifier to identify kilonova events, which originate from binary neutron star mergers or neutron star-black hole mergers, among simulations of Zwicky Transient Facility observations that also include 19 other, non-kilonova event types. We demonstrate that our method can estimate the uncertainty of misclassification, and that the mean of the distribution of classification scores, used as a point estimate, obtains an AUC score of 0.96 on simulated data. We further show that with this method we can process the entire alert stream in real time and reduce the sample of probable events to a scale where they can be analyzed by domain experts.
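The step from a posterior over fit parameters to a distribution of classification scores can be sketched as below. The posterior samples are drawn from a Gaussian purely as a stand-in for real posterior draws (the data-driven priors and the sampler are not shown), and `toy_classifier` is a hypothetical logistic scorer with arbitrary weights standing in for a trained model.

```python
import numpy as np


def toy_classifier(params):
    """Hypothetical classifier: logistic score of a linear combination of the
    fit parameters. The weights are arbitrary and only for illustration."""
    weights = np.array([1.5, -2.0])
    return 1.0 / (1.0 + np.exp(-(params @ weights)))


def score_distribution(posterior_samples, classifier):
    """Score every posterior sample; the spread reflects fit uncertainty."""
    scores = classifier(posterior_samples)
    return float(scores.mean()), float(scores.std()), scores


rng = np.random.default_rng(0)
center = np.array([1.0, 0.2])
# Stand-ins for a well-constrained fit (many points) and a poorly constrained one.
tight = rng.normal(center, [0.05, 0.03], size=(4000, 2))
loose = rng.normal(center, [0.5, 0.3], size=(4000, 2))

mean_t, std_t, _ = score_distribution(tight, toy_classifier)
mean_l, std_l, _ = score_distribution(loose, toy_classifier)
print(f"tight: {mean_t:.2f} +/- {std_t:.2f}; loose: {mean_l:.2f} +/- {std_l:.2f}")
```

The mean score plays the role of the point estimate, while the standard deviation flags alerts whose classification is poorly constrained by the available photometry.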
Submitted 8 November, 2023;
originally announced November 2023.
-
Rainbow: a colorful approach on multi-passband light curve estimation
Authors:
E. Russeil,
K. L. Malanchev,
P. D. Aleo,
E. E. O. Ishida,
M. V. Pruzhinskaya,
E. Gangler,
A. D. Lavrukhina,
A. A. Volnova,
A. Voloshina,
T. Semenikhin,
S. Sreejith,
M. V. Kornilov,
V. S. Korolev
Abstract:
We present Rainbow, a physically motivated framework which enables simultaneous multi-band light curve fitting. It allows the user to construct a 2-dimensional continuous surface across wavelength and time, even in situations where the number of observations in each filter is significantly limited. We assume the electromagnetic radiation emitted by the transient can be approximated by a black body and combine this with an expected temperature evolution and a parametric function describing its bolometric light curve. These three ingredients allow the information available in one passband to guide the reconstruction in the others, thus enabling proper use of multi-survey data. We demonstrate the effectiveness of our method by applying it to simulated data from the Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC) as well as real data from the Young Supernova Experiment (YSE DR1). We evaluate the quality of the estimated light curves according to three different tests: goodness of fit, time-of-peak prediction and ability to transfer information to machine learning (ML) based classifiers. Results confirm that Rainbow leads to equivalent (SN II) or up to 75% better (SN Ibc) goodness of fit when compared to the Monochromatic approach. Similarly, accuracy when using Rainbow best-fit values as a parameter space in multi-class ML classification improves for all classes in our sample. An efficient implementation of Rainbow has been publicly released as part of the light-curve package at https://github.com/light-curve/light-curve-python. Our approach enables straightforward light curve estimation for objects with observations in multiple filters and from multiple experiments. It is particularly well suited for situations where light curve sampling is sparse.
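The way one bolometric light curve and one temperature curve jointly generate fluxes in every passband can be sketched as follows. This is not the light-curve package API: the Bazin-like bolometric shape, the sigmoid cooling law and all parameter values are illustrative assumptions, and passbands are approximated by single effective wavelengths.

```python
import numpy as np

H = 6.62607015e-34          # Planck constant [J s]
C = 2.99792458e8            # speed of light [m/s]
K_B = 1.380649e-23          # Boltzmann constant [J/K]
SIGMA_SB = 5.670374419e-8   # Stefan-Boltzmann constant [W m^-2 K^-4]


def planck_lambda(wave_m, temp_k):
    """Black-body spectral radiance B_lambda(T) in W m^-3 sr^-1."""
    x = H * C / (wave_m * K_B * temp_k)
    return 2.0 * H * C**2 / wave_m**5 / np.expm1(x)


def bolometric(t, amplitude, t0, rise, fall):
    """Bazin-like bolometric light curve (illustrative choice)."""
    return amplitude * np.exp(-(t - t0) / fall) / (1.0 + np.exp(-(t - t0) / rise))


def temperature(t, t_min, t_delta, t0, width):
    """Sigmoid cooling from t_min + t_delta down to t_min (illustrative choice)."""
    return t_min + t_delta / (1.0 + np.exp((t - t0) / width))


def rainbow_flux(t, wave_m, bol_params, temp_params):
    """Flux density per unit wavelength at time t and wavelength wave_m.

    pi * B_lambda / (sigma * T^4) integrates to 1 over wavelength, so the
    bolometric curve is shared out among wavelengths by a black body at T(t).
    """
    temp = temperature(t, *temp_params)
    weight = np.pi * planck_lambda(wave_m, temp) / (SIGMA_SB * temp**4)
    return bolometric(t, *bol_params) * weight


BOL = (1.0, 0.0, 3.0, 20.0)          # amplitude, t0 [d], rise [d], fall [d]
TEMP = (5000.0, 10000.0, 10.0, 5.0)  # T_min [K], T_delta [K], t0 [d], width [d]

for t in (-10.0, 0.0, 40.0):
    blue = rainbow_flux(t, 400e-9, BOL, TEMP)
    red = rainbow_flux(t, 700e-9, BOL, TEMP)
    print(f"t={t:+.0f} d  blue/red flux ratio = {blue / red:.2f}")
```

The blue-to-red ratio drops as the object cools. Because every band shares the same handful of parameters, a few points in one filter constrain the curve in all the others.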
Submitted 5 October, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Are classification metrics good proxies for SN Ia cosmological constraining power?
Authors:
Alex I. Malz,
Mi Dai,
Kara A. Ponder,
Emille E. O. Ishida,
Santiago Gonzalez-Gaitain,
Rupesh Durgesh,
Alberto Krone-Martins,
Rafael S. de Souza,
Noble Kennamer,
Sreevarsha Sreejith,
Lluis Galbany,
The LSST Dark Energy Science Collaboration,
The Cosmostatistics Initiative
Abstract:
Context: When selecting a classifier to use for a Type Ia supernova (SN Ia) cosmological analysis, it is common to make decisions based on metrics of classification performance, such as contamination within the photometrically classified SN Ia sample, rather than a measure of cosmological constraining power. If the former is an appropriate proxy for the latter, this practice would save those designing an analysis pipeline from the computational expense of a full cosmology forecast. Aims: This study tests the assumption that classification metrics are an appropriate proxy for cosmology metrics. Methods: We emulate photometric SN Ia cosmology samples with controlled contamination rates of individual contaminant classes and evaluate each of them under a set of classification metrics. We then derive cosmological parameter constraints from all samples under two common analysis approaches and quantify the impact of contamination by each contaminant class on the resulting cosmological parameter estimates. Results: We observe that cosmology metrics are sensitive to both the contamination rate and the class of the contaminating population, whereas the classification metrics are insensitive to the latter. Conclusions: We therefore discourage exclusive reliance on classification-based metrics for cosmological analysis design decisions, e.g. classifier choice, and instead recommend optimizing using a metric of cosmological parameter constraining power.
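The core observation, that a contamination-based metric cannot tell which class contaminates a sample while a cosmology-oriented quantity can, is illustrated by the toy example below. The residual offset assigned to each contaminant class is an arbitrary number chosen for illustration, not a value from the study, and the mean Hubble residual is only a crude proxy for cosmological bias.

```python
import numpy as np


def contamination(labels):
    """Fraction of the photometrically selected SN Ia sample that is not SN Ia."""
    return float(np.mean(np.asarray(labels) != "Ia"))


def mean_residual(residuals):
    """Mean Hubble-diagram residual in magnitudes (toy proxy for cosmological bias)."""
    return float(np.mean(residuals))


def toy_sample(n_ia, n_contam, contam_label, contam_offset):
    """True SNe Ia sit on the Hubble diagram (residual 0); every contaminant is
    shifted by a class-dependent offset (hypothetical values)."""
    labels = ["Ia"] * n_ia + [contam_label] * n_contam
    residuals = np.concatenate([np.zeros(n_ia), np.full(n_contam, contam_offset)])
    return labels, residuals


samples = {
    "Ibc-contaminated": toy_sample(90, 10, "Ibc", 0.5),
    "II-contaminated": toy_sample(90, 10, "II", -1.0),
}
for name, (labels, residuals) in samples.items():
    print(f"{name}: contamination={contamination(labels):.2f}, "
          f"mean residual={mean_residual(residuals):+.3f} mag")
```

Both samples score identically on contamination, yet they would pull the inferred cosmology in opposite directions.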
Submitted 23 May, 2023;
originally announced May 2023.
-
Repeating Outbursts from the Young Stellar Object Gaia23bab (= SPICY 97589)
Authors:
Michael A. Kuhn,
Robert A. Benjamin,
Emille E. O. Ishida,
Rafael S. de Souza,
Julien Peloton,
Michele Delli Veneri
Abstract:
The light curve of Gaia23bab (= SPICY 97589) shows two significant ($ΔG>2$ mag) brightening events, one in 2017 and an ongoing event starting in 2022. The source's quiescent spectral energy distribution indicates an embedded ($A_V>5$ mag) pre-main-sequence star, with optical accretion emission and mid-infrared disk emission. This characterization is supported by the source's membership in an embedded cluster in the star-forming cloud DOBASHI 1604 at a distance of $900\pm45$~pc. Thus, the brightening events are probable accretion outbursts, likely of EX Lup-type.
Submitted 16 March, 2023;
originally announced March 2023.
-
From Images to Features: Unbiased Morphology Classification via Variational Auto-Encoders and Domain Adaptation
Authors:
Quanfeng Xu,
Shiyin Shen,
Rafael S. de Souza,
Mi Chen,
Renhao Ye,
Yumei She,
Zhu Chen,
Emille E. O. Ishida,
Alberto Krone-Martins,
Rupesh Durgesh
Abstract:
We present a novel approach for the dimensionality reduction of galaxy images by leveraging a combination of variational auto-encoders (VAE) and domain adaptation (DA). We demonstrate the effectiveness of this approach using a sample of low-redshift galaxies with detailed morphological type labels from the Galaxy-Zoo DECaLS project. We show that 40-dimensional latent variables can effectively reproduce most morphological features in galaxy images. To further validate the effectiveness of our approach, we applied a classical random forest (RF) classifier to the 40-dimensional latent variables to make detailed morphology feature classifications. This approach performs similarly to a direct neural network application on galaxy images. We further enhance our model by tuning the VAE network via DA using galaxies in the overlapping footprint of DECaLS and BASS+MzLS, enabling the unbiased application of our model to galaxy images in both surveys. We observed that DA led to even better morphological feature extraction and classification performance. Overall, this combination of VAE and DA can be applied to achieve image dimensionality reduction, defect image identification, and morphology classification in large optical surveys.
Submitted 13 October, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Finding active galactic nuclei through Fink
Authors:
Etienne Russeil,
Emille E. O. Ishida,
Roman Le Montagner,
Julien Peloton,
Anais Moller
Abstract:
We present the Active Galactic Nuclei (AGN) classifier as currently implemented within the Fink broker. Features were built upon summary statistics of available photometric points, as well as color estimation enabled by symbolic regression. The learning stage includes an active learning loop, used to build an optimized training sample from labels reported in astronomical catalogs. Using this method to classify real alerts from the Zwicky Transient Facility (ZTF), we achieved 98.0% accuracy, 93.8% precision and 88.5% recall. We also describe the modifications necessary to enable processing data from the upcoming Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), and apply them to the training sample of the Extended LSST Astronomical Time-series Classification Challenge (ELAsTiCC). Results show that our designed feature space enables high performance of traditional machine learning algorithms in this binary classification task.
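For reference, the three quoted scores follow from the binary confusion matrix as below. The counts in the example are invented to show the arithmetic; they are not the ZTF results.

```python
def binary_scores(tp, fp, fn, tn):
    """Accuracy, precision and recall for a binary (AGN vs. non-AGN) classifier."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # fraction of AGN-tagged alerts that are AGN
    recall = tp / (tp + fn)     # fraction of true AGN that were tagged
    return accuracy, precision, recall


# Hypothetical counts, chosen only to illustrate the formulas.
acc, prec, rec = binary_scores(tp=8, fp=2, fn=1, tn=89)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

When AGN are a small fraction of the stream, accuracy is dominated by the many correctly rejected non-AGN, which is why precision and recall are reported alongside it.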
Submitted 20 November, 2022;
originally announced November 2022.
-
The SNAD Viewer: Everything You Want to Know about Your Favorite ZTF Object
Authors:
Konstantin Malanchev,
Matwey V. Kornilov,
Maria V. Pruzhinskaya,
Emille E. O. Ishida,
Patrick D. Aleo,
Vladimir S. Korolev,
Anastasia Lavrukhina,
Etienne Russeil,
Sreevarsha Sreejith,
Alina A. Volnova,
Anastasiya Voloshina,
Alberto Krone-Martins
Abstract:
We describe the SNAD Viewer, a web portal for astronomers which presents a centralized view of individual objects from the Zwicky Transient Facility's (ZTF) data releases, including data gathered from multiple publicly available astronomical archives and data sources. Initially built to enable efficient expert feedback in the context of adaptive machine learning applications, it has evolved into a full-fledged community asset that centralizes public information and provides a multi-dimensional view of ZTF sources. For users, we provide detailed descriptions of the data sources and choices underlying the information displayed in the portal. For developers, we describe our architectural choices and their consequences such that our experience can help others engaged in similar endeavors or in adapting our publicly released code to their requirements. The infrastructure we describe here is scalable and flexible and can be personalized and used by other surveys and for other science goals. The Viewer has been instrumental in highlighting the crucial roles domain experts retain in the era of big data in astronomy. Given the arrival of the upcoming generation of large-scale surveys, we believe similar systems will be paramount in enabling an optimal exploitation of the scientific potential enclosed in current terabyte and future petabyte-scale data sets. The Viewer is publicly available online at https://ztf.snad.space
Submitted 3 March, 2023; v1 submitted 14 November, 2022;
originally announced November 2022.
-
Enabling the discovery of fast transients: A kilonova science module for the Fink broker
Authors:
B. Biswas,
E. E. O. Ishida,
J. Peloton,
A. Moller,
M. V. Pruzhinskaya,
R. S. de Souza,
D. Muthukrishna
Abstract:
We describe the fast transient classification algorithm at the core of the kilonova (KN) science module currently implemented in the Fink broker and report classification results based on simulated catalogs and real data from the ZTF alert stream. We used noiseless, homogeneously sampled simulations to construct a basis of principal components (PCs). All light curves from a more realistic ZTF simulation were written as a linear combination of this basis. The corresponding coefficients were used as features in training a random forest classifier. The same method was applied to long (>30 days) and medium (<30 days) light curves. The latter aimed to simulate the data situation found within the ZTF alert stream. Classification based on long light curves achieved 73.87% precision and 82.19% recall. Medium baseline analysis resulted in 69.30% precision and 69.74% recall, thus confirming the robustness of precision results when limited to 30 days of observations. In both cases, dwarf flares and point Type Ia supernovae were the most frequent contaminants. The final trained model was integrated into the Fink broker and has been distributing fast transients, tagged as KN_candidates, to the astronomical community, especially through the GRANDMA collaboration. We showed that features specifically designed to grasp different light curve behaviors provide enough information to separate fast (KN-like) from slow (non-KN-like) evolving events. This module represents one crucial link in an intricate chain of infrastructure elements for multi-messenger astronomy which is currently being put in place by the Fink broker team in preparation for the arrival of data from the Vera Rubin Observatory Legacy Survey of Space and Time.
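The feature-extraction step (a principal-component basis built from noiseless templates, then least-squares coefficients for each sparsely sampled light curve) can be sketched as follows. This is a simplified illustration rather than the Fink module: a single passband, a common daily time grid, no mean-centring and the toy exponential templates are all assumptions, and the random forest that consumes the coefficients is not shown.

```python
import numpy as np


def build_pc_basis(templates, n_components):
    """Leading principal directions of noiseless template light curves.

    templates: (n_templates, n_epochs) on a common time grid. No mean-centring
    is applied, so the components span the templates themselves (a simplifying choice).
    """
    _, _, vt = np.linalg.svd(templates, full_matrices=False)
    return vt[:n_components]  # orthonormal rows, one per component


def pc_coefficients(flux, epoch_index, basis):
    """Least-squares coefficients of a sparse light curve on the basis,
    using only the grid epochs where the object was actually observed."""
    design = basis[:, epoch_index].T  # (n_observed, n_components)
    coeffs, *_ = np.linalg.lstsq(design, flux, rcond=None)
    return coeffs


# Toy templates: mixtures of a fast (3-day) and a slow (15-day) decline.
days = np.arange(31.0)
fast, slow = np.exp(-days / 3.0), np.exp(-days / 15.0)
rng = np.random.default_rng(1)
mix = rng.uniform(0.1, 1.0, size=(50, 2))
templates = mix[:, :1] * fast + mix[:, 1:] * slow
basis = build_pc_basis(templates, n_components=2)

# A new object observed on only five nights.
observed = np.array([0, 2, 5, 9, 20])
truth = 2.0 * fast + 0.5 * slow
features = pc_coefficients(truth[observed], observed, basis)
reconstruction = features @ basis  # full 31-day curve recovered from five points
print(features)
```

The coefficient vector has a fixed length regardless of how many epochs were observed, which is what makes it usable as input to a standard classifier.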
Submitted 5 October, 2023; v1 submitted 31 October, 2022;
originally announced October 2022.
-
Explainable classification of astronomical uncertain time series
Authors:
Michael Franklin Mbouopda,
Emille E O Ishida,
Engelbert Mephu Nguifo,
Emmanuel Gangler
Abstract:
Exploring the expansion history of the universe, understanding its evolutionary stages, and predicting its future evolution are important goals in astrophysics. Today, machine learning tools are used to help achieve these goals by analyzing transient sources, which are modeled as uncertain time series. Although black-box methods achieve appreciable performance, existing interpretable time series methods have failed to obtain acceptable performance for this type of data. Furthermore, data uncertainty is rarely taken into account in these methods. In this work, we propose an uncertainty-aware, subsequence-based model which achieves classification performance comparable to that of state-of-the-art methods. Unlike conformal learning, which estimates model uncertainty on predictions, our method takes data uncertainty as additional input. Moreover, our approach is explainable-by-design, giving domain experts the ability to inspect the model and explain its predictions. The explainability of the proposed method also has the potential to inspire new developments in theoretical astrophysics modeling by suggesting important subsequences which depict details of light curve shapes. The dataset, the source code of our experiment, and the results are made available on a public repository.
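One way to make a subsequence (shapelet) distance uncertainty-aware is to propagate the measurement errors of the light curve into the distance itself, as in the sketch below. This is an illustrative first-order propagation under an independent-Gaussian-error assumption, not necessarily the formulation used in the paper, and the function name is hypothetical. Note that first-order propagation gives zero uncertainty at a perfect match, one reason a real method may treat uncertainty differently.

```python
import numpy as np


def uncertain_subsequence_distance(shapelet, series, errors):
    """Best match of a shapelet inside an uncertain time series.

    Returns (distance, distance_error, start): the minimum mean squared
    difference over all sliding windows, its first-order propagated 1-sigma
    uncertainty (independent Gaussian errors assumed), and the window start.
    """
    m = len(shapelet)
    best = (np.inf, np.inf, -1)
    for start in range(len(series) - m + 1):
        diff = series[start:start + m] - shapelet
        sigma = errors[start:start + m]
        dist = float(np.mean(diff ** 2))
        # d(dist)/d(x_i) = 2 * diff_i / m; combine in quadrature with sigma_i
        dist_err = float(np.sqrt(np.sum((2.0 * diff * sigma / m) ** 2)))
        if dist < best[0]:
            best = (dist, dist_err, start)
    return best


shapelet = np.array([1.0, 2.0, 1.0])
series = np.array([0.0, 0.0, 0.0, 1.1, 2.0, 1.0, 0.0, 0.0])
errors = np.full_like(series, 0.1)
dist, dist_err, start = uncertain_subsequence_distance(shapelet, series, errors)
print(f"best window starts at {start}: distance {dist:.4f} +/- {dist_err:.4f}")
```

The returned pair (distance, uncertainty) can then serve as a feature, so that a classifier sees not only how well a characteristic shape matches but how reliable that match is.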
Submitted 28 September, 2022;
originally announced October 2022.
-
Supernova search with active learning in ZTF DR3
Authors:
Maria V. Pruzhinskaya,
Emille E. O. Ishida,
Alexandra K. Novinskaya,
Etienne Russeil,
Alina A. Volnova,
Konstantin L. Malanchev,
Matwey V. Kornilov,
Patrick D. Aleo,
Vladimir S. Korolev,
Vadim V. Krushinsky,
Sreevarsha Sreejith,
Emmanuel Gangler
Abstract:
We provide the first results from the complete SNAD adaptive learning pipeline in the context of a broad scope of data from large-scale astronomical surveys. The main goal of this work is to explore the potential of adaptive learning techniques in application to big data sets. Our SNAD team used Active Anomaly Discovery (AAD) as a tool to search for new supernova (SN) candidates in the photometric data from the first 9.4 months of the Zwicky Transient Facility (ZTF) survey, namely, between March 17 and December 31 2018 (58194 < MJD < 58483). We analysed 70 ZTF fields at high galactic latitude and visually inspected 2100 outliers. This resulted in 104 SN-like objects being found, 57 of which were reported to the Transient Name Server for the first time, while 47 had previously been mentioned in other catalogues, either as SNe with known types or as SN candidates. We visually inspected the multi-colour light curves of the non-catalogued transients and fitted them with different supernova models to assign each a probable photometric class: Ia, Ib/c, IIP, IIL, or IIn. Moreover, we also identified unreported slow-evolving transients that are good superluminous SN candidates, along with a few other non-catalogued objects, such as red dwarf flares and active galactic nuclei. Beyond confirming the effectiveness of human-machine integration underlying the AAD strategy, our results shed light on potential leaks in currently available pipelines. These findings can help avoid similar losses in future large-scale astronomical surveys. Furthermore, the algorithm can be applied directly to any type of data, using any definition of an anomaly set by the expert.
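The MJD bounds quoted above map onto calendar dates through the standard Modified Julian Date epoch (MJD 0 = 1858 November 17, 0h UT). A minimal standard-library check:

```python
import math
from datetime import date, timedelta

MJD_EPOCH = date(1858, 11, 17)  # MJD 0.0 is 1858-11-17 at 0h UT


def mjd_to_date(mjd):
    """Calendar date (UT) containing the given Modified Julian Date."""
    return MJD_EPOCH + timedelta(days=math.floor(mjd))


start, end = mjd_to_date(58194), mjd_to_date(58483)
print(start, end, (end - start).days)  # 2018-03-17 2018-12-31 289
```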
Submitted 27 March, 2023; v1 submitted 18 August, 2022;
originally announced August 2022.
-
From Data to Software to Science with the Rubin Observatory LSST
Authors:
Katelyn Breivik,
Andrew J. Connolly,
K. E. Saavik Ford,
Mario Jurić,
Rachel Mandelbaum,
Adam A. Miller,
Dara Norman,
Knut Olsen,
William O'Mullane,
Adrian Price-Whelan,
Timothy Sacco,
J. L. Sokoloski,
Ashley Villar,
Viviana Acquaviva,
Tomas Ahumada,
Yusra AlSayyad,
Catarina S. Alves,
Igor Andreoni,
Timo Anguita,
Henry J. Best,
Federica B. Bianco,
Rosaria Bonito,
Andrew Bradshaw,
Colin J. Burke,
Andresa Rodrigues de Campos
, et al. (75 additional authors not shown)
Abstract:
The Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) dataset will dramatically alter our understanding of the Universe, from the origins of the Solar System to the nature of dark matter and dark energy. Much of this research will depend on the existence of robust, tested, and scalable algorithms, software, and services. Identifying and developing such tools ahead of time has the potential to significantly accelerate the delivery of early science from LSST. Developing these collaboratively, and making them broadly available, can enable more inclusive and equitable collaboration on LSST science.
To facilitate such opportunities, a community workshop entitled "From Data to Software to Science with the Rubin Observatory LSST" was organized by the LSST Interdisciplinary Network for Collaboration and Computing (LINCC) and partners, and held at the Flatiron Institute in New York, March 28-30th 2022. The workshop included over 50 in-person attendees invited from over 300 applications. It identified seven key software areas of need: (i) scalable cross-matching and distributed joining of catalogs, (ii) robust photometric redshift determination, (iii) software for determination of selection functions, (iv) frameworks for scalable time-series analyses, (v) services for image access and reprocessing at scale, (vi) object image access (cutouts) and analysis at scale, and (vii) scalable job execution systems.
This white paper summarizes the discussions of this workshop. It considers the motivating science use cases, identified cross-cutting algorithms, software, and services, their high-level technical specifications, and the principles of inclusive collaborations needed to develop them. We provide it as a useful roadmap of needs, as well as to spur action and collaboration between groups and individuals looking to develop reusable software for early LSST science.
Submitted 4 August, 2022;
originally announced August 2022.
-
The GRANDMA network in preparation for the fourth gravitational-wave observing run
Authors:
S. Agayeva,
V. Aivazyan,
S. Alishov,
M. Almualla,
C. Andrade,
S. Antier,
J. -M. Bai,
A. Baransky,
S. Basa,
P. Bendjoya,
Z. Benkhaldoun,
S. Beradze,
D. Berezin,
U. Bhardwaj,
M. Blazek,
O. Burkhonov,
E. Burns,
S. Caudill,
N. Christensen,
F. Colas,
A. Coleiro,
W. Corradi,
M. W. Coughlin,
T. Culino,
D. Darson
, et al. (76 additional authors not shown)
Abstract:
GRANDMA is a world-wide collaboration with the primary scientific goal of studying gravitational-wave sources, discovering their electromagnetic counterparts, and characterizing their emission. GRANDMA involves astronomers, astrophysicists, gravitational-wave physicists, and theorists. GRANDMA is now a truly global network of telescopes, with (so far) 30 telescopes in both hemispheres. It incorporates a citizen science programme (Kilonova-Catcher) which offers an opportunity to spread interest in time-domain astronomy. The telescope network is a heterogeneous set of already-existing observing facilities that operate in coordination as a single observatory. Within the network there are wide-field imagers that can observe large areas of the sky to search for optical counterparts, narrow-field instruments that do targeted searches within a predefined list of host-galaxy candidates, and larger telescopes that are devoted to characterization and follow-up of the identified counterparts. Here we present an overview of GRANDMA after the third observing run of the LIGO/Virgo gravitational-wave observatories in 2019-2020 and its ongoing preparation for the forthcoming fourth observing run (O4). Additionally, we review the potential of GRANDMA for the discovery and follow-up of other types of astronomical transients.
Submitted 27 July, 2022; v1 submitted 20 July, 2022;
originally announced July 2022.
-
A graph-based spectral classification of Type II supernovae
Authors:
Rafael S. de Souza,
Stephen Thorp,
Lluís Galbany,
Emille E. O. Ishida,
Santiago González-Gaitán,
Morgan A. Schmitz,
Alberto Krone-Martins,
Christina Peters
Abstract:
Given the ever-increasing number of time-domain astronomical surveys, employing robust, interpretable, and automated data-driven classification schemes is pivotal. Based on graph theory, we present new data-driven classification heuristics for spectral data. A spectral classification scheme of Type II supernovae (SNe II) is proposed based on the phase relative to the maximum light in the $V$ band and the end of the plateau phase. We use a compiled optical data set that comprises 145 SNe and 1595 optical spectra in the 4000-9000 Å range. Our classification method naturally identifies outliers and arranges the different SNe in terms of their major spectral features. We compare our approach to the off-the-shelf UMAP manifold learning method and show that both strategies are consistent with a continuous variation of spectral types rather than discrete families. The automated classification naturally reflects the fast evolution of Type II SNe around the maximum light while showcasing their homogeneity close to the end of the plateau phase. The scheme we develop could be more widely applicable to unsupervised time series classification or characterisation of other functional data.
Submitted 1 June, 2023; v1 submitted 28 June, 2022;
originally announced June 2022.
-
Spectroscopic Confirmation of a Population of Isolated, Intermediate-Mass YSOs
Authors:
Michael A. Kuhn,
Ramzi Saber,
Matthew S. Povich,
Rafael S. de Souza,
Alberto Krone-Martins,
Emille E. O. Ishida,
Catherine Zucker,
Robert A. Benjamin,
Lynne A. Hillenbrand,
Alfred Castro-Ginard,
Xingyu Zhou
Abstract:
Wide-field searches for young stellar objects (YSOs) can place useful constraints on the prevalence of clustered versus distributed star formation. The Spitzer/IRAC Candidate YSO (SPICY) catalog is one of the largest compilations of such objects (~120,000 candidates in the Galactic midplane). Many SPICY candidates are spatially clustered, but, perhaps surprisingly, approximately half the candidates appear spatially distributed. To better characterize this unexpected population and confirm its nature, we obtained Palomar/DBSP spectroscopy for 26 of the optically bright (G<15 mag) "isolated" YSO candidates. We confirm the YSO classifications of all 26 sources based on their positions on the Hertzsprung-Russell diagram, H$\alpha$ and Ca II line emission from over half the sample, and robust detection of infrared excesses. This implies a contamination rate of <10% for SPICY stars that meet our optical selection criteria. Spectral types range from B4 to K3, with A-type stars most common. Spectral energy distributions, diffuse interstellar bands, and Galactic extinction maps indicate moderate to high extinction. Stellar masses range from ~1 to 7 $M_\odot$, and the estimated accretion rates, ranging from $3\times10^{-8}$ to $3\times10^{-7}$ $M_\odot$ yr$^{-1}$, are typical for YSOs in this mass range. The 3D spatial distribution of these stars, based on Gaia astrometry, reveals that the "isolated" YSOs are not evenly distributed in the Solar neighborhood but are concentrated in kpc-scale dusty Galactic structures that also contain the majority of the SPICY YSO clusters. Thus, the processes that produce large Galactic star-forming structures may yield nearly as many distributed as clustered YSOs.
Submitted 19 September, 2022; v1 submitted 8 June, 2022;
originally announced June 2022.
-
How have astronomers cited other fields in the last decade?
Authors:
Michele Delli Veneri,
Rafael S. de Souza,
Alberto Krone-Martins,
Emille E. O. Ishida,
Maria Luiza L. Dantas,
Noble Kennamer
Abstract:
We present a citation pattern analysis between astronomical papers and 13 other disciplines, based on the arXiv database over the past decade ($2010 - 2020$). We analyze 12,600 astronomical papers citing over 14,531 unique publications outside astronomy. Two striking patterns emerge. First, general relativity recently became the most cited field by astronomers, a trend highly correlated with the discovery of gravitational waves. Second, the number of referenced papers in computer science and statistics has grown quickly, with computer science showing a notable 15-fold increase since 2015. These findings confirm the critical role of interdisciplinary efforts involving astronomy, statistics, and computer science in recent astronomical research.
Submitted 26 May, 2022;
originally announced May 2022.
-
Sidestepping the inversion of the weak-lensing covariance matrix with Approximate Bayesian Computation
Authors:
Martin Kilbinger,
Emille E. O. Ishida,
Jessi Cisewski-Kehe
Abstract:
Weak gravitational lensing is one of the few direct methods to map the dark-matter distribution on large scales in the Universe, and to estimate cosmological parameters. We study a Bayesian inference problem where the data covariance $\mathbf{C}$, estimated from a number $n_{\textrm{s}}$ of numerical simulations, is singular. In a cosmological context of large-scale structure observations, the creation of a large number of such $N$-body simulations is often prohibitively expensive. Inference based on a likelihood function often includes a precision matrix, $\Psi = \mathbf{C}^{-1}$. The covariance matrix corresponding to a $p$-dimensional data vector is singular for $p \ge n_{\textrm{s}}$, in which case the precision matrix is unavailable. We propose the likelihood-free inference method Approximate Bayesian Computation (ABC) as a solution that circumvents the inversion of the singular covariance matrix. We present examples of increasing complexity, culminating in a realistic cosmological scenario of the determination of the weak-gravitational lensing power spectrum for the upcoming European Space Agency satellite Euclid. While we found the variances of the ABC parameter estimates to be mildly larger than those of likelihood-based approaches, which are restricted to settings with $p < n_{\textrm{s}}$, we obtain unbiased parameter estimates with ABC even in extreme cases where $p / n_{\textrm{s}} \gg 1$. The code has been made publicly available to ensure the reproducibility of the results.
Submitted 28 February, 2023; v1 submitted 6 December, 2021;
originally announced December 2021.
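The sketch below illustrates the two ideas in the abstract above with NumPy. First, a covariance estimated from $n_{\textrm{s}}$ simulations has rank at most $n_{\textrm{s}} - 1$, so it cannot be inverted when $p \ge n_{\textrm{s}}$. Second, rejection ABC never evaluates a likelihood, so it never needs $\Psi$. It makes several assumptions: it uses plain rejection ABC on a toy one-parameter model, and the function names, toy simulator, prior and distance are hypothetical. None of them is taken from the authors' publicly released code.

```python
import numpy as np


def sample_covariance_rank(n_sims: int, p: int, seed: int = 0) -> int:
    """Rank of a p x p covariance estimated from n_sims simulated data vectors."""
    rng = np.random.default_rng(seed)
    sims = rng.standard_normal((n_sims, p))   # one row per simulation
    cov = np.cov(sims, rowvar=False)          # p x p sample covariance
    # Mean subtraction leaves at most n_sims - 1 independent directions,
    # so the estimate is singular whenever p >= n_sims.
    return int(np.linalg.matrix_rank(cov))


def abc_rejection(observed, simulate, prior_sample, distance,
                  n_draws: int, epsilon: float, seed: int = 0) -> np.ndarray:
    """Rejection ABC: keep prior draws whose simulated data lie within epsilon.

    No likelihood, and therefore no inverse covariance, is ever evaluated.
    """
    rng = np.random.default_rng(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) < epsilon:
            accepted.append(theta)
    return np.array(accepted)


if __name__ == "__main__":
    p = 50                                           # data-vector length > n_sims below
    print(sample_covariance_rank(n_sims=10, p=p))    # 9 < p: singular

    rng = np.random.default_rng(1)
    observed = 2.0 + rng.standard_normal(p)          # toy data, true theta = 2
    posterior = abc_rejection(
        observed,
        simulate=lambda th, r: th + r.standard_normal(p),
        prior_sample=lambda r: r.uniform(-5.0, 5.0),
        distance=lambda a, b: abs(a.mean() - b.mean()),
        n_draws=5000,
        epsilon=0.2,
    )
    print(len(posterior), posterior.mean())
```

Accepted draws approximate the posterior. Shrinking `epsilon` tightens the approximation but lowers the acceptance rate.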
-
SNAD Transient Miner: Finding Missed Transient Events in ZTF DR4 using k-D trees
Authors:
P. D. Aleo,
K. L. Malanchev,
M. V. Pruzhinskaya,
E. E. O. Ishida,
E. Russeil,
M. V. Kornilov,
V. S. Korolev,
S. Sreejith,
A. A. Volnova,
G. S. Narayan
Abstract:
We report the automatic detection of 11 transients (7 possible supernovae and 4 active galactic nuclei candidates) within the Zwicky Transient Facility fourth data release (ZTF DR4), all of them observed in 2018 and absent from public catalogs. Among these, three were not part of the ZTF alert stream. Our transient mining strategy employs 41 physically motivated features extracted from both real light curves and four simulated light curve models (SN Ia, SN II, TDE, SLSN-I). These features are input to a k-D tree algorithm, from which we calculate the 15 nearest neighbors. After pre-processing and selection cuts, our dataset contained approximately a million objects among which we visually inspected the 105 closest neighbors from seven of our brightest, most well-sampled simulations, comprising 89 unique ZTF DR4 sources. Our result illustrates the potential of coherently incorporating domain knowledge and automatic learning algorithms, which is one of the guiding principles directing the SNAD team. It also demonstrates that the ZTF DR is a suitable testing ground for data mining algorithms aiming to prepare for the next generation of astronomical data.
Submitted 4 May, 2022; v1 submitted 22 November, 2021;
originally announced November 2021.
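The neighbour search described above can be sketched with SciPy's `KDTree`. This is a minimal illustration under stated assumptions:

- Random arrays stand in for the 41 light-curve features.
- Each feature is standardized with the statistics of the real sample. This is our assumption; the abstract does not state it.
- `neighbours_of_simulations` is a hypothetical helper.

The final lines show why seven simulations with 15 neighbours each give 105 inspections, of which fewer may be unique sources.

```python
import numpy as np
from scipy.spatial import KDTree


def neighbours_of_simulations(real_features, sim_features, k=15):
    """Distances and indices of the k real objects closest to each simulation."""
    real = np.asarray(real_features, dtype=float)
    sims = np.asarray(sim_features, dtype=float)
    # Standardize with the real-sample statistics so that no single feature
    # dominates the Euclidean distance (an illustrative assumption).
    mu, sigma = real.mean(axis=0), real.std(axis=0)
    tree = KDTree((real - mu) / sigma)
    dist, idx = tree.query((sims - mu) / sigma, k=k)   # sorted by distance
    return dist, idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(10_000, 41))   # placeholder "real" feature vectors
    sims = rng.normal(size=(7, 41))        # placeholder simulated models
    dist, idx = neighbours_of_simulations(real, sims, k=15)
    print(idx.shape)                       # (7, 15): 105 neighbours in total
    print(len(np.unique(idx)), "unique objects to inspect")
```

A k-D tree returns exact nearest neighbours. In a 41-dimensional space its speed advantage over a brute-force search is modest.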
-
Fink: early supernovae Ia classification using active learning
Authors:
Marco Leoni,
Emille E. O. Ishida,
Julien Peloton,
Anais Möller
Abstract:
We describe how the Fink broker early supernova Ia classifier optimizes its ML classifications by employing an active learning (AL) strategy. We demonstrate the feasibility of implementing such strategies in the current Zwicky Transient Facility (ZTF) public alert data stream. We compare the performance of two AL strategies: uncertainty sampling and random sampling. Our pipeline consists of 3 stages: feature extraction, classification and learning strategy. Starting from an initial sample of 10 alerts (5 SN Ia and 5 non-Ia), we let the algorithm identify which alert should be added to the training sample. The system is allowed to evolve through 300 iterations. Our data set consists of 23 840 alerts from the ZTF with confirmed classification via cross-match with the SIMBAD database and the Transient Name Server (TNS), 1 600 of which were SNe Ia (1 021 unique objects). The data configuration, after the learning cycle was completed, consists of 310 alerts for training and 23 530 for testing. Averaging over 100 realizations, the classifier achieved 89% purity and 54% efficiency. From 01/November/2020 to 31/October/2021 Fink applied its early supernova Ia module to the ZTF stream and communicated promising SN Ia candidates to the TNS. Of the 535 spectroscopically classified Fink candidates, 459 (86%) were confirmed to be SNe Ia. Our results confirm the effectiveness of active learning strategies for guiding the construction of optimal training samples for astronomical classifiers. They demonstrate on real data that the performance of learning algorithms can be substantially improved without the need for extra computational resources or overwhelmingly large training samples. This is, to our knowledge, the first application of AL to real alert data.
Submitted 20 April, 2022; v1 submitted 22 November, 2021;
originally announced November 2021.
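A short NumPy sketch makes the two query strategies and the two metrics quoted above concrete. It assumes a binary classifier that outputs, for each alert in the pool, a probability of being an SN Ia. The classifier itself and Fink's feature extraction are out of scope, and the function names are hypothetical. Purity and efficiency use their usual definitions, i.e. the precision and the recall of the Ia class.

```python
import numpy as np


def uncertainty_sampling(prob_ia) -> int:
    """Pool index whose predicted P(Ia) is closest to 0.5 (least certain)."""
    return int(np.argmin(np.abs(np.asarray(prob_ia, dtype=float) - 0.5)))


def random_sampling(n_pool: int, rng: np.random.Generator) -> int:
    """Baseline strategy: a uniformly random pool index."""
    return int(rng.integers(n_pool))


def purity_efficiency(y_true, y_pred):
    """Purity = TP / (TP + FP); efficiency = TP / (TP + FN); True means SN Ia."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    purity = tp / (tp + fp) if tp + fp else 0.0
    efficiency = tp / (tp + fn) if tp + fn else 0.0
    return purity, efficiency


if __name__ == "__main__":
    print(uncertainty_sampling([0.95, 0.48, 0.10, 0.70]))   # 1
    print(purity_efficiency([1, 1, 1, 0], [1, 1, 0, 1]))    # purity 2/3, efficiency 2/3
    # Spectroscopic purity of the communicated candidates: 459 / 535
    print(round(459 / 535, 2))                              # 0.86
```

In an active learning loop, the selected alert's label is added to the training set, the classifier is retrained, and the query is repeated. This is how 10 initial alerts plus 300 iterations give the 310 training alerts quoted above.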
-
Probabilistic modeling of asteroid diameters from Gaia DR2 errors
Authors:
Rafael S. de Souza,
Alberto Krone-Martins,
Valerio Carruba,
Rita de Cassia Domingos,
Emille E. O. Ishida,
Safwan Alijbaae,
Mariela Huaman Espinoza,
William Barletta
Abstract:
The Gaia Data Release 2 provides precise astrometry for nearly 1.5 billion sources across the entire sky, including several thousand asteroids. In this work, we provide evidence that reasonably large asteroids (diameter $>$ 20 km) have high correlations with Gaia relative flux uncertainties and systematic right ascension errors. We further capture these correlations using a logistic Bayesian additive regression tree model. We compile a small list of probable large asteroids that can be targeted for direct diameter measurements and shape reconstruction.
Submitted 26 August, 2021;
originally announced August 2021.
-
A high pitch angle structure in the Sagittarius Arm
Authors:
M. A. Kuhn,
R. A. Benjamin,
C. Zucker,
A. Krone-Martins,
R. S. de Souza,
A. Castro-Ginard,
E. E. O. Ishida,
M. S. Povich,
L. A. Hillenbrand
Abstract:
Context: In spiral galaxies, star formation tends to trace features of the spiral pattern, including arms, spurs, feathers, and branches. However, in our own Milky Way, it has been challenging to connect individual star-forming regions to their larger Galactic environment owing to our perspective from within the disk. One feature in nearly all modern models of the Milky Way is the Sagittarius Arm, located inward of the Sun with a pitch angle of ~12 deg. Aims: We map the 3D locations and velocities of star-forming regions in a segment of the Sagittarius Arm using young stellar objects (YSOs) from the Spitzer/IRAC Candidate YSO (SPICY) catalog to compare their distribution to models of the arm. Methods: Distances and velocities for these objects are derived from Gaia EDR3 astrometry and molecular line surveys. We infer parallaxes and proper motions for spatially clustered groups of YSOs and estimate their radial velocities from the velocities of spatially associated molecular clouds. Results: We identify 25 star-forming regions in the Galactic longitude range l~4.0-18.5 deg arranged in a narrow, ~1 kpc long linear structure with a high pitch angle of $\psi = 56$ deg and a high aspect ratio of ~7:1. This structure includes massive star-forming regions such as M8, M16, M17, and M20. The motions in the structure are remarkably coherent, with velocities in the direction of Galactic rotation of $240\pm3$ km/s (slightly higher than average) and slight drifts toward the Galactic center (-4.3 km/s) and in the negative Z direction (-2.9 km/s). The rotational shear experienced by the structure is 4.6 km/s/kpc. Conclusions: The observed 56 deg pitch angle is remarkably high for a segment of the Sagittarius Arm. We discuss possible interpretations of this feature as a substructure within the lower pitch angle Sagittarius Arm, as a spur, or as an isolated structure.
Submitted 12 July, 2021;
originally announced July 2021.
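The pitch angle quoted above measures how steeply a structure cuts across circles of constant Galactocentric radius. It is 0 deg for a purely azimuthal feature and 90 deg for a purely radial one. The geometric sketch below computes it for a straight segment, using tan ψ = (radial extent) / (azimuthal extent) at the segment midpoint. It assumes Galactocentric Cartesian coordinates in kpc, with the Galactic centre at the origin. It is not the fitting procedure used in the paper, which the abstract does not describe, and the function name is hypothetical.

```python
import numpy as np


def segment_pitch_angle(p1, p2) -> float:
    """Pitch angle (deg) of the straight segment p1 -> p2 in the Galactic plane.

    Points are Galactocentric (x, y) in kpc; the angle is measured against the
    local azimuthal direction at the segment midpoint.
    """
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    mid = 0.5 * (p1 + p2)
    r_hat = mid / np.linalg.norm(mid)            # outward radial unit vector
    phi_hat = np.array([-r_hat[1], r_hat[0]])    # azimuthal unit vector
    d = p2 - p1
    radial = abs(d @ r_hat)                      # extent along the radius
    azimuthal = abs(d @ phi_hat)                 # extent along the circle
    return float(np.degrees(np.arctan2(radial, azimuthal)))


if __name__ == "__main__":
    print(f"{segment_pitch_angle([8.0, -0.5], [8.0, 0.5]):.1f}")   # 0.0  (azimuthal)
    print(f"{segment_pitch_angle([7.0, 0.0], [9.0, 0.0]):.1f}")    # 90.0 (radial)
```

On this scale, a 56 deg structure runs more radially than azimuthally. A ~12 deg arm runs close to a circle.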
-
Results of the Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC)
Authors:
R. Hložek,
K. A. Ponder,
A. I. Malz,
M. Dai,
G. Narayan,
E. E. O. Ishida,
T. Allam Jr,
A. Bahmanyar,
R. Biswas,
L. Galbany,
S. W. Jha,
D. O. Jones,
R. Kessler,
M. Lochner,
A. A. Mahabal,
K. S. Mandel,
J. R. Martínez-Galarza,
J. D. McEwen,
D. Muthukrishna,
H. V. Peiris,
C. M. Peters,
C. N. Setzer
Abstract:
Next-generation surveys like the Legacy Survey of Space and Time (LSST) on the Vera C. Rubin Observatory will generate orders of magnitude more discoveries of transients and variable stars than previous surveys. To prepare for this data deluge, we developed the Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC), a competition that aimed to catalyze the development of robust classifiers under LSST-like conditions of a non-representative training set for a large photometric test set of imbalanced classes. Over 1,000 teams participated in PLAsTiCC, which was hosted on the Kaggle data science competition platform between Sep 28, 2018 and Dec 17, 2018, ultimately identifying three winners in February 2019. Participants produced classifiers employing a diverse set of machine learning techniques including hybrid combinations and ensemble averages of a range of approaches, among them boosted decision trees, neural networks, and multi-layer perceptrons. The strong performance of the top three classifiers on Type Ia supernovae and kilonovae represents a major improvement over the current state of the art within astronomy. This paper summarizes the most promising methods and evaluates their results in detail, highlighting future directions both for classifier development and simulation needs for a next-generation PLAsTiCC data set.
Submitted 22 December, 2020;
originally announced December 2020.
-
Anomaly detection in the Zwicky Transient Facility DR3
Authors:
K. L. Malanchev,
M. V. Pruzhinskaya,
V. S. Korolev,
P. D. Aleo,
M. V. Kornilov,
E. E. O. Ishida,
V. V. Krushinsky,
F. Mondon,
S. Sreejith,
A. A. Volnova,
A. A. Belinski,
A. V. Dodin,
A. M. Tatarnikov,
S. G. Zheltoukhov
Abstract:
We present results from applying the SNAD anomaly detection pipeline to the third public data release of the Zwicky Transient Facility (ZTF DR3). The pipeline is composed of 3 stages: feature extraction, search for outliers with machine learning algorithms, and anomaly identification with follow-up by human experts. Our analysis concentrates on three ZTF fields, comprising more than 2.25 million objects. A set of 4 automatic learning algorithms was used to identify 277 outliers, which were subsequently scrutinised by an expert. Of these, 188 (68%) were found to be bogus light curves (including effects from the image subtraction pipeline as well as overlaps between a star and a known asteroid), 66 (24%) were previously reported sources, and 23 (8%) correspond to non-catalogued objects, with the latter two categories of potential scientific interest (e.g. 1 spectroscopically confirmed RS Canum Venaticorum star, 4 supernova candidates, 1 red dwarf flare). Moreover, using results from the expert analysis, we were able to identify a simple bi-dimensional relation which can be used to help filter potentially bogus light curves in future studies. We provide a complete list of objects with potential scientific application so they can be further scrutinised by the community. These results confirm the importance of combining automatic machine learning algorithms with domain knowledge in the construction of recommendation systems for astronomy. Our code is publicly available at https://github.com/snad-space/zwad
Submitted 2 February, 2021; v1 submitted 2 December, 2020;
originally announced December 2020.
-
SPICY: The Spitzer/IRAC Candidate YSO Catalog for the Inner Galactic Midplane
Authors:
Michael A. Kuhn,
Rafael S. de Souza,
Alberto Krone-Martins,
Alfred Castro-Ginard,
Emille E. O. Ishida,
Matthew S. Povich,
Lynne A. Hillenbrand
Abstract:
We present ~120,000 Spitzer/IRAC candidate young stellar objects (YSOs) based on surveys of the Galactic midplane between l~255 deg and 110 deg, including the GLIMPSE I, II, and 3D, Vela-Carina, Cygnus X, and SMOG surveys (613 square degrees), augmented by near-infrared catalogs. We employed a classification scheme that uses the flexibility of a tailored statistical learning method and curated YSO datasets to take full advantage of IRAC's spatial resolution and sensitivity in the mid-infrared ~3-9 micron range. Multi-wavelength color/magnitude distributions provide intuition about how the classifier separates YSOs from other red IRAC sources and validate that the sample is consistent with expectations for disk/envelope-bearing pre-main-sequence stars. We also identify areas of IRAC color space associated with objects with strong silicate absorption or polycyclic aromatic hydrocarbon emission. Spatial distributions and variability properties help corroborate the youthful nature of our sample. Most of the candidates are in regions with mid-IR nebulosity, associated with star-forming clouds, but others appear distributed in the field. Using Gaia DR2 distance estimates, we find groups of YSO candidates associated with the Local Arm, the Sagittarius-Carina Arm, and the Scutum-Centaurus Arm. Candidate YSOs visible to the Zwicky Transient Facility tend to exhibit higher variability amplitudes than randomly selected field stars of the same magnitude, with many high-amplitude variables having light-curve morphologies characteristic of YSOs. Given that no current or planned instruments will significantly exceed IRAC's spatial resolution while possessing its wide-area mapping capabilities, Spitzer-based catalogs such as ours will remain the main resources for mid-infrared YSOs in the Galactic midplane for the near future.
Submitted 12 July, 2021; v1 submitted 25 November, 2020;
originally announced November 2020.
-
Active learning with RESSPECT: Resource allocation for extragalactic astronomical transients
Authors:
Noble Kennamer,
Emille E. O. Ishida,
Santiago Gonzalez-Gaitan,
Rafael S. de Souza,
Alexander Ihler,
Kara Ponder,
Ricardo Vilalta,
Anais Moller,
David O. Jones,
Mi Dai,
Alberto Krone-Martins,
Bruno Quint,
Sreevarsha Sreejith,
Alex I. Malz,
Lluis Galbany
Abstract:
The recent increase in volume and complexity of available astronomical data has led to a wide use of supervised machine learning techniques. Active learning strategies have been proposed as an alternative to optimize the distribution of scarce labeling resources. However, due to the specific conditions in which labels can be acquired, fundamental assumptions, such as sample representativeness and labeling cost stability, cannot be fulfilled. The Recommendation System for Spectroscopic follow-up (RESSPECT) project aims to enable the construction of optimized training samples for the Rubin Observatory Legacy Survey of Space and Time (LSST), taking into account a realistic description of the astronomical data environment. In this work, we test the robustness of active learning techniques in a realistic simulated astronomical data scenario. Our experiment takes into account the evolution of training and pool samples, different costs per object, and two different sources of budget. Results show that traditional active learning strategies significantly outperform random sampling. Nevertheless, more complex batch strategies are not able to significantly outperform simple uncertainty sampling techniques. Our findings illustrate three important points: 1) active learning strategies are a powerful tool to optimize the label-acquisition task in astronomy, 2) for upcoming large surveys like LSST, such techniques allow us to tailor the construction of the training sample for the first day of the survey, and 3) the peculiar data environment related to the detection of astronomical transients is a fertile ground that calls for the development of tailored machine learning algorithms.
Submitted 26 October, 2020; v1 submitted 12 October, 2020;
originally announced October 2020.
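The sketch below shows what "different costs per object" means for a query strategy: it greedily spends a single follow-up budget on the most uncertain candidates. This is a hypothetical simplification, not the RESSPECT algorithm. The cost units, the greedy rule and `select_under_budget` are assumptions for illustration, and the sketch reproduces neither the paper's two budget sources nor its batch strategies.

```python
import numpy as np


def select_under_budget(prob_ia, costs, budget):
    """Greedy uncertainty sampling under a labelling budget.

    Objects are visited from most to least uncertain (|P(Ia) - 0.5| ascending)
    and kept whenever their cost still fits in the remaining budget.
    """
    prob = np.asarray(prob_ia, dtype=float)
    cost = np.asarray(costs, dtype=float)
    order = np.argsort(np.abs(prob - 0.5), kind="stable")
    chosen, spent = [], 0.0
    for i in order:
        if spent + cost[i] <= budget:
            chosen.append(int(i))
            spent += float(cost[i])
    return chosen, spent


if __name__ == "__main__":
    prob = [0.50, 0.90, 0.45, 0.52]   # classifier output for four candidates
    cost = [3.0, 1.0, 2.0, 2.0]       # e.g. hours of spectroscopic time
    print(select_under_budget(prob, cost, budget=5.0))   # ([0, 3], 5.0)
```

In the example, candidate 2 is more uncertain than candidate 1 but no longer fits in the remaining budget. Cheap objects can therefore compete with more informative but expensive ones.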
-
Fink, a new generation of broker for the LSST community
Authors:
Anais Möller,
Julien Peloton,
Emille E. O. Ishida,
Chris Arnault,
Etienne Bachelet,
Tristan Blaineau,
Dominique Boutigny,
Abhishek Chauhan,
Emmanuel Gangler,
Fabio Hernandez,
Julius Hrivnac,
Marco Leoni,
Nicolas Leroy,
Marc Moniez,
Sacha Pateyron,
Adrien Ramparison,
Damien Turpin,
Réza Ansari,
Tarek Allam Jr.,
Armelle Bajat,
Biswajit Biswas,
Alexandre Boucaud,
Johan Bregeon,
Jean-Eric Campagne,
Johann Cohen-Tanugi
, et al. (11 additional authors not shown)
Abstract:
Fink is a broker designed to enable science with large time-domain alert streams such as the one from the upcoming Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). It exhibits traditional astronomy broker features such as automated ingestion, annotation, selection and redistribution of promising alerts for transient science. It is also designed to go beyond traditional broker features by providing real-time transient classification which is continuously improved by using state-of-the-art Deep Learning and Adaptive Learning techniques. These evolving added-value capabilities will enable more accurate scientific output from LSST photometric data for diverse science cases while also leading to a higher incidence of new discoveries which shall accompany the evolution of the survey. In this paper we introduce Fink, its science motivation, architecture and current status, including first science verification cases using the Zwicky Transient Facility alert stream.
Submitted 16 December, 2020; v1 submitted 21 September, 2020;
originally announced September 2020.
-
Periodic Astrometric Signal Recovery through Convolutional Autoencoders
Authors:
Michele Delli Veneri,
Louis Desdoigts,
Morgan A. Schmitz,
Alberto Krone-Martins,
Emille E. O. Ishida,
Peter Tuthill,
Rafael S. de Souza,
Richard Scalzo,
Massimo Brescia,
Giuseppe Longo,
Antonio Picariello
Abstract:
Astrometric detection involves a precise measurement of stellar positions, and is widely regarded as the leading concept presently ready to find Earth-mass planets in temperate orbits around nearby Sun-like stars. The TOLIMAN space telescope[39] is a low-cost, agile mission concept dedicated to narrow-angle astrometric monitoring of bright binary stars. In particular, the mission will be optimised to search for habitable-zone planets around Alpha Centauri AB. If the separation between these two stars can be monitored with sufficient precision, tiny perturbations due to the gravitational tug from an unseen planet can be witnessed; given the configuration of the optical system, the scale of these shifts in the image plane is about one millionth of a pixel. Image registration at this level of precision has never been demonstrated (to our knowledge) in any setting within science. In this paper we demonstrate that a Deep Convolutional Auto-Encoder is able to retrieve such a signal from simplified simulations of the TOLIMAN data, and we present the full experimental pipeline to recreate our experiments, from the simulations to the signal analysis. In future work, more realistic sources of noise and the systematic effects present in the real-world system will be injected into the simulations.
Submitted 24 June, 2020;
originally announced June 2020.
-
21st Century Statistical and Computational Challenges in Astrophysics
Authors:
Eric D. Feigelson,
Rafael S. de Souza,
Emille E. O. Ishida,
Gutti Jogesh Babu
Abstract:
Modern astronomy has been rapidly increasing our ability to see deeper into the universe, acquiring enormous samples of cosmic populations. Gaining astrophysical insights from these datasets requires a wide range of sophisticated statistical and machine learning methods. Long-standing problems in cosmology include characterization of galaxy clustering and estimation of galaxy distances from photometric colors. Bayesian inference, central to linking astronomical data to nonlinear astrophysical models, addresses problems in solar physics, properties of star clusters, and exoplanet systems. Likelihood-free methods are growing in importance. Detection of faint signals in complicated noise is needed to find periodic behaviors in stars and detect explosive gravitational wave events. Open issues concern treatment of heteroscedastic measurement errors and understanding probability distributions characterizing astrophysical systems. The field of astrostatistics needs increased collaboration with statisticians in the design and analysis stages of research projects, and to jointly develop new statistical methodologies. Together, they will draw more astrophysical insights into astronomical populations and the cosmos itself.
Submitted 26 May, 2020;
originally announced May 2020.
-
Ridges in the Dark Energy Survey for cosmic trough identification
Authors:
Ben Moews,
Morgan A. Schmitz,
Andrew J. Lawler,
Joe Zuntz,
Alex I. Malz,
Rafael S. de Souza,
Ricardo Vilalta,
Alberto Krone-Martins,
Emille E. O. Ishida
Abstract:
Cosmic voids and their corresponding redshift-projected mass densities, known as troughs, play an important role in our attempt to model the large-scale structure of the Universe. Understanding these structures enables us to compare the standard model with alternative cosmologies, constrain the dark energy equation of state, and distinguish between different gravitational theories. In this paper, we extend the subspace-constrained mean shift algorithm, a recently introduced method to estimate density ridges, and apply it to 2D weak lensing mass density maps from the Dark Energy Survey Y1 data release to identify curvilinear filamentary structures. We compare the obtained ridges with previous approaches to extract trough structure in the same data, and apply curvelets as an alternative wavelet-based method to constrain densities. We then invoke the Wasserstein distance between noisy and noiseless simulations to validate the denoising capabilities of our method. Our results demonstrate the viability of ridge estimation as a precursor for denoising weak lensing observables to recover the large-scale structure, paving the way for a more versatile and effective search for troughs.
Submitted 14 November, 2022; v1 submitted 18 May, 2020;
originally announced May 2020.
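The validation step described above compares noisy and noiseless simulations with the Wasserstein distance. For two one-dimensional empirical samples of equal size, the 1-Wasserstein distance reduces to the mean absolute difference between their sorted values. The following is a minimal sketch under that assumption (equal sizes, scalar values such as the pixel values of a mass map); it is not the paper's pipeline, and the function name is hypothetical. SciPy's `scipy.stats.wasserstein_distance` covers the general, weighted case.

```python
from typing import Sequence


def wasserstein_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """1-Wasserstein distance between two equal-size 1D empirical samples.

    With equal weights, the optimal transport plan matches the i-th smallest
    value of u with the i-th smallest value of v.
    """
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)


noiseless = [0.0, 1.0, 2.0, 3.0]
noisy = [0.5, 1.5, 2.5, 3.5]  # the same values shifted by 0.5
print(wasserstein_1d(noiseless, noisy))  # 0.5
```

A smaller distance between the denoised map and the noiseless simulation indicates that more of the underlying distribution has been recovered.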
-
Active Anomaly Detection for time-domain discoveries
Authors:
Emille E. O. Ishida,
Matwey V. Kornilov,
Konstantin L. Malanchev,
Maria V. Pruzhinskaya,
Alina A. Volnova,
Vladimir S. Korolev,
Florian Mondon,
Sreevarsha Sreejith,
Anastasia Malancheva,
Shubhomoy Das
Abstract:
We present the first evidence that adaptive learning techniques can boost the discovery of unusual objects within astronomical light curve data sets. Our method follows an active learning strategy where the learning algorithm chooses objects which can potentially improve the learner if additional information about them is provided. This new information is subsequently used to update the machine learning model, allowing its accuracy to evolve with each new piece of information. For the case of anomaly detection, the algorithm aims to maximize the number of scientifically interesting anomalies presented to the expert by slightly modifying the weights of a traditional Isolation Forest (IF) at each iteration. In order to demonstrate the potential of such techniques, we apply the Active Anomaly Discovery (AAD) algorithm to two data sets: simulated light curves from the PLAsTiCC challenge and real light curves from the Open Supernova Catalog. We compare the AAD results to those of a static IF. For both methods, we performed a detailed analysis of all objects with the ~2% highest anomaly scores. We show that, in the real data scenario, AAD was able to identify ~80% more true anomalies than the IF. This result is the first evidence that AAD algorithms can play a central role in the search for new physics in the era of large scale sky surveys.
Submitted 14 July, 2020; v1 submitted 29 September, 2019;
originally announced September 2019.
-
Machine Learning and the future of Supernova Cosmology
Authors:
Emille E. O. Ishida
Abstract:
Machine Learning methods will play a fundamental role in our ability to optimize the science output from the next generation of large scale surveys. Given the peculiarities of astronomical data, it is crucial that algorithms are adapted to the data situation at hand. In this comment, I review the recent efforts towards the development of automatic systems to identify and classify supernovae with the goal of enabling their use as cosmological standard candles.
Submitted 6 August, 2019;
originally announced August 2019.
-
Anomaly Detection in the Open Supernova Catalog
Authors:
Maria V. Pruzhinskaya,
Konstantin L. Malanchev,
Matwey V. Kornilov,
Emille E. O. Ishida,
Florian Mondon,
Alina A. Volnova,
Vladimir S. Korolev
Abstract:
In the upcoming decade, large astronomical surveys will discover millions of transients, raising unprecedented data challenges in the process. Only machine learning algorithms can process such large data volumes. Most of the discovered transients will belong to the known classes of astronomical objects. However, it is expected that some transients will be rare or completely new events of unknown physical nature. The task of finding them can be framed as an anomaly detection problem. In this work, we perform for the first time an automated anomaly detection analysis in the photometric data of the Open Supernova Catalog (OSC), which serves as a proof of concept for the applicability of these methods to future large scale surveys. The analysis consists of the following steps: 1) data selection from the OSC and approximation of the pre-processed data with Gaussian processes, 2) dimensionality reduction, 3) searching for outliers with the isolation forest algorithm, 4) expert analysis of the identified outliers. The pipeline returned 81 candidate anomalies, 27 (33%) of which were confirmed to be from astrophysically peculiar objects. The confirmed anomalies correspond to 1.4% of the initial, automatically identified sample of ~2000 objects. Among the identified outliers we recognised superluminous supernovae, non-classical Type Ia supernovae, unusual Type II supernovae, one active galactic nucleus and one binary microlensing event. We also found that 16 anomalies classified as supernovae in the literature are likely to be quasars or stars. Our proposed pipeline represents an effective strategy to ensure we do not overlook exciting new science hidden in the data we fought so hard to acquire. All code and products of this investigation are made publicly available.
Submitted 22 August, 2019; v1 submitted 27 May, 2019;
originally announced May 2019.
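Step 3 of the OSC pipeline uses the isolation forest algorithm. To illustrate how that algorithm scores outliers, here is a compact pure-Python version of the standard Isolation Forest recipe: random axis-aligned splits, a depth limit of ceil(log2 ψ) for subsample size ψ (256 by default here), and the score s(x) = 2^(-E[h(x)]/c(ψ)). It is a sketch for small data sets, not the implementation used in the paper; the function names are hypothetical, and production work would normally rely on `sklearn.ensemble.IsolationForest`.

```python
import math
import random
from typing import Dict, List, Sequence

Point = Sequence[float]


def c_factor(n: int) -> float:
    """Average unsuccessful-search path length in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    harmonic = sum(1.0 / k for k in range(1, n))  # H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points: List[Point], depth: int, max_depth: int, rng: random.Random) -> Dict:
    """Grow one isolation tree with random features and random split values."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on a constant feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x: Point, node: Dict, depth: int = 0) -> float:
    """Depth at which x reaches a leaf, plus c(size) for the points left unresolved there."""
    if "dim" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def anomaly_scores(data: List[Point], n_trees: int = 100, sample_size: int = 256,
                   seed: int = 0) -> List[float]:
    """Scores s(x) = 2^(-E[h(x)] / c(psi)); values close to 1 flag likely outliers."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / norm) for x in data]


gen = random.Random(1)
features = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)] + [(20.0, 20.0)]
scores = anomaly_scores(features)
print(max(range(len(features)), key=scores.__getitem__))
```

In this toy run the isolated point appended last is expected to receive the highest score. In the paper, the inputs would be the dimensionality-reduced light-curve features from step 2, and the top-ranked objects would go on to expert analysis.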
-
Photometry of high-redshift blended galaxies using deep learning
Authors:
Alexandre Boucaud,
Marc Huertas-Company,
Caroline Heneka,
Emille E. O. Ishida,
Nima Sedaghat,
Rafael S. de Souza,
Ben Moews,
Hervé Dole,
Marco Castellano,
Emiliano Merlin,
Valerio Roscani,
Andrea Tramacere,
Madhura Killedar,
Arlindo M. M. Trindade
Abstract:
The new generation of deep photometric surveys requires unprecedentedly precise shape and photometry measurements of billions of galaxies to achieve their main science goals. At such depths, one major limiting factor is the blending of galaxies due to line-of-sight projection, with an expected fraction of blended galaxies of up to 50%. Current deblending approaches are in most cases either too slow or not accurate enough to reach the level of requirements. This work explores the use of deep neural networks to estimate the photometry of blended pairs of galaxies in monochrome space images, similar to the ones that will be delivered by the Euclid space telescope. Using a clean sample of isolated galaxies from the CANDELS survey, we artificially blend them and train two different network models to recover the photometry of the two galaxies. We show that our approach can recover the original photometry of the galaxies before being blended with ~7% accuracy without any human intervention and without any assumption on the galaxy shape. This represents an improvement of at least a factor of 4 compared to the classical SExtractor approach. We also show that forcing the network to simultaneously estimate a binary segmentation map slightly improves the photometry. All data products and codes will be made public to ease the comparison with other approaches on a common data set.
Submitted 3 May, 2019;
originally announced May 2019.
-
Models and Simulations for the Photometric LSST Astronomical Time Series Classification Challenge (PLAsTiCC)
Authors:
R. Kessler,
G. Narayan,
A. Avelino,
E. Bachelet,
R. Biswas,
P. J. Brown,
D. F. Chernoff,
A. J. Connolly,
M. Dai,
S. Daniel,
R. Di Stefano,
M. R. Drout,
L. Galbany,
S. González-Gaitán,
M. L. Graham,
R. Hložek,
E. E. O. Ishida,
J. Guillochon,
S. W. Jha,
D. O. Jones,
K. S. Mandel,
D. Muthukrishna,
A. O'Grady,
C. M. Peters,
J. R. Pierel
, et al. (4 additional authors not shown)
Abstract:
We describe the simulated data sample for the "Photometric LSST Astronomical Time Series Classification Challenge" (PLAsTiCC), a publicly available challenge to classify transient and variable events that will be observed by the Large Synoptic Survey Telescope (LSST), a new facility expected to start in the early 2020s. The challenge was hosted by Kaggle, ran from 2018 September 28 to 2018 December 17, and included 1,094 teams competing for prizes. Here we provide details of the 18 transient and variable source models, which were not revealed until after the challenge, and release the model libraries at https://doi.org/10.5281/zenodo.2612896. We describe the LSST Operations Simulator used to predict realistic observing conditions, and we describe the publicly available SNANA simulation code used to transform the models into observed fluxes and uncertainties in the LSST passbands (ugrizy). Although PLAsTiCC has finished, the publicly available models and simulation tools are being used within the astronomy community to further improve classification, and to study contamination in photometrically identified samples of type Ia supernova used to measure properties of dark energy. Our simulation framework will continue serving as a platform to improve the PLAsTiCC models, and to develop new models.
Submitted 10 July, 2019; v1 submitted 27 March, 2019;
originally announced March 2019.
-
Stress testing the dark energy equation of state imprint on supernova data
Authors:
Ben Moews,
Rafael S. de Souza,
Emille E. O. Ishida,
Alex I. Malz,
Caroline Heneka,
Ricardo Vilalta,
Joe Zuntz
Abstract:
This work determines the degree to which a standard Lambda-CDM analysis based on type Ia supernovae can identify deviations from a cosmological constant in the form of a redshift-dependent dark energy equation of state w(z). We introduce and apply a novel random curve generator to simulate instances of w(z) from constraint families with increasing distinction from a cosmological constant. After producing a series of mock catalogs of binned type Ia supernovae corresponding to each w(z) curve, we perform a standard Lambda-CDM analysis to estimate the corresponding posterior densities of the absolute magnitude of type Ia supernovae, the present-day matter density, and the equation of state parameter. Using the Kullback-Leibler divergence between posterior densities as a difference measure, we demonstrate that a standard type Ia supernova cosmology analysis has limited sensitivity to extensive redshift dependencies of the dark energy equation of state. In addition, we report that larger redshift-dependent departures from a cosmological constant do not necessarily manifest as more easily detectable incompatibilities with the Lambda-CDM model. Our results suggest that physics beyond the standard model may simply be hidden in plain sight.
Submitted 5 July, 2019; v1 submitted 23 December, 2018;
originally announced December 2018.
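The stress test measures differences between posterior densities with the Kullback-Leibler divergence. If each posterior is approximated as a univariate Gaussian (an assumption made here for illustration; the paper does not necessarily do this), the divergence has the closed form KL(P || Q) = ln(σ_Q/σ_P) + (σ_P² + (μ_P − μ_Q)²)/(2σ_Q²) − 1/2. A minimal sketch, with a hypothetical function name:

```python
import math


def kl_gaussian(mu_p: float, sigma_p: float, mu_q: float, sigma_q: float) -> float:
    """KL(P || Q) in nats for P = N(mu_p, sigma_p^2) and Q = N(mu_q, sigma_q^2)."""
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)


# Hypothetical Gaussian summaries of two posteriors for w
print(kl_gaussian(-1.0, 0.1, -0.9, 0.1))  # approximately 0.5
# The divergence is not symmetric: swapping P and Q changes the value
print(kl_gaussian(-1.0, 0.1, -1.0, 0.2), kl_gaussian(-1.0, 0.2, -1.0, 0.1))
```

Because KL is asymmetric, any such analysis has to fix which posterior plays the role of the reference distribution.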
-
Gaia DR2 unravels incompleteness of nearby cluster population: New open clusters in the direction of Perseus
Authors:
T. Cantat-Gaudin,
A. Krone-Martins,
N. Sedaghat,
A. Farahi,
R. S. de Souza,
R. Skalidis,
A. I. Malz,
S. Macêdo,
B. Moews,
C. Jordi,
A. Moitinho,
A. Castro-Ginard,
E. E. O. Ishida,
C. Heneka,
A. Boucaud,
A. M. M. Trindade
Abstract:
Open clusters (OCs) are popular tracers of the structure and evolutionary history of the Galactic disk. The OC population is often considered to be complete within 1.8 kpc of the Sun. The recent Gaia Data Release 2 (DR2) allows the latter claim to be challenged. We perform a systematic search for new OCs in the direction of Perseus using precise and accurate astrometry from Gaia DR2. We implement a coarse-to-fine search method. First, we exploit spatial proximity using a fast density-aware partitioning of the sky via a k-d tree in the spatial domain of Galactic coordinates, (l, b). Secondly, we employ a Gaussian mixture model in the proper motion space to quickly tag fields around OC candidates. Thirdly, we apply an unsupervised membership assignment method, UPMASK, to scrutinise the candidates. We visually inspect colour-magnitude diagrams to validate the detected objects. Finally, we perform a diagnostic to quantify the significance of each identified overdensity in proper motion and in parallax space. We report the discovery of 41 new stellar clusters. This represents an increment of at least 20% of the previously known OC population in this volume of the Milky Way. We also report on the clear identification of NGC 886, an object previously considered an asterism. This letter challenges the previous claim of a near-complete sample of open clusters up to 1.8 kpc. Our results reveal that this claim requires revision, and a complete census of nearby open clusters is yet to be found.
Submitted 21 March, 2019; v1 submitted 12 October, 2018;
originally announced October 2018.
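The first stage of the coarse-to-fine search partitions the sky with a k-d tree in (l, b). A minimal sketch of a density-aware partition, assuming median cuts that alternate between l and b and a hypothetical `max_points` threshold per cell (the function name is also hypothetical, and wrap-around of l at 360° is ignored for simplicity):

```python
from typing import List, Tuple

Star = Tuple[float, float]  # (l, b) in degrees


def kd_partition(stars: List[Star], max_points: int, depth: int = 0) -> List[List[Star]]:
    """Recursively cut at the median, alternating l and b, until cells hold <= max_points stars."""
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    if len(stars) <= max_points:
        return [stars]
    axis = depth % 2  # even depth splits in l, odd depth splits in b
    ordered = sorted(stars, key=lambda s: s[axis])
    mid = len(ordered) // 2
    return (kd_partition(ordered[:mid], max_points, depth + 1)
            + kd_partition(ordered[mid:], max_points, depth + 1))


grid = [(float(l), float(b)) for l in range(8) for b in range(8)]
print([len(cell) for cell in kd_partition(grid, max_points=16)])  # [16, 16, 16, 16]
```

Because every cell holds at most `max_points` stars, an overdense patch of sky is cut into more, smaller cells than the surrounding field, which is what makes the partition density-aware.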
-
The Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC): Data set
Authors:
The PLAsTiCC team,
Tarek Allam Jr.,
Anita Bahmanyar,
Rahul Biswas,
Mi Dai,
Lluís Galbany,
Renée Hložek,
Emille E. O. Ishida,
Saurabh W. Jha,
David O. Jones,
Richard Kessler,
Michelle Lochner,
Ashish A. Mahabal,
Alex I. Malz,
Kaisey S. Mandel,
Juan Rafael Martínez-Galarza,
Jason D. McEwen,
Daniel Muthukrishna,
Gautham Narayan,
Hiranya Peiris,
Christina M. Peters,
Kara Ponder,
Christian N. Setzer,
The LSST Dark Energy Science Collaboration,
The LSST Transients
, et al. (1 additional authors not shown)
Abstract:
The Photometric LSST Astronomical Time Series Classification Challenge (PLAsTiCC) is an open data challenge to classify simulated astronomical time-series data in preparation for observations from the Large Synoptic Survey Telescope (LSST), which will achieve first light in 2019 and commence its 10-year main survey in 2022. LSST will revolutionize our understanding of the changing sky, discovering and measuring millions of time-varying objects.
In this challenge, we pose the question: how well can we classify objects in the sky that vary in brightness from simulated LSST time-series data, with all its challenges of non-representativity? In this note we explain the need for a data challenge to help classify such astronomical sources and describe the PLAsTiCC data set and Kaggle data challenge, noting that while the references are provided for context, they are not needed to participate in the challenge.
Submitted 28 September, 2018;
originally announced October 2018.
-
The Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC): Selection of a performance metric for classification probabilities balancing diverse science goals
Authors:
A. I. Malz,
R. Hložek,
T. Allam Jr,
A. Bahmanyar,
R. Biswas,
M. Dai,
L. Galbany,
E. E. O. Ishida,
S. W. Jha,
D. O. Jones,
R. Kessler,
M. Lochner,
A. A. Mahabal,
K. S. Mandel,
J. R. Martínez-Galarza,
J. D. McEwen,
D. Muthukrishna,
G. Narayan,
H. Peiris,
C. M. Peters,
K. A. Ponder,
C. N. Setzer,
The LSST Dark Energy Science Collaboration,
The LSST Transients and Variable Stars Science Collaboration
Abstract:
Classification of transient and variable light curves is an essential step in using astronomical observations to develop an understanding of their underlying physical processes. However, upcoming deep photometric surveys, including the Large Synoptic Survey Telescope (LSST), will produce a deluge of low signal-to-noise data for which traditional labeling procedures are inappropriate. Probabilistic classification is more appropriate for such data but is incompatible with the traditional metrics used on deterministic classifications. Furthermore, large survey collaborations intend to use these classification probabilities for diverse science objectives, indicating a need for a metric that balances a variety of goals. We describe the process used to develop an optimal performance metric for an open classification challenge that seeks probabilistic classifications and must serve many scientific interests. The Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC) is an open competition aiming to identify promising techniques for obtaining classification probabilities of transient and variable objects by engaging a broader community both within and outside astronomy. Using mock classification probability submissions emulating archetypes of those anticipated for PLAsTiCC, we compare the sensitivity of metrics of classification probabilities under various weighting schemes, finding that they yield qualitatively consistent results. We choose as a metric for PLAsTiCC a weighted modification of the cross-entropy because it can be meaningfully interpreted. Finally, we propose extensions of our methodology to ever more complex challenge goals and suggest some guiding principles for approaching the choice of a metric of probabilistic classifications.
Submitted 31 July, 2021; v1 submitted 28 September, 2018;
originally announced September 2018.
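PLAsTiCC adopted a weighted modification of the cross-entropy. One common form averages the log loss within each true class and then combines the per-class averages with weights w_m, so that rare classes are not swamped by common ones. The sketch below assumes that form, with hypothetical weights supplied by the caller, a hypothetical function name, and probabilities clipped at `eps` to keep the logarithm finite; it is not the official scoring code.

```python
import math
from typing import Dict, List, Sequence


def weighted_log_loss(true_classes: Sequence[int], probs: Sequence[Sequence[float]],
                      weights: Dict[int, float], eps: float = 1e-15) -> float:
    """Per-class mean cross-entropy, combined as a weighted average over the classes present."""
    per_class: Dict[int, List[float]] = {}
    for label, row in zip(true_classes, probs):
        p = min(max(row[label], eps), 1.0 - eps)  # clip so log(p) stays finite
        per_class.setdefault(label, []).append(-math.log(p))
    numerator = sum(weights[m] * sum(v) / len(v) for m, v in per_class.items())
    denominator = sum(weights[m] for m in per_class)
    return numerator / denominator


truth = [0, 0, 1]
pred = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]  # one row of class probabilities per object
print(weighted_log_loss(truth, pred, weights={0: 1.0, 1: 2.0}))
```

Raising a class's weight makes misclassifying its members more costly, which is one way a single score can reflect the priorities of several science goals.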
-
Optimizing spectroscopic follow-up strategies for supernova photometric classification with active learning
Authors:
E. E. O. Ishida,
R. Beck,
S. Gonzalez-Gaitan,
R. S. de Souza,
A. Krone-Martins,
J. W. Barrett,
N. Kennamer,
R. Vilalta,
J. M. Burgess,
B. Quint,
A. Z. Vitorelli,
A. Mahabal,
E. Gangler
Abstract:
We report a framework for spectroscopic follow-up design for optimizing supernova photometric classification. The strategy accounts for the unavoidable mismatch between spectroscopic and photometric samples, and can be used even at the beginning of a new survey, without any initial training set. The framework falls under the umbrella of active learning (AL), a class of algorithms that aims to minimize labelling costs by identifying a few carefully chosen objects with high potential for improving the classifier predictions. As a proof of concept, we use the simulated data released after the Supernova Photometric Classification Challenge (SNPCC) and a random forest classifier. Our results show that, using only 12% of the number of training objects in the SNPCC spectroscopic sample, this approach is able to double the purity. Moreover, in order to take into account multiple spectroscopic observations in the same night, we propose a semi-supervised batch-mode AL algorithm which selects a set of the N = 5 most informative objects each night. In comparison with the initial state using the traditional approach, our method achieves 2.3 times higher purity and comparable figure of merit results after only 180 days of observation, or 800 queries (73% of the SNPCC spectroscopic sample size). Such results were obtained using the same amount of spectroscopic time necessary to observe the original SNPCC spectroscopic sample, showing that this type of strategy is feasible with currently available spectroscopic resources. The code used in this work is available in the COINtoolbox: https://github.com/COINtoolbox/ActSNClass.
Submitted 3 January, 2019; v1 submitted 10 April, 2018;
originally announced April 2018.
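The batch-mode strategy described above picks N = 5 informative objects per night. A simpler stand-in for illustration (plain batch uncertainty sampling, not the paper's semi-supervised batch algorithm; function names are hypothetical) ranks the candidates observable on a given night by the entropy of their predicted class probabilities and keeps the top N.

```python
import heapq
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a predicted class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(candidates: Sequence[Sequence[float]], n: int = 5) -> List[int]:
    """Indices of the n candidates with the highest predictive entropy, most uncertain first."""
    return heapq.nlargest(n, range(len(candidates)), key=lambda i: entropy(candidates[i]))


tonight = [[0.99, 0.01], [0.5, 0.5], [0.7, 0.3], [0.6, 0.4],
           [0.95, 0.05], [0.8, 0.2], [0.55, 0.45]]
print(select_batch(tonight, n=5))  # [1, 6, 3, 2, 5]
```

Pure top-N selection can pick near-duplicate objects; a dedicated batch algorithm such as the one in the paper is meant to account for how the chosen objects jointly improve the classifier.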
-
Spatial field reconstruction with INLA: Application to IFU galaxy data
Authors:
S. González-Gaitán,
R. S. de Souza,
A. Krone-Martins,
E. Cameron,
P. Coelho,
L. Galbany,
E. E. O. Ishida
Abstract:
Astronomical observations of extended sources, such as cubes of integral field spectroscopy (IFS), encode auto-correlated spatial structures that cannot be optimally exploited by standard methodologies. This work introduces a novel technique to model IFS datasets, which treats the observed galaxy properties as realizations of an unobserved Gaussian Markov random field. The method is computationally efficient, resilient to the presence of low-signal-to-noise regions, and uses an alternative to Markov Chain Monte Carlo for fast Bayesian inference, the Integrated Nested Laplace Approximation (INLA). As a case study, we analyse 721 IFS data cubes of nearby galaxies from the CALIFA and PISCO surveys, for which we retrieve the maps of the following physical properties: age, metallicity, mass and extinction. The proposed Bayesian approach, built on a generative representation of the galaxy properties, enables the creation of synthetic images, recovery of areas with bad pixels, and an increased power to detect structures in datasets subject to substantial noise and/or sparsity of sampling. A code snippet to reproduce the analysis of this paper is available in the COIN toolbox, together with the field reconstructions of the CALIFA and PISCO samples.
Submitted 30 December, 2018; v1 submitted 17 February, 2018;
originally announced February 2018.
-
On the realistic validation of photometric redshifts, or why Teddy will never be Happy
Authors:
R. Beck,
C. -A. Lin,
E. E. O. Ishida,
F. Gieseke,
R. S. de Souza,
M. V. Costa-Duarte,
M. W. Hattab,
A. Krone-Martins
Abstract:
Two of the main problems encountered in the development and accurate validation of photometric redshift (photo-z) techniques are the lack of spectroscopic coverage in feature space (e.g. colours and magnitudes) and the mismatch between photometric error distributions associated with the spectroscopic and photometric samples. Although these issues are well known, there is currently no standard benchmark allowing a quantitative analysis of their impact on the final photo-z estimation. In this work, we present two galaxy catalogues, Teddy and Happy, built to enable a more demanding and realistic test of photo-z methods. Using photometry from the Sloan Digital Sky Survey and spectroscopy from a collection of sources, we constructed datasets which mimic the biases between the underlying probability distribution of the real spectroscopic and photometric sample. We demonstrate the potential of these catalogues by submitting them to the scrutiny of different photo-z methods, including machine learning (ML) and template fitting approaches. Beyond the expected poor results from most ML algorithms for cases with missing coverage in feature space, we were able to recognize the superiority of global models in the same situation and the general failure across all types of methods when incomplete coverage is convoluted with the presence of photometric errors, a data situation which photo-z methods were not trained to deal with up to now and which must be addressed by future large scale surveys. Our catalogues represent the first controlled environment allowing a straightforward implementation of such tests. The data are publicly available within the COINtoolbox (https://github.com/COINtoolbox/photoz_catalogues).
Submitted 20 March, 2017; v1 submitted 30 January, 2017;
originally announced January 2017.
-
A metric space for type Ia supernova spectra: a new method to assess explosion scenarios
Authors:
Michele Sasdelli,
W. Hillebrandt,
M. Kromer,
E. E. O. Ishida,
F. K. Roepke,
S. A. Simm,
R. Pakmor
Abstract:
Over the past years, type Ia supernovae (SNe Ia) have become a major tool for determining the expansion history of the Universe, and considerable attention has been given to both observations and models of these events. However, their progenitors remain unknown. The observed diversity of light curves and spectra seems to point to different progenitor channels and explosion mechanisms. Here, we present a new way to compare model predictions with observations systematically. Our method is based on the construction of a metric space for SN Ia spectra by means of linear Principal Component Analysis (PCA), taking care of missing and/or noisy data, and making use of Partial Least Squares regression (PLS) to find correlations between spectral properties and photometric data. We investigate realizations of the three major classes of explosion models that are presently discussed: delayed-detonation Chandrasekhar-mass explosions, sub-Chandrasekhar-mass detonations, and double-degenerate mergers, and compare them with data. We show that in the PC space all scenarios have observed counterparts, supporting the idea that different progenitors are likely. However, all classes of models face problems in reproducing the observed correlations between spectral properties and light curves and colors. Possible reasons are briefly discussed.
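To make the idea of a "metric space for SN Ia spectra" concrete, here is a minimal sketch, assuming all spectra are already resampled onto a common wavelength grid with no gaps. It uses plain PCA via an SVD of the centred data and does not reproduce the paper's treatment of missing and/or noisy data. The function names (`pca_metric_space`, `project`, `nearest_observed`) are hypothetical. A model spectrum is projected into the PC space built from observed spectra, and its nearest observed counterpart is found by Euclidean distance.

```python
import numpy as np


def pca_metric_space(spectra, n_components):
    """Build a PC space from spectra (rows = supernovae, columns = flux in
    common wavelength bins). Returns (scores, components, mean)."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Economy SVD of the centred data: rows of Vt are the principal
    # directions, already sorted by decreasing singular value (variance).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    return scores, components, mean


def project(spectrum, components, mean):
    """Place a new (e.g. model) spectrum in the same PC space."""
    return (np.asarray(spectrum, dtype=float) - mean) @ components.T


def nearest_observed(model_scores, observed_scores):
    """Index of, and Euclidean distance to, the closest observed SN in PC space."""
    d = np.linalg.norm(observed_scores - model_scores, axis=1)
    i = int(np.argmin(d))
    return i, float(d[i])
```

In use, one would call `pca_metric_space` on the observed sample, `project` each explosion-model spectrum, and inspect `nearest_observed` to judge whether the model has an observed counterpart. The Euclidean metric on the leading PC scores is the simplest choice; other weightings are possible.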
Submitted 21 December, 2016;
originally announced December 2016.
-
The supernova impostor PSN J09132750+7627410 and its progenitor
Authors:
L. Tartaglia,
N. Elias-Rosa,
A. Pastorello,
S. Benetti,
S. Taubenberger,
E. Cappellaro,
G. Cortini,
V. Granata,
E. E. O. Ishida,
A. Morales-Garoffolo,
U. M. Noebauer,
P. Ochner,
L. Tomasella,
S. Zaggia
Abstract:
We report the results of our follow-up campaign of the supernova impostor PSN J09132750+7627410, based on optical data covering $\sim250\,\rm{d}$. From the beginning, the transient shows prominent narrow Balmer lines with P-Cygni profiles, with a blue-shifted absorption component becoming more prominent with time. Over the $\sim3\,\rm{months}$ of spectroscopic monitoring, broad components are never detected in the hydrogen lines, suggesting that these features are produced in slowly expanding material. The transient reaches an absolute magnitude of $M_r=-13.60\pm0.19\,\rm{mag}$ at maximum, a typical luminosity for supernova impostors. Amateur astronomers provided $\sim4\,\rm{years}$ of archival observations of the host galaxy, NGC 2748. The detection of the quiescent progenitor star in archival Hubble Space Telescope images suggests that it is an $18-20\,M_{\odot}$ white-yellow supergiant.
Submitted 15 April, 2016;
originally announced April 2016.
-
Breaking the color-reddening degeneracy in type Ia supernovae
Authors:
M. Sasdelli,
E. E. O. Ishida,
W. Hillebrandt,
C. Ashall,
P. A. Mazzali,
S. Prentice
Abstract:
A new method to study the intrinsic color and luminosity of type Ia supernovae (SNe Ia) is presented. A metric space built using principal component analysis (PCA) on spectral series of SNe Ia between -12.5 and +17.5 days from B maximum is used as a set of predictors. This metric space is built to be insensitive to reddening; hence, it does not predict the part of the color excess due to dust extinction. At the same time, the rich variability of SN Ia spectra is a good predictor of a large fraction of the intrinsic color variability. This metric space is also a good predictor of the epoch at which the maximum of the B-V color curve is reached. Multivariate Partial Least Squares (PLS) regression predicts the intrinsic B-band light curve and the intrinsic B-V color curve up to a month after maximum. This allows us to study the relation between the light curves of SNe Ia and their spectra. The total-to-selective extinction ratio $R_V$ in the host galaxies of SNe Ia is found, on average, to be consistent with typical Milky Way values. This analysis shows the importance of collecting spectra to study SNe Ia, even with large samples publicly available. Future automated surveys such as LSST will provide a large number of light curves. The analysis shows that obtaining accompanying spectra for a significant number of SNe will be important even in the case of "normal" SNe Ia.
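As an illustration of how PLS regression links spectral predictors to a photometric quantity, here is a minimal sketch of single-response PLS (PLS1, NIPALS algorithm). It is a textbook formulation, not the authors' code: the paper uses multivariate PLS to predict whole light and color curves, whereas this sketch predicts one scalar response (for example, intrinsic B-V at a single epoch) from predictors such as PC scores. The class name `PLS1` is hypothetical; scikit-learn's `PLSRegression` is an established implementation of the multivariate case.

```python
import numpy as np


class PLS1:
    """Single-response Partial Least Squares regression (NIPALS)."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.x_mean, self.y_mean = X.mean(axis=0), y.mean()
        E, f = X - self.x_mean, y - self.y_mean  # centred residuals
        W, P, q = [], [], []
        for _ in range(self.n_components):
            w = E.T @ f
            norm = np.linalg.norm(w)
            if norm < 1e-12:           # nothing left in y to explain
                break
            w = w / norm               # unit weight vector
            t = E @ w                  # latent score
            tt = t @ t
            p = E.T @ t / tt           # X loading
            c = f @ t / tt             # y loading
            E = E - np.outer(t, p)     # deflate X
            f = f - c * t              # deflate y
            W.append(w)
            P.append(p)
            q.append(c)
        if not W:                      # constant response: flat model
            self.coef_ = np.zeros(X.shape[1])
            return self
        W, P, q = np.array(W).T, np.array(P).T, np.array(q)
        # Regression coefficients in the original (centred) predictor space
        self.coef_ = W @ np.linalg.solve(P.T @ W, q)
        return self

    def predict(self, X):
        return (np.asarray(X, dtype=float) - self.x_mean) @ self.coef_ + self.y_mean
```

With as many components as predictors, PLS1 reduces to ordinary least squares; with fewer components, it regresses only on the latent directions most covariant with the response, which is what makes it useful when predictors are numerous and correlated.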
Submitted 13 April, 2016;
originally announced April 2016.
-
Large Magellanic Cloud Near-Infrared Synoptic Survey. III. A Statistical Study of Non-Linearity in the Leavitt Laws
Authors:
Anupam Bhardwaj,
Shashi M. Kanbur,
Lucas M. Macri,
Harinder P. Singh,
Chow-Choong Ngeow,
Emille E. O. Ishida
Abstract:
We present a detailed statistical analysis of possible non-linearities in the Period-Luminosity (P-L), Period-Wesenheit (P-W) and Period-Color (P-C) relations for Cepheid variables in the LMC at optical ($VI$) and near-infrared ($JHK_{s}$) wavelengths. We test for the presence of possible non-linearities and determine their statistical significance by applying a variety of robust statistical tests ($F$-test, Random-Walk, Testimator and the Davies test) to optical data from OGLE-III and near-infrared data from LMCNISS. For fundamental-mode Cepheids, we find that the optical P-L, P-W and P-C relations are non-linear at 10 days. The near-infrared P-L and the $W^H_{V,I}$ relations are non-linear around 18 days; this break is attributed to a distinct variation in mean Fourier amplitude parameters near this period at longer wavelengths compared with optical bands. The near-infrared P-W relations are also non-linear, except for the $W_{H,K_s}$ relation. For first-overtone Cepheids, a significant change in the slope of the P-L, P-W and P-C relations is found around 2.5 days, only at optical wavelengths. We determine a global slope of $-3.212\pm0.013$ for the $W^H_{V,I}$ relation by combining our LMC data with observations of Cepheids in supernova host galaxies (Riess et al. 2011). We find this slope to be consistent with the corresponding LMC relation at short periods, and significantly different from the long-period value. We do not find any significant difference in the slope of the global-fit solution using a linear or non-linear LMC P-L relation as calibrator, but the linear version provides a $2\times$ better constraint on the slope and metallicity coefficient.
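The $F$-test mentioned above compares nested models. A minimal sketch, assuming the simplest setup: a single straight line in $\log P$ (2 parameters) against two independent lines split at a fixed break period (4 parameters), with $F = \frac{(\mathrm{RSS}_L - \mathrm{RSS}_B)/(p_B - p_L)}{\mathrm{RSS}_B/(n - p_B)}$. The default break at $\log P = 1$ corresponds to the 10-day break; the function names are hypothetical, and the paper's exact formulation (e.g. continuity constraints at the break) may differ. If SciPy is available, a p-value follows from `scipy.stats.f.sf(f_stat, dfn, dfd)`.

```python
import numpy as np


def rss_linear(x, y):
    """Residual sum of squares of an ordinary least-squares straight line."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return float(resid @ resid)


def break_f_test(log_p, mag, log_p_break=1.0):
    """F statistic for 'one line' (2 params) vs 'two lines split at
    log_p_break' (4 params). Returns (f_stat, dfn, dfd)."""
    x = np.asarray(log_p, dtype=float)
    y = np.asarray(mag, dtype=float)
    short = x < log_p_break
    long_ = ~short
    if short.sum() < 3 or long_.sum() < 3:
        raise ValueError("need at least 3 points on each side of the break")
    n = x.size
    rss_single = rss_linear(x, y)
    rss_broken = rss_linear(x[short], y[short]) + rss_linear(x[long_], y[long_])
    dfn, dfd = 4 - 2, n - 4
    f_stat = ((rss_single - rss_broken) / dfn) / (rss_broken / dfd)
    return f_stat, dfn, dfd
```

A large $F$ relative to the $F(2, n-4)$ distribution indicates that the extra slope and intercept for the long-period side significantly reduce the residuals, i.e. evidence for non-linearity at the chosen period.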
Submitted 5 January, 2016;
originally announced January 2016.
-
Exploring the spectroscopic diversity of type Ia supernovae with DRACULA: a machine learning approach
Authors:
Michele Sasdelli,
E. E. O. Ishida,
R. Vilalta,
M. Aguena,
V. C. Busti,
H. Camacho,
A. M. M. Trindade,
F. Gieseke,
R. S. de Souza,
Y. T. Fantaye,
P. A. Mazzali
Abstract:
The existence of multiple subclasses of type Ia supernovae (SNe Ia) has been the subject of great debate in the last decade. One major challenge inevitably met when trying to infer the existence of one or more subclasses is the time-consuming and subjective process of subclass definition. In this work, we show how machine learning tools facilitate the identification of SN Ia subtypes through the establishment of a hierarchical group structure in the continuous space of spectral diversity formed by these objects. Using Deep Learning, we were able to perform such identification in a 4-dimensional feature space (+1 for time evolution), while standard Principal Component Analysis barely achieves similar results using 15 principal components. This is evidence that the progenitor system and the explosion mechanism can be described by a small number of initial physical parameters. As a proof of concept, we show that our results are in close agreement with a previously suggested classification scheme and that our proposed method can grasp the main spectral features behind the definition of such subtypes. This allows the confirmation of line velocity as a first-order effect in the determination of SN Ia subtypes, followed by 91bg-like events. Given the expected data deluge in the forthcoming years, our proposed approach is essential to allow a quick and statistically coherent identification of SN Ia subtypes (and outliers). All tools used in this work were made publicly available in the Python package Dimensionality Reduction And Clustering for Unsupervised Learning in Astronomy (DRACULA) and can be found within COINtoolbox (https://github.com/COINtoolbox/DRACULA).
Submitted 30 June, 2016; v1 submitted 21 December, 2015;
originally announced December 2015.