-
Transient Classifiers for Fink: Benchmarks for LSST
Authors:
B. M. O. Fraga,
C. R. Bom,
A. Santos,
E. Russeil,
M. Leoni,
J. Peloton,
E. E. O. Ishida,
A. Möller,
S. Blondin
Abstract:
The upcoming Legacy Survey of Space and Time (LSST) at the Vera C. Rubin Observatory is expected to detect a few million transients per night, which will generate a live alert stream during the entire 10 years of the survey. This stream will be distributed via community brokers whose task is to select subsets of it and direct them to scientific communities. Given the volume and complexity of the data, machine learning (ML) algorithms will be paramount for this task. We present the infrastructure tests and classification methods developed within the Fink broker in preparation for LSST. This work aims to provide detailed information regarding the underlying assumptions and methods behind each classifier, enabling users to make informed follow-up decisions from Fink photometric classifications. Using simulated data from the Extended LSST Astronomical Time-series Classification Challenge (ELAsTiCC), we showcase the performance of binary and multi-class ML classifiers available in Fink. These include tree-based classifiers coupled with tailored feature extraction strategies, as well as deep learning algorithms. We introduce the CBPF Alert Transient Search (CATS), a deep learning architecture specifically designed for this task. Results show that Fink classifiers are able to handle the extra complexity expected from LSST data. CATS achieved 97% accuracy on a multi-class classification task, while our best-performing binary classifier achieved 99% when classifying the Periodic class. ELAsTiCC was an important milestone in preparing the Fink infrastructure to deal with LSST-like data. Our results demonstrate that Fink classifiers are well prepared for the arrival of the new stream; this experience also highlights that transitioning from current infrastructures to Rubin will require significant adaptation of currently available tools.
Submitted 12 April, 2024;
originally announced April 2024.
-
M-dwarf flares in the Zwicky Transient Facility data and what we can learn from them
Authors:
A. S. Voloshina,
A. D. Lavrukhina,
M. V. Pruzhinskaya,
K. L. Malanchev,
E. E. O. Ishida,
V. V. Krushinsky,
P. D. Aleo,
E. Gangler,
M. V. Kornilov,
V. S. Korolev,
E. Russeil,
T. A. Semenikhin,
S. Sreejith,
A. A. Volnova
Abstract:
In this paper, we explore the possibility of detecting M-dwarf flares using data from the Zwicky Transient Facility data releases (ZTF DRs). We employ two different approaches: the traditional method of parametric fit search and a machine learning algorithm originally developed for anomaly detection. We analyzed over 35 million ZTF light curves and visually scrutinized 1168 candidates suggested by the algorithms to filter out artifacts, occultations of a star by an asteroid, and known variable objects of other types. Our final sample comprises 134 flares with amplitudes ranging from 0.2 to 4.6 magnitudes, including repeated flares and complex flares with multiple components. Using Pan-STARRS DR2 colors, we also assigned a corresponding spectral subclass to each object in the sample. For 13 flares with well-sampled light curves, we estimated the bolometric energy. Our results show that the ZTF cadence strategy is suitable for identifying M-dwarf flares and other fast transients, allowing for the extraction of significant astrophysical information from their light curves.
Submitted 11 April, 2024;
originally announced April 2024.
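As a rough guide to what the quoted flare amplitudes of 0.2 to 4.6 magnitudes mean in linear units, the standard magnitude relation (flux ratio = 10^(0.4 Δm)) can be evaluated directly. The sketch below is only a worked conversion; the function name `flux_ratio` is illustrative and not taken from the paper.

```python
def flux_ratio(delta_mag: float) -> float:
    """Flux ratio corresponding to a brightening of delta_mag magnitudes."""
    # Standard magnitude relation: 2.5 mag is exactly a factor of 10 in flux.
    return 10 ** (0.4 * delta_mag)


# Amplitude range quoted in the abstract above.
for amplitude in (0.2, 4.6):
    print(f"{amplitude} mag -> flux ratio {flux_ratio(amplitude):.2f}")
```

Under this relation the faintest flares in the sample brighten the star by roughly 20%, while the strongest correspond to a flux increase of roughly a factor of 69.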
-
Multi-View Symbolic Regression
Authors:
Etienne Russeil,
Fabrício Olivetti de França,
Konstantin Malanchev,
Bogdan Burlacu,
Emille E. O. Ishida,
Marion Leroux,
Clément Michelin,
Guillaume Moinard,
Emmanuel Gangler
Abstract:
Symbolic regression (SR) searches for analytical expressions representing the relationship between a set of explanatory and response variables. Current SR methods assume a single dataset extracted from a single experiment. Nevertheless, the researcher is frequently confronted with multiple sets of results obtained from experiments conducted with different setups. Traditional SR methods may fail to find the underlying expression since the parameters of each experiment can be different. In this work we present Multi-View Symbolic Regression (MvSR), which takes into account multiple datasets simultaneously, mimicking experimental environments, and outputs a general parametric solution. This approach fits the evaluated expression to each independent dataset and returns a parametric family of functions f(x; θ) simultaneously capable of accurately fitting all datasets. We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry and economics, for which an a priori analytical expression is not available. Results show that MvSR obtains the correct expression more frequently and is robust to changes in hyperparameters. In real-world data, it is able to grasp the group behaviour, recovering known expressions from the literature as well as promising alternatives, thus enabling the use of SR in a wide range of experimental scenarios.
Submitted 16 February, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
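The multi-view idea in the abstract above (fit one candidate expression f(x; θ) to each dataset independently, then judge the expression by how well it fits all of them) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the two candidate forms, the closed-form least-squares fit, and the choice of the worst per-view error as the aggregate score are not taken from the MvSR implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One "view" is the (x, y) data from a single experiment.
View = Tuple[Sequence[float], Sequence[float]]


def fit_scale(basis: Callable[[float], float], view: View) -> Tuple[float, float]:
    """Least-squares fit of y = a * basis(x) to one view; returns (a, MSE)."""
    xs, ys = view
    b = [basis(x) for x in xs]
    a = sum(v * y for v, y in zip(b, ys)) / sum(v * v for v in b)
    mse = sum((a * v - y) ** 2 for v, y in zip(b, ys)) / len(ys)
    return a, mse


def multi_view_score(basis: Callable[[float], float],
                     views: List[View]) -> Tuple[List[float], float]:
    """Fit every view separately; aggregate with the worst per-view error
    (an illustrative choice of aggregation)."""
    fits = [fit_scale(basis, view) for view in views]
    return [a for a, _ in fits], max(mse for _, mse in fits)


# Toy views: same functional form y = a * x**2, different a per experiment.
xs = [1.0, 2.0, 3.0, 4.0]
true_params = (0.5, 2.0, 7.0)
views = [(xs, [a * x ** 2 for x in xs]) for a in true_params]

# Hypothetical candidate expressions, each with one free parameter a.
candidates: Dict[str, Callable[[float], float]] = {
    "a*x": lambda x: x,
    "a*x**2": lambda x: x ** 2,
}
scores = {name: multi_view_score(f, views) for name, f in candidates.items()}
best = min(scores, key=lambda name: scores[name][1])
print(best, scores[best][0])
```

The selected expression is shared across views, while the fitted parameter is allowed to differ for each one, which is the distinction the abstract draws with single-dataset SR.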
-
Rainbow: a colorful approach on multi-passband light curve estimation
Authors:
E. Russeil,
K. L. Malanchev,
P. D. Aleo,
E. E. O. Ishida,
M. V. Pruzhinskaya,
E. Gangler,
A. D. Lavrukhina,
A. A. Volnova,
A. Voloshina,
T. Semenikhin,
S. Sreejith,
M. V. Kornilov,
V. S. Korolev
Abstract:
We present Rainbow, a physically motivated framework which enables simultaneous multi-band light curve fitting. It allows the user to construct a 2-dimensional continuous surface across wavelength and time, even in situations where the number of observations in each filter is significantly limited. Assuming the electromagnetic radiation emission from the transient can be approximated by a black-body, we combined an expected temperature evolution and a parametric function describing its bolometric light curve. These three ingredients allow the information available in one passband to guide the reconstruction in the others, thus enabling a proper use of multi-survey data. We demonstrate the effectiveness of our method by applying it to simulated data from the Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC) as well as real data from the Young Supernova Experiment (YSE DR1). We evaluate the quality of the estimated light curves according to three different tests: goodness of fit, time of peak prediction and ability to transfer information to machine learning (ML) based classifiers. Results confirm that Rainbow leads to equivalent (SN II) or up to 75% better (SN Ibc) goodness of fit when compared to the Monochromatic approach. Similarly, accuracy when using Rainbow best-fit values as a parameter space in multi-class ML classification improves for all classes in our sample. An efficient implementation of Rainbow has been publicly released as part of the light curve package at https://github.com/light-curve/light-curve-python. Our approach enables straightforward light curve estimation for objects with observations in multiple filters and from multiple experiments. It is particularly well suited for situations where light curve sampling is sparse.
Submitted 5 October, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
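The three ingredients named in the Rainbow abstract above (a black-body spectrum, a temperature evolution, and a parametric bolometric light curve) can be combined into a flux surface over wavelength and time. The sketch below is a minimal illustration under stated assumptions: the Bazin-like bolometric function, the logistic temperature decline, and all parameter values are hypothetical choices, and the code does not use the API of the released light-curve package.

```python
import math

H = 6.62607015e-34       # Planck constant, J s
C = 2.99792458e8         # speed of light, m / s
K_B = 1.380649e-23       # Boltzmann constant, J / K
SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4


def planck_lambda(wavelength_m: float, temp_k: float) -> float:
    """Black-body spectral radiance B_lambda(T)."""
    x = H * C / (wavelength_m * K_B * temp_k)
    return 2.0 * H * C ** 2 / wavelength_m ** 5 / math.expm1(x)


def bolometric(t: float, amplitude: float = 1.0, t0: float = 0.0,
               rise: float = 3.0, fall: float = 30.0) -> float:
    """Assumed Bazin-like bolometric light curve (illustrative parameters)."""
    return amplitude * math.exp(-(t - t0) / fall) / (1.0 + math.exp(-(t - t0) / rise))


def temperature(t: float, t_min: float = 5000.0, t_max: float = 15000.0,
                t0: float = 0.0, scale: float = 10.0) -> float:
    """Assumed logistic cooling from t_max to t_min (illustrative parameters)."""
    return t_min + (t_max - t_min) / (1.0 + math.exp((t - t0) / scale))


def band_flux(t: float, wavelength_m: float) -> float:
    """Flux surface: bolometric flux times the normalised black-body shape.

    pi * B_lambda / (sigma * T^4) integrates to one over wavelength, so it
    only redistributes the bolometric flux among passbands.
    """
    temp = temperature(t)
    return bolometric(t) * math.pi * planck_lambda(wavelength_m, temp) / (SIGMA * temp ** 4)


def blue_to_red(t: float) -> float:
    """Ratio of flux at 480 nm to flux at 750 nm at time t (days)."""
    return band_flux(t, 480e-9) / band_flux(t, 750e-9)


for day in (-10.0, 0.0, 20.0, 40.0):
    print(day, band_flux(day, 480e-9), band_flux(day, 750e-9))
```

Because every passband shares the same bolometric and temperature parameters, an observation in one filter constrains the curve in the others, which is the mechanism the abstract describes for sparsely sampled light curves.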
-
Finding active galactic nuclei through Fink
Authors:
Etienne Russeil,
Emille E. O. Ishida,
Roman Le Montagner,
Julien Peloton,
Anais Moller
Abstract:
We present the Active Galactic Nuclei (AGN) classifier as currently implemented within the Fink broker. Features were built upon summary statistics of available photometric points, as well as color estimation enabled by symbolic regression. The learning stage includes an active learning loop, used to build an optimized training sample from labels reported in astronomical catalogs. Using this method to classify real alerts from the Zwicky Transient Facility (ZTF), we achieved 98.0% accuracy, 93.8% precision and 88.5% recall. We also describe the modifications necessary to enable processing data from the upcoming Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), and apply them to the training sample of the Extended LSST Astronomical Time-series Classification Challenge (ELAsTiCC). Results show that our designed feature space enables high performance of traditional machine learning algorithms in this binary classification task.
Submitted 20 November, 2022;
originally announced November 2022.
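The three metrics quoted in the abstract above are computed from the same confusion matrix, and they can differ substantially when the positive class is rare. The sketch below shows the definitions only; the counts are hypothetical and are not the paper's data.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # fraction of all objects labelled correctly
        "precision": tp / (tp + fp),     # fraction of predicted positives that are real
        "recall": tp / (tp + fn),        # fraction of real positives that are recovered
    }


# Hypothetical counts for an imbalanced sample (100 positives among 1000 objects).
metrics = binary_metrics(tp=80, fp=5, fn=20, tn=895)
print(metrics)
```

With a rare positive class, the many true negatives dominate the accuracy, so accuracy can stay high even when recall is noticeably lower.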
-
The SNAD Viewer: Everything You Want to Know about Your Favorite ZTF Object
Authors:
Konstantin Malanchev,
Matwey V. Kornilov,
Maria V. Pruzhinskaya,
Emille E. O. Ishida,
Patrick D. Aleo,
Vladimir S. Korolev,
Anastasia Lavrukhina,
Etienne Russeil,
Sreevarsha Sreejith,
Alina A. Volnova,
Anastasiya Voloshina,
Alberto Krone-Martins
Abstract:
We describe the SNAD Viewer, a web portal for astronomers which presents a centralized view of individual objects from the Zwicky Transient Facility's (ZTF) data releases, including data gathered from multiple publicly available astronomical archives and data sources. Initially built to enable efficient expert feedback in the context of adaptive machine learning applications, it has evolved into a full-fledged community asset that centralizes public information and provides a multi-dimensional view of ZTF sources. For users, we provide detailed descriptions of the data sources and choices underlying the information displayed in the portal. For developers, we describe our architectural choices and their consequences such that our experience can help others engaged in similar endeavors or in adapting our publicly released code to their requirements. The infrastructure we describe here is scalable and flexible and can be personalized and used by other surveys and for other science goals. The Viewer has been instrumental in highlighting the crucial roles domain experts retain in the era of big data in astronomy. Given the arrival of the upcoming generation of large-scale surveys, we believe similar systems will be paramount in enabling an optimal exploitation of the scientific potential enclosed in current terabyte and future petabyte-scale data sets. The Viewer is publicly available online at https://ztf.snad.space
Submitted 3 March, 2023; v1 submitted 14 November, 2022;
originally announced November 2022.
-
Supernova search with active learning in ZTF DR3
Authors:
Maria V. Pruzhinskaya,
Emille E. O. Ishida,
Alexandra K. Novinskaya,
Etienne Russeil,
Alina A. Volnova,
Konstantin L. Malanchev,
Matwey V. Kornilov,
Patrick D. Aleo,
Vladimir S. Korolev,
Vadim V. Krushinsky,
Sreevarsha Sreejith,
Emmanuel Gangler
Abstract:
We provide the first results from the complete SNAD adaptive learning pipeline in the context of a broad scope of data from large-scale astronomical surveys. The main goal of this work is to explore the potential of adaptive learning techniques in application to big data sets. Our SNAD team used Active Anomaly Discovery (AAD) as a tool to search for new supernova (SN) candidates in the photometric data from the first 9.4 months of the Zwicky Transient Facility (ZTF) survey, namely, between March 17 and December 31, 2018 (58194 < MJD < 58483). We analysed 70 ZTF fields at high galactic latitude and visually inspected 2100 outliers. This resulted in 104 SN-like objects being found, 57 of which were reported to the Transient Name Server for the first time, while 47 had previously been mentioned in other catalogues, either as SNe with known types or as SN candidates. We visually inspected the multi-colour light curves of the non-catalogued transients and performed fits with different supernova models to assign each of them to a probable photometric class: Ia, Ib/c, IIP, IIL, or IIn. Moreover, we also identified unreported slow-evolving transients that are good superluminous SN candidates, along with a few other non-catalogued objects, such as red dwarf flares and active galactic nuclei. Beyond confirming the effectiveness of human-machine integration underlying the AAD strategy, our results shed light on potential leaks in currently available pipelines. These findings can help avoid similar losses in future large-scale astronomical surveys. Furthermore, the algorithm enables direct searches of any type of data, based on any definition of an anomaly set by the expert.
Submitted 27 March, 2023; v1 submitted 18 August, 2022;
originally announced August 2022.
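The MJD bounds quoted in the abstract above can be checked against the calendar dates with the standard definition of the Modified Julian Date, whose day zero is 17 November 1858. The short sketch below handles whole days only; the helper name `to_mjd` is illustrative.

```python
from datetime import date

# MJD 0 corresponds to 1858-11-17 (start of day, UTC).
MJD_EPOCH = date(1858, 11, 17)


def to_mjd(day: date) -> int:
    """Integer Modified Julian Date at the start of the given calendar day."""
    return (day - MJD_EPOCH).days


start = to_mjd(date(2018, 3, 17))
end = to_mjd(date(2018, 12, 31))
print(start, end, end - start)  # -> 58194 58483 289
```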
-
SNAD Transient Miner: Finding Missed Transient Events in ZTF DR4 using k-D trees
Authors:
P. D. Aleo,
K. L. Malanchev,
M. V. Pruzhinskaya,
E. E. O. Ishida,
E. Russeil,
M. V. Kornilov,
V. S. Korolev,
S. Sreejith,
A. A. Volnova,
G. S. Narayan
Abstract:
We report the automatic detection of 11 transients (7 possible supernovae and 4 active galactic nuclei candidates) within the Zwicky Transient Facility fourth data release (ZTF DR4), all of them observed in 2018 and absent from public catalogs. Among these, three were not part of the ZTF alert stream. Our transient mining strategy employs 41 physically motivated features extracted from both real light curves and four simulated light curve models (SN Ia, SN II, TDE, SLSN-I). These features are input to a k-D tree algorithm, from which we calculate the 15 nearest neighbors. After pre-processing and selection cuts, our dataset contained approximately a million objects among which we visually inspected the 105 closest neighbors from seven of our brightest, most well-sampled simulations, comprising 89 unique ZTF DR4 sources. Our result illustrates the potential of coherently incorporating domain knowledge and automatic learning algorithms, which is one of the guiding principles directing the SNAD team. It also demonstrates that the ZTF DR is a suitable testing ground for data mining algorithms aiming to prepare for the next generation of astronomical data.
Submitted 4 May, 2022; v1 submitted 22 November, 2021;
originally announced November 2021.
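The search strategy in the abstract above (place a simulated light curve in feature space and retrieve its 15 nearest real neighbours with a k-D tree) can be sketched with the standard library alone. This is a minimal illustration, not the SNAD pipeline: the random four-dimensional "features" stand in for the 41 physically motivated features, and the tree implementation, function names, and query point are all assumptions.

```python
import heapq
import random
from typing import List, NamedTuple, Optional, Sequence, Tuple

Point = Sequence[float]


class Node(NamedTuple):
    point: Point
    index: int               # position of the point in the original catalogue
    axis: int                # feature used to split at this node
    left: Optional["Node"]
    right: Optional["Node"]


def build(items: List[Tuple[Point, int]], depth: int = 0) -> Optional[Node]:
    """Recursively split on the median along one axis per tree level."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    return Node(items[mid][0], items[mid][1], axis,
                build(items[:mid], depth + 1), build(items[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(root: Optional[Node], query: Point, k: int) -> List[int]:
    """Indices of the k nearest points, closest first."""
    heap: List[Tuple[float, int]] = []  # max-heap via negated squared distance

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d2 = sq_dist(node.point, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d2, node.index))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, node.index))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Search the far side only if the splitting plane is closer than the
        # current k-th best distance.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [idx for _, idx in sorted((-neg, idx) for neg, idx in heap)]


# Toy catalogue: 500 objects with 4 made-up features each.
random.seed(0)
catalog = [[random.random() for _ in range(4)] for _ in range(500)]
tree = build([(features, i) for i, features in enumerate(catalog)])

simulated = [0.5, 0.5, 0.5, 0.5]   # hypothetical features of a simulated transient
neighbours = knn(tree, simulated, k=15)
print(neighbours)
```

In practice the features would usually be rescaled before the search so that no single feature dominates the Euclidean distance, and the returned neighbours are the candidates passed on to visual inspection.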