
doi 10.1051/0004-6361/202347488

The miniJPAS survey quasar selection IV: Classification and redshift estimation with SQUEzE

Authors: Ignasi Pérez-Ràfols, L. Raul Abramo, Ginés Martínez-Solaeche, Matthew M. Pieri, Carolina Queiroz, Natália V. N. Rodrigues, Silvia Bonoli, Jonás Chaves-Montero, Sean S. Morrison, Jailson Alcaniz, Narciso Benitez, Saulo Carneiro, Javier Cenarro, David Cristóbal-Hornillos, Renato Dupke, Alessandro Ederoclite, Rosa M. González Delgado, Antonio Hernán-Caballero, Carlos López-Sanjuan, Antonio Marín-Franch, Valerio Marra, Claudia Mendes de Oliveira, Mariano Moles, Laerte Sodré Jr., Keith Taylor, et al. (2 additional authors not shown)

Abstract: We present a list of quasar candidates, including photometric redshift estimates, from the miniJPAS Data Release, constructed using SQUEzE. This work is based on machine-learning classification of photometric data of quasar candidates using SQUEzE. It has the advantage that its classification procedure can be explained to some extent, making it less of a 'black box' when compared with other classifiers. Another key advantage is that using user-defined metrics gives the user more control over the classification. While SQUEzE was designed for spectroscopic data, here we adapt it for multi-band photometric data, i.e. we treat multiple narrow-band filters as very low-resolution spectra. We train our models using specialized mocks from Queiroz et al. (2022). We estimate our redshift precision using the normalized median absolute deviation, $σ_{\rm NMAD}$, applied to our test sample. Our test sample returns an $f_1$ score (the harmonic mean of purity and completeness) of 0.49 for quasars down to magnitude $r=24.3$ with $z\geq2.1$ and 0.24 for quasars with $z<2.1$. For high-z quasars, this goes up to 0.9 for $r<21.0$. We present two catalogues of quasar candidates including redshift estimates: 301 from point-like sources and 1049 when also including extended sources. We discuss the impact of including extended sources in our predictions (they are not included in the mocks), as well as the impact of changing the noise model of the mocks. We also give an explanation of SQUEzE's reasoning.
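The adaptation described above, treating narrow-band fluxes as a very low-resolution spectrum, can be illustrated with a toy version of the trial-redshift idea behind SQUEzE as we understand it: locate emission peaks and read off the redshift each peak would imply if it were a given quasar line. This is a minimal sketch under stated assumptions; the line list, the naive local-maximum peak finder, the redshift window, and all function names are hypothetical and are not SQUEzE's actual code or API. The later steps (computing metrics for each candidate and classifying them with machine learning) are omitted.

```python
# Toy trial-redshift step: every peak in a low-resolution "spectrum" built from
# narrow-band fluxes is tentatively identified with each candidate emission line.

# Hypothetical line list: rest-frame wavelengths in Angstrom.
REST_LINES = {"LyA": 1215.67, "CIV": 1549.06, "CIII]": 1908.73, "MgII": 2798.75}


def find_peaks(fluxes):
    """Indices of strict local maxima (a naive stand-in for a real peak finder)."""
    return [
        i
        for i in range(1, len(fluxes) - 1)
        if fluxes[i] > fluxes[i - 1] and fluxes[i] > fluxes[i + 1]
    ]


def trial_redshifts(wavelengths, fluxes, z_min=0.0, z_max=5.0):
    """Return (line, observed wavelength, z) for every peak/line pairing in range."""
    trials = []
    for i in find_peaks(fluxes):
        for name, rest in REST_LINES.items():
            z = wavelengths[i] / rest - 1.0  # 1 + z = lambda_obs / lambda_rest
            if z_min <= z <= z_max:
                trials.append((name, wavelengths[i], z))
    return trials


# Usage with made-up filter centres and fluxes: a single peak at 4200 Angstrom.
example = trial_redshifts([4000.0, 4100.0, 4200.0, 4300.0, 4400.0],
                          [1.0, 1.1, 3.0, 1.2, 1.0])
for name, wave, z in example:
    print(f"{name:6s} at {wave:.0f} A -> z = {z:.3f}")
```

One peak yields several competing redshift hypotheses (here one per line); deciding which, if any, is correct is the job of the classifier.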
Our estimates for the redshift precision using the test sample indicate $σ_{\rm NMAD}=0.92\%$ for the entire sample, reduced to 0.81\% for $r<22.5$ and 0.74\% for $r<21.3$. Spectroscopic follow-up of the candidates is required to confirm the validity of our findings.
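The two metrics quoted in this abstract can be computed as in the minimal sketch below. The $f_1$ score is the harmonic mean of purity (precision) and completeness (recall). For $σ_{\rm NMAD}$ we assume the common convention $σ_{\rm NMAD} = 1.48\,\mathrm{median}(|δz - \mathrm{median}(δz)|)$ with $δz = (z_{\rm pred} - z_{\rm true})/(1 + z_{\rm true})$; the paper's exact definition may differ in detail, and the function names are hypothetical.

```python
from statistics import median


def sigma_nmad(z_pred, z_true):
    """1.48 * median(|dz - median(dz)|), with dz = (z_pred - z_true) / (1 + z_true).

    This is one common convention (assumed here); some authors omit the inner median.
    """
    dz = [(zp - zt) / (1.0 + zt) for zp, zt in zip(z_pred, z_true)]
    centre = median(dz)
    return 1.48 * median([abs(d - centre) for d in dz])


def f1_score(purity, completeness):
    """Harmonic mean of purity (precision) and completeness (recall)."""
    if purity + completeness == 0:
        return 0.0
    return 2.0 * purity * completeness / (purity + completeness)


# Usage with made-up redshifts and rates.
print(sigma_nmad([2.03, 2.50, 2.96, 1.02, 0.50], [2.0, 2.5, 3.0, 1.0, 0.5]))  # about 0.0148
print(f1_score(0.9, 0.6))  # about 0.72
```

Because $f_1$ is a harmonic mean, it is pulled towards the smaller of purity and completeness, so a high score requires both to be high.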

Submitted 1 September, 2023; originally announced September 2023.

Comments: Accepted in A&A; 24 pages, 24 figures, 7 tables

Journal ref: A&A 678, A144 (2023)

arXiv:2303.12684

doi 10.1051/0004-6361/202245750

The miniJPAS survey quasar selection III: Classification with artificial neural networks and hybridisation

Authors: G. Martínez-Solaeche, Carolina Queiroz, R. M. González Delgado, Natália V. N. Rodrigues, R. García-Benito, Ignasi Pérez-Ràfols, L. Raul Abramo, Luis Díaz-García, Matthew M. Pieri, Jonás Chaves-Montero, A. Hernán-Caballero, J. E. Rodríguez-Martín, Silvia Bonoli, Sean S. Morrison, Isabel Márquez, J. M. Vílchez, C. López-Sanjuan, A. J. Cenarro, R. A. Dupke, A. Marín-Franch, J. Varela, H. Vázquez Ramió, D. Cristóbal-Hornillos, M. Moles, J. Alcaniz, et al. (6 additional authors not shown)

Abstract: This paper is part of a large effort within the J-PAS collaboration that aims to classify point-like sources in miniJPAS, which were observed in 60 optical bands over $\sim$ 1 deg$^2$ in the AEGIS field. We developed two algorithms based on artificial neural networks (ANN) to classify objects into four categories: stars, galaxies, quasars at low redshift ($z < 2.1$), and quasars at high redshift ($z \geq 2.1$). As inputs, we used miniJPAS fluxes for one of the classifiers (ANN$_1$) and colours for the other (ANN$_2$). The ANNs were first trained and tested using mock data. We studied the effect of augmenting the training set by creating hybrid objects, which combine fluxes from stars, galaxies, and quasars. However, the augmentation did not improve the score of the ANN. We also evaluated the performance of the classifiers on a small subset of the SDSS DR12Q superset observed by miniJPAS. In the mock test set, the f1-scores for quasars at high redshift with ANN$_1$ (ANN$_2$) are $0.99$ ($0.99$), $0.93$ ($0.92$), and $0.63$ ($0.57$) for $17 < r \leq 20$, $20 < r \leq 22.5$, and $22.5 < r \leq 23.6$, respectively, where $r$ is the J-PAS rSDSS band. For low-redshift quasars, galaxies, and stars, we reached $0.97$ ($0.97$), $0.82$ ($0.79$), and $0.61$ ($0.58$); $0.94$ ($0.94$), $0.90$ ($0.89$), and $0.81$ ($0.80$); and $1.0$ ($1.0$), $0.96$ ($0.94$), and $0.70$ ($0.52$), respectively, in the same $r$ bins.
In the SDSS DR12Q superset miniJPAS sample, the weighted f1-score reaches 0.87 (0.88) for objects that are mostly within $20 < r \leq 22.5$. Finally, we estimate the number of point-like sources in miniJPAS that are quasars, galaxies, and stars.
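The per-class and weighted f1-scores quoted above can be sketched as follows. We assume the usual definitions: for each class, $f_1 = 2\,TP/(2\,TP + FP + FN)$, and the weighted score averages the per-class values weighted by the number of true members of each class (as in scikit-learn's `average="weighted"` option). The class labels and function names are hypothetical placeholders for the four categories in the abstract.

```python
from collections import Counter

# Hypothetical labels for the four categories: stars, galaxies, low-z and high-z quasars.
CLASSES = ("star", "galaxy", "qso_low", "qso_high")


def per_class_f1(y_true, y_pred, classes=CLASSES):
    """f1 = 2TP / (2TP + FP + FN) for each class (0.0 if the class never appears)."""
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred, classes=CLASSES):
    """Average of per-class f1, weighted by each class's number of true members."""
    support = Counter(y_true)
    scores = per_class_f1(y_true, y_pred, classes)
    total = sum(support[c] for c in classes)
    return sum(scores[c] * support[c] for c in classes) / total


# Usage with a made-up four-object sample.
truth = ["star", "star", "galaxy", "qso_high"]
pred = ["star", "galaxy", "galaxy", "qso_high"]
print(per_class_f1(truth, pred))
print(weighted_f1(truth, pred))  # about 0.75
```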

Submitted 22 March, 2023; originally announced March 2023.

Journal ref: A&A 673, A103 (2023)

arXiv:2303.00489

doi 10.1093/mnras/stac2836

The miniJPAS survey quasar selection II: Machine learning classification with photometric measurements and uncertainties

Authors: Natália V. N. Rodrigues, L. Raul Abramo, Carolina Queiroz, Ginés Martínez-Solaeche, Ignasi Pérez-Ràfols, Silvia Bonoli, Jonás Chaves-Montero, Matthew M. Pieri, Rosa M. González Delgado, Sean S. Morrison, Valerio Marra, Isabel Márquez, A. Hernán-Caballero, L. A. Díaz-García, Narciso Benítez, A. Javier Cenarro, Renato A. Dupke, Alessandro Ederoclite, Carlos López-Sanjuan, Antonio Marín-Franch, Claudia Mendes de Oliveira, Mariano Moles, Laerte Sodré Jr., Jesús Varela, Héctor Vázquez Ramió, et al. (1 additional author not shown)

Abstract: Astrophysical surveys rely heavily on the classification of sources as stars, galaxies or quasars from multi-band photometry. Surveys in narrow-band filters allow for greater discriminatory power, but the variety of types and redshifts of the objects presents a challenge to standard template-based methods. In this work, which is part of a larger effort that aims at building a catalogue of quasars from the miniJPAS survey, we present a Machine Learning-based method that employs Convolutional Neural Networks (CNNs) to classify point-like sources, including the information in the measurement errors. We validate our methods using data from the miniJPAS survey, a proof-of-concept project of the J-PAS collaboration covering $\sim$ 1 deg$^2$ of the northern sky using the 56 narrow-band filters of the J-PAS survey. Due to the scarcity of real data, we trained our algorithms using mocks that were purpose-built to reproduce the distributions of different types of objects that we expect to find in the miniJPAS survey, as well as the properties of the real observations in terms of signal and noise. We compare the performance of the CNNs with other well-established Machine Learning classification methods based on decision trees, finding that the CNNs improve the classification when the measurement errors are provided as inputs. The predicted distribution of objects in miniJPAS is consistent with the putative luminosity functions of stars, quasars and unresolved galaxies.
Our results are a proof of concept for the idea that the J-PAS survey will be able to detect unprecedented numbers of quasars with high confidence.
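The key idea in this abstract, giving the classifier the measurement errors as well as the fluxes, amounts to changing the input layout. The sketch below shows one plausible layout for a one-dimensional CNN: fluxes and errors as two channels over the filter axis. The normalisation by the median absolute flux, the channels-first ordering, and the function name are our assumptions for illustration, not the architecture used in the paper.

```python
from statistics import median


def make_cnn_input(fluxes, errors, include_errors=True):
    """Return a channels-first input: [fluxes] or [fluxes, errors], one value per filter.

    Both channels are divided by the same per-object scale (median |flux|), so the
    signal-to-noise ratio in each filter is preserved. This normalisation is assumed.
    """
    if len(fluxes) != len(errors):
        raise ValueError("fluxes and errors need one entry per filter")
    scale = median([abs(f) for f in fluxes]) or 1.0  # avoid dividing by zero
    channels = [[f / scale for f in fluxes]]
    if include_errors:
        channels.append([e / scale for e in errors])
    return channels


# Usage with made-up values for three filters.
x_flux_only = make_cnn_input([2.0, 4.0, 6.0], [0.2, 0.4, 0.6], include_errors=False)
x_with_errors = make_cnn_input([2.0, 4.0, 6.0], [0.2, 0.4, 0.6])
print(len(x_flux_only), len(x_with_errors))  # 1 2
```

Keeping the errors as a separate channel aligned with the fluxes lets each convolution see a filter's signal and its uncertainty together.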

Submitted 1 March, 2023; originally announced March 2023.

Comments: 16 pages, 15 figures, published by MNRAS

Journal ref: Monthly Notices of the Royal Astronomical Society, 2023, 520, 3494-3509

arXiv:2301.06398

doi 10.1093/mnras/stad1186

High-fidelity reproduction of central galaxy joint distributions with Neural Networks

Authors: Natália V. N. Rodrigues, Natalí S. M. de Santi, Antonio D. Montero-Dorta, L. Raul Abramo

Abstract: The relationship between galaxies and haloes is central to the description of galaxy formation, and a fundamental step towards extracting precise cosmological information from galaxy maps. However, this connection involves several complex, interconnected processes. Machine Learning methods are flexible tools that can learn complex correlations between a large number of features, but are traditionally designed as deterministic estimators. In this work, we use the IllustrisTNG300-1 simulation and apply neural networks in a binning classification scheme to predict probability distributions of central galaxy properties, namely stellar mass, colour, specific star formation rate, and radius, using as input features the halo mass, concentration, spin, age, and the overdensity on a scale of 3 $h^{-1}$ Mpc. The model captures the intrinsic scatter in the relation between halo and galaxy properties, and can thus be used to quantify the uncertainties related to the stochasticity of the galaxy properties with respect to the halo properties. In particular, with our proposed method, one can define and accurately reproduce the properties of the different galaxy populations in great detail. We demonstrate the power of this tool by directly comparing traditional single-point estimators and the predicted joint probability distributions, and also by computing the power spectrum of a large number of tracers defined on the basis of the predicted colour-stellar mass diagram.
We show that the neural networks reproduce clustering statistics of the individual galaxy populations with excellent precision and accuracy.
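The binning classification scheme described above turns a regression target into classes: each galaxy property is discretised into bins, a classifier outputs one probability per bin, and values can then be drawn from that predicted distribution instead of taking a single-point estimate. A minimal sketch under stated assumptions: equal-width bins, clipping of out-of-range values, and uniform sampling within the chosen bin are our choices, the neural network that would produce the probabilities is omitted, and all names are hypothetical.

```python
import bisect
import random


def make_bin_edges(lo, hi, n_bins):
    """Equal-width bin edges from lo to hi (n_bins + 1 values)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def to_bin_label(value, edges):
    """Class index of a continuous target; values outside the edges are clipped."""
    i = bisect.bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


def sample_from_bins(probs, edges, rng):
    """Pick a bin with the predicted probabilities, then a uniform value inside it."""
    (i,) = rng.choices(range(len(probs)), weights=probs)
    return rng.uniform(edges[i], edges[i + 1])


# Usage: six bins in a made-up property range from 9 to 12.
edges = make_bin_edges(9.0, 12.0, 6)
label = to_bin_label(10.2, edges)  # 2, i.e. the bin [10.0, 10.5)
draw = sample_from_bins([0.0, 0.1, 0.6, 0.3, 0.0, 0.0], edges, random.Random(42))
```

Repeated draws from the predicted bin probabilities reproduce a scatter around the mean relation, which a single-point regressor would collapse.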

Submitted 16 January, 2023; originally announced January 2023.

Comments: 12 pages, 7 figures

arXiv:2202.00103

doi 10.1093/mnras/stac2962

The miniJPAS survey quasar selection I: Mock catalogues for classification

Authors: Carolina Queiroz, L. Raul Abramo, Natália V. N. Rodrigues, Ignasi Pérez-Ràfols, Ginés Martínez-Solaeche, Antonio Hernán-Caballero, Carlos Hernández-Monteagudo, Alejandro Lumbreras-Calle, Matthew M. Pieri, Sean S. Morrison, Silvia Bonoli, Jonás Chaves-Montero, Ana L. Chies-Santos, L. A. Díaz-García, Alberto Fernandez-Soto, Rosa M. González Delgado, Jailson Alcaniz, Narciso Benítez, A. Javier Cenarro, Tamara Civera, Renato A. Dupke, Alessandro Ederoclite, Carlos López-Sanjuan, Antonio Marín-Franch, Claudia Mendes de Oliveira, et al. (5 additional authors not shown)

Abstract: In this series of papers, we employ several machine learning (ML) methods to classify the point-like sources from the miniJPAS catalogue and identify quasar candidates. Since no representative sample of spectroscopically confirmed sources exists at present to train these ML algorithms, we rely on mock catalogues. In this first paper we develop a pipeline to compute synthetic photometry of quasars, galaxies and stars using spectra of objects targeted as quasars in the Sloan Digital Sky Survey. To match the depths and signal-to-noise ratio distributions in all bands expected for miniJPAS point sources in the range $17.5\leq r<24$, we augment our sample of available spectra by shifting the original $r$-band magnitude distributions towards the faint end, ensure that the relative incidence rates of the different objects are distributed according to their respective luminosity functions, and perform a thorough modeling of the noise distribution in each filter by sampling the flux variance either from Gaussian realizations with given widths or from combinations of Gaussian functions. Finally, we also add to the mocks the patterns of non-detections that are present in all real observations. Although the mock catalogues presented in this work are a first step towards simulated data sets that match the properties of the miniJPAS observations, these mocks can be adapted to serve the purposes of other photometric surveys.
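Two of the mock-building steps described above can be sketched in a few lines: making an object fainter by $Δm$ magnitudes multiplies every band flux by $10^{-0.4Δm}$, which leaves its colours unchanged, and a noise realisation is then added filter by filter. This is a simplified sketch: it draws one Gaussian per filter with a fixed, known width, whereas the paper also samples the flux variance from combinations of Gaussian functions and adds non-detection patterns. Function names and values are hypothetical.

```python
import random


def shift_to_fainter(fluxes, delta_m):
    """Make an object delta_m magnitudes fainter in every band.

    m = -2.5 log10(F) + const, so F -> F * 10**(-0.4 * delta_m); flux ratios
    (colours) are unchanged.
    """
    factor = 10.0 ** (-0.4 * delta_m)
    return [f * factor for f in fluxes]


def add_gaussian_noise(fluxes, sigmas, rng):
    """Add one Gaussian noise realisation per filter with standard deviation sigma."""
    return [f + rng.gauss(0.0, s) for f, s in zip(fluxes, sigmas)]


# Usage: shift a made-up three-band object by 2.5 mag, then add noise.
rng = random.Random(0)
faint = shift_to_fainter([100.0, 50.0, 20.0], 2.5)  # roughly [10.0, 5.0, 2.0]
noisy = add_gaussian_noise(faint, [0.5, 0.5, 0.5], rng)
```

Applying the noise after the shift matters: the fainter object keeps the same colours but ends up with a lower signal-to-noise ratio, as faint miniJPAS sources do.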

Submitted 31 January, 2022; originally announced February 2022.

Comments: 20 pages, 18 figures, submitted to MNRAS

arXiv:2201.06054

doi 10.1093/mnras/stac1469

Mimicking the halo-galaxy connection using machine learning

Authors: Natalí S. M. de Santi, Natália V. N. Rodrigues, Antonio D. Montero-Dorta, L. Raul Abramo, Beatriz Tucci, M. Celeste Artale

Abstract: Elucidating the connection between the properties of galaxies and the properties of their host haloes is a key element in galaxy formation. When the spatial distribution of objects is also taken into consideration, it becomes very relevant for cosmological measurements. In this paper, we use machine learning techniques to analyse these intricate relations in the IllustrisTNG300 magnetohydrodynamical simulation, predicting baryonic properties from halo properties. We employ four different algorithms: extremely randomized trees, K-nearest neighbours, light gradient boosting machine, and neural networks, along with a unique and powerful combination of the results from all four approaches. Overall, the different algorithms produce consistent results in terms of predicting galaxy properties from a set of input halo properties that include halo mass, concentration, spin, and halo overdensity. For stellar mass, the Pearson correlation coefficient is 0.98, dropping to 0.7-0.8 for specific star formation rate (sSFR), colour, and size. In addition, we apply, for the first time in this context, an existing data augmentation method, the synthetic minority over-sampling technique for regression with Gaussian noise (SMOGN), designed to alleviate the problem of imbalanced data sets, and show that it improves the overall shape of the predicted distributions and the scatter in the halo-galaxy relations.
We also demonstrate that our predictions are good enough to reproduce the power spectra of multiple galaxy populations, defined in terms of stellar mass, sSFR, colour, and size, with high accuracy. Our results align with previous reports suggesting that certain galaxy properties cannot be reproduced using halo features alone.
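The Pearson correlation coefficient quoted in this abstract (0.98 for stellar mass) measures how linearly the predicted and true values of a property track each other. A minimal sketch of its computation from paired lists follows; the function name and data are hypothetical, and in practice one would more likely call an existing routine such as `statistics.correlation` (Python 3.10+) or `scipy.stats.pearsonr`.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Usage with made-up "true" and "predicted" log stellar masses.
true_mass = [9.8, 10.1, 10.5, 10.9, 11.3]
pred_mass = [9.9, 10.0, 10.6, 10.8, 11.2]
print(round(pearson_r(true_mass, pred_mass), 3))
```

A coefficient close to 1 says the predictions follow the truth linearly, but it does not by itself show that the predicted distribution has the right width, which is why the abstract also examines distribution shapes and power spectra.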

Submitted 1 July, 2022; v1 submitted 16 January, 2022; originally announced January 2022.

Comments: Matches published version; very minor changes with respect to v1

Journal ref: Monthly Notices of the Royal Astronomical Society, 2022, 514, 2463-2478

arXiv:2108.04742

The information of attribute uncertainties: what convolutional neural networks can learn about errors in input data

Authors: Natália V. N. Rodrigues, L. Raul Abramo, Nina S. Hirata

Abstract: Errors in measurements are key to weighting the value of data, but are often neglected in Machine Learning (ML). We show how Convolutional Neural Networks (CNNs) are able to learn about the context and patterns of signal and noise, leading to improvements in the performance of classification methods. We construct a model whereby two classes of objects follow an underlying Gaussian distribution, and where the features (the input data) have varying, but known, levels of noise. This model mimics the nature of scientific data sets, where the noise arises as realizations of random processes whose underlying distributions are known. The classification of these objects can then be performed using standard statistical techniques (e.g., least-squares minimization or Markov-Chain Monte Carlo), as well as ML techniques. This allows us to take advantage of a maximum likelihood approach to object classification, and to measure the amount by which the ML methods are incorporating the information in the input data uncertainties. We show that, when each data point is subject to different levels of noise (i.e., noise with different distribution functions), that information can be learned by the CNNs, raising the ML performance to at least the level of the least-squares method, and sometimes even surpassing it.
Furthermore, we show that, with varying noise levels, the confidence of the ML classifiers serves as a proxy for the underlying cumulative distribution function, but only if the information about specific input data uncertainties is provided to the CNNs.
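The least-squares baseline mentioned above can be illustrated with a deliberately simplified toy, not the paper's exact model: each class is represented by a fixed template vector, each feature carries its own known Gaussian error, and the classifier picks the template with the smallest $\chi^2 = \sum_i (d_i - t_i)^2/σ_i^2$. With fixed templates and Gaussian noise, minimising $\chi^2$ is equivalent to maximising the likelihood. The templates, data, and names are hypothetical; the example shows how ignoring the per-feature errors can flip the decision, which is the kind of information the abstract argues CNNs can learn when the errors are provided.

```python
def chi2(data, sigmas, template):
    """Chi-square of the data against a template, with a known error per feature."""
    return sum(((d - t) / s) ** 2 for d, s, t in zip(data, sigmas, template))


def classify_least_squares(data, sigmas, templates):
    """Name of the template with the smallest chi-square (maximum likelihood here)."""
    return min(templates, key=lambda name: chi2(data, sigmas, templates[name]))


# Usage: two hypothetical classes and one observation whose second feature is very noisy.
templates = {"A": [0.0, 0.0], "B": [1.0, 1.0]}
observation = [0.6, 0.0]
with_errors = classify_least_squares(observation, [0.1, 10.0], templates)  # "B"
ignoring_errors = classify_least_squares(observation, [1.0, 1.0], templates)  # "A"
```

With the true errors, the precise first feature dominates and points to B; treating both features as equally reliable lets the noisy second feature pull the decision towards A.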

Submitted 10 August, 2021; originally announced August 2021.

Comments: 32 pages, 15 figures

Showing 1–7 of 7 results for author: Rodrigues, N V N