Astrophysics > Instrumentation and Methods for Astrophysics
[Submitted on 20 Dec 2023]
Title: Boost recall in QSO selection from highly imbalanced photometric datasets
Abstract: Context. The identification of bright QSOs is of great importance for probing the intergalactic medium and addressing open questions in cosmology. Several approaches have been adopted to find such sources in currently available photometric surveys, including machine learning methods. However, the rarity of bright QSOs at high redshifts compared to contaminating sources (such as stars and galaxies) makes the selection of reliable candidates a difficult task, especially when high completeness is required.
Aims. We present a novel technique to boost recall (i.e., completeness within the considered sample) in the selection of QSOs from photometric datasets dominated by stars, galaxies, and low-z QSOs (imbalanced datasets).
Methods. Our method operates by iteratively removing sources whose probability of belonging to a noninteresting class exceeds a user-defined threshold, until the remaining dataset contains mainly high-z QSOs. Any existing machine learning method can be used as the underlying classifier, provided it allows a classification probability to be estimated. We applied the method to a dataset obtained by cross-matching PanSTARRS1, Gaia, and WISE, and identified the high-z QSO candidates using both our method and its direct multi-label counterpart.
Results. We ran several tests by randomly choosing the training and test datasets, and achieved significant improvements in recall, which increased from 50% to 85% for QSOs with z>2.5, and from 70% to 90% for QSOs with z>3. We also identified 3098 new QSO candidates in a sample of 2.6x10^6 sources with no known classification. We obtained follow-up spectroscopy for 121 candidates, confirming 107 new QSOs with z>2.5. Finally, a comparison of our candidates with those selected by an independent method shows that the two samples overlap by more than 90% and that both methods are capable of achieving a high level of completeness.
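The iterative rejection described under Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: the nearest-centroid classifier with softmax probabilities is a toy stand-in for whatever probabilistic classifier is actually used, and the names `fit_centroids`, `predict_proba`, `is_rejected`, `iterative_rejection`, the class labels, the default threshold of 0.9, and the toy feature values are all hypothetical. The sketch also assumes that a source is rejected when any single noninteresting class exceeds the threshold, that the classifier is retrained on the surviving training sources at each iteration, and that training sources of the target class are never dropped; the abstract does not specify these details.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Toy stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, defaultdict(int)
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_proba(centroids, features):
    """Class probabilities from a softmax over negative squared distances."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(features, mu))
              for c, mu in centroids.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def is_rejected(model, features, target, threshold):
    """Assumed rule: reject if any non-target class probability exceeds threshold."""
    probs = predict_proba(model, features)
    return any(p > threshold for c, p in probs.items() if c != target)


def iterative_rejection(train_X, train_y, cand_X, target, threshold=0.9, max_iter=10):
    """Return the indices of candidates that survive every rejection round."""
    train = list(zip(train_X, train_y))
    surviving = list(range(len(cand_X)))
    for _ in range(max_iter):
        if len({label for _, label in train}) < 2:
            break  # only one class left in the training set: nothing to separate
        model = fit_centroids([x for x, _ in train], [y for _, y in train])
        kept_cand = [i for i in surviving
                     if not is_rejected(model, cand_X[i], target, threshold)]
        # Assumption: target-class training sources are always retained.
        kept_train = [(x, y) for x, y in train
                      if y == target or not is_rejected(model, x, target, threshold)]
        if len(kept_cand) == len(surviving) and len(kept_train) == len(train):
            break  # converged: this round removed nothing
        surviving, train = kept_cand, kept_train
    return surviving


# Toy two-feature data (hypothetical values, not real photometry).
train_X = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.1),   # stars
           (5.0, 5.0), (5.2, 4.9),                # galaxies
           (10.0, 0.0), (9.9, 0.1)]               # high-z QSOs
train_y = ["star", "star", "star", "galaxy", "galaxy", "highz_qso", "highz_qso"]
cand_X = [(0.1, 0.0), (5.0, 5.1), (10.0, 0.2), (9.8, 0.0)]

kept = iterative_rejection(train_X, train_y, cand_X, target="highz_qso")
print(kept)  # indices of the candidates still considered high-z QSO candidates
```

The selection here is by elimination: a candidate is kept unless the classifier is confident it belongs to a contaminating class, which is why such a scheme can favor recall over a direct classification into the target class. In practice the toy classifier would be replaced by any model that returns class probabilities.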