Estimating Model Performance Under Covariate Shift Without Labels
Authors:
Jakub Białek,
Wojtek Kuberski,
Nikolaos Perrakis,
Albert Bifet
Abstract:
Machine learning models often experience performance degradation post-deployment due to shifts in data distribution. It is challenging to assess a model's performance accurately when labels are missing or delayed. Existing proxy methods, such as drift detection, fail to measure the effects of these shifts adequately. To address this, we introduce a new method, Probabilistic Adaptive Performance Estimation (PAPE), for evaluating classification models on unlabeled data that accurately quantifies the impact of covariate shift on model performance. It is model- and data-type-agnostic and works for various performance metrics. Crucially, PAPE operates independently of the original model, relying only on its predictions and probability estimates, and makes no assumptions about the nature of the covariate shift, learning directly from data instead. We tested PAPE on tabular data using over 900 dataset-model combinations created from US census data, assessing its performance against multiple benchmarks. Overall, PAPE provided more accurate performance estimates than the other evaluated methodologies.
Submitted 28 May, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
Machine-learning classification of astronomical sources: estimating F1-score in the absence of ground truth
Authors:
A. Humphrey,
W. Kuberski,
J. Bialek,
N. Perrakis,
W. Cools,
N. Nuyttens,
H. Elakhrass,
P. A. C. Cunha
Abstract:
Machine-learning-based classifiers have become indispensable in the field of astrophysics, allowing separation of astronomical sources into various classes, with computational efficiency suitable for application to the enormous data volumes that wide-area surveys now typically produce. In the standard supervised classification paradigm, a model is typically trained and validated using data from relatively small areas of sky, before being used to classify sources in other areas of the sky. However, population shifts between the training examples and the sources to be classified can lead to 'silent' degradation in model performance, which can be challenging to identify when the ground truth is not available. In this Letter, we present a novel methodology using the NannyML Confidence-Based Performance Estimation (CBPE) method to predict classifier F1-score in the presence of population shifts, but without ground-truth labels. We apply CBPE to the selection of quasars with decision-tree ensemble models, using broad-band photometry, and show that the F1-scores are predicted remarkably well (MAPE ~ 10%; R^2 = 0.74-0.92). We discuss potential use cases in the domain of astronomy, including machine-learning model and/or hyperparameter selection, and evaluation of the suitability of training datasets for a particular classification problem.
Submitted 29 September, 2022;
originally announced September 2022.
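The abstract above describes predicting an F1-score without ground-truth labels from a classifier's confidence. The general idea can be illustrated with a minimal sketch. It is not taken from the paper and is not NannyML's implementation: the function name `estimate_f1` and the 0.5 decision threshold are hypothetical, and the sketch assumes the positive-class probabilities are well calibrated, so that a score of 0.9 means the source is a true positive about 90% of the time. Under that assumption, each prediction contributes fractional counts to an expected confusion matrix, from which an expected F1 follows.

```python
from typing import Sequence


def estimate_f1(probs: Sequence[float], threshold: float = 0.5) -> float:
    """Estimate F1 from calibrated positive-class probabilities, without labels.

    Assumption (not from the source): `probs` are calibrated, so each value
    can be read as the chance that the sample is truly positive.
    """
    exp_tp = 0.0  # expected true positives
    exp_fp = 0.0  # expected false positives
    exp_fn = 0.0  # expected false negatives
    for p in probs:
        if p >= threshold:
            # Predicted positive: correct with probability p.
            exp_tp += p
            exp_fp += 1.0 - p
        else:
            # Predicted negative: a missed positive with probability p.
            exp_fn += p
    denom = 2.0 * exp_tp + exp_fp + exp_fn
    # Convention chosen here: return 0.0 when F1 is undefined.
    return 2.0 * exp_tp / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical scores for four unlabeled sources.
    print(estimate_f1([0.9, 0.8, 0.2, 0.1]))
```

If calibration degrades under the population shift itself, an estimate of this kind would be biased, which is why the calibration assumption matters.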