XAMI - A Benchmark Dataset for Artefact Detection in XMM-Newton Optical Images

Elisabeta-Iulia Dima (corresponding author; email: [email protected]), Department of Computers and Information Technology, Politehnica University of Timişoara, Blvd. V. Pârvan, No. 2, 300223 Timişoara, Romania

Pablo Gómez, European Space Agency (ESA), European Space Astronomy Centre (ESAC), Camino Bajo del Castillo s/n, 28692 Villanueva de la Cañada, Madrid, Spain

Sandor Kruk, European Space Agency (ESA), European Space Astronomy Centre (ESAC), Camino Bajo del Castillo s/n, 28692 Villanueva de la Cañada, Madrid, Spain

Peter Kretschmar, European Space Agency (ESA), European Space Astronomy Centre (ESAC), Camino Bajo del Castillo s/n, 28692 Villanueva de la Cañada, Madrid, Spain

Simon Rosen, Serco Ltd., ESAC, Camino Bajo del Castillo s/n, 28692 Villanueva de la Cañada, Madrid, Spain

Călin-Adrian Popa, Department of Computers and Information Technology, Politehnica University of Timişoara, Blvd. V. Pârvan, No. 2, 300223 Timişoara, Romania

Abstract

Reflected or scattered light produces artefacts in astronomical observations that can negatively impact scientific studies. Automated detection of these artefacts is therefore highly beneficial, especially given the increasing volume of data gathered. Machine learning methods are well suited to this problem, but there is currently a lack of annotated data for training such approaches to detect artefacts in astronomical observations. In this work, we present a dataset of images from the XMM-Newton space telescope Optical Monitor camera showing different types of artefacts. We hand-annotated a sample of 1000 images with artefacts, which we use to train automated ML methods. We further demonstrate techniques tailored for accurate detection and masking of artefacts using instance segmentation. We adopt a hybrid approach that combines convolutional neural networks (CNNs) and transformer-based models and exploits their respective advantages in segmentation.

The presented method and dataset will advance artefact detection in astronomical observations by providing a reproducible baseline. All code and data are made publicly available (model: https://github.com/ESA-Datalabs/XAMI-model, dataset: https://github.com/ESA-Datalabs/XAMI-dataset).


1 Introduction

Astronomical surveys and space missions (e.g., LSST [Ivezi__2019] and the European Space Agency’s Euclid mission [laureijs2009euclid]) will enhance our understanding of the cosmos by delivering unprecedented images, measurements and insights into billions of stars and galaxies, the expansion of the Universe, dark energy and dark matter. Such surveys will produce enormous amounts of data daily, so the effective processing and analysis of the large image volumes produced by space missions requires automated methods. Artefacts such as ghost reflections, star loops and read-out streaks (see Figure 1) pose challenges, potentially leading to false detections or affecting the photometric measurements of genuine sources.

Figure 1: Examples of artefacts in various space missions. (upper left) An optical ghost detected in Euclid’s First Light near-infrared images. (upper right) Ghost rays and stray light patterns present in the NuSTAR mission. (bottom left) Star loops and dragon’s breath artefacts appearing in Hubble Space Telescope images. (bottom right) Star loops and streaks present in the XMM-Newton Optical Monitor.

XMM-Newton Optical Monitor. ESA’s X-ray Multi-Mirror Mission (XMM-Newton) [xmm_newton_2000, Schartel_2022] is an orbiting observatory whose principal goal is to conduct detailed X-ray spectroscopy of various celestial objects. The XMM-Newton Optical Monitor (XMM-OM) [Mason_2001, Cordova1989, Lumb1991] extends the simultaneous observational capability of the three main X-ray telescopes into the ultraviolet and optical bands. The XMM-OM source catalogue is a valuable resource containing approximately 9 million detections of around 6 million distinct sources. It plays a pivotal role in individual object analyses [Soria_2001, refId0, 10.1111/j.1365-2966.2004.07660.x] and contributes significantly to survey science. However, source detection within the XMM-OM data analysis pipeline would benefit significantly from improved artefact recognition.

Current non-AI approaches to detecting artefacts [Mukhin_2023, article_nustar_straycats, DESAI201667] often struggle due to their reliance on generalised physical models. These models, while broadly applicable, fail to address specific scenarios effectively, leading to limitations in their practical utility.

AI methods based on CNN and Vision Transformer (ViT) models have achieved notable success and have benefited real-world applications in tasks such as object detection [wang2022yolov7, 10.1007/978-3-031-20053-3_27, maaz2022classagnostic, zong2023detrs] and segmentation [srivastava2023omnivec, wang2022image, hümmer2023vltseg, https://doi.org/10.48550/arxiv.2401.15741, fang2022eva, wang2023internimage, liu2021swin, he2018mask, rs13234779]. Instance segmentation techniques for astronomical sources have made significant progress [10.1093/mnras/stad2785, Sortino_2023, hausen2022partialattribution], yet there has been limited focus on artefact detection [tanoglidis2021deepghostbusters]. ViT models are increasingly preferred in computer vision due to their self-attention mechanisms. The Segment Anything Model (SAM) [kirillov2023segment], a ViT-based architecture, excels in class-agnostic instance segmentation and zero-shot learning, allowing it to identify objects not seen during training.

We introduce XAMI (XMM-Newton optical Artefact Mapping for astronomical Instance segmentation), a hybrid CNN and ViT-based model, and XAMI-Dataset, a high-precision instance segmentation dataset for astronomical images. Together, they provide a first baseline demonstrating ML-based artefact detection on astronomical images, as well as a benchmark and starting point for other researchers to build on.

2 Methods

2.1 Dataset

We use 1000 single-channel images at various wavelengths (see Table 1 and [xmmom_filters_handbook]) from the XMM-OM as the baseline artefacts dataset. Each image comprises a stack of all available windows in a given filter of an observation that, together, cover the full $17^{\prime}\times 17^{\prime}$ field of view. This corresponds to a full frame of $2048\times 2048$ px, with an effective resolution of $0.477^{\prime\prime}$/pixel. We rebinned the full-frame images to $512\times 512$ px for computational efficiency. We normalised the images using the ZScaleInterval algorithm and enhanced them with Asinh stretching to increase dynamic range without negatively affecting contrast.
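The rebinning and stretching steps can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not in the text: rebinning is done by averaging non-overlapping blocks (only the input and output sizes are given), the stretch has the form asinh(x/a)/asinh(1/a) with a = 0.1, and a plain min-max scaling stands in for the ZScale interval for brevity. All function names are illustrative.

```python
import math

def rebin(image, factor):
    """Downsample a 2D list by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(image), len(image[0])
    assert rows % factor == 0 and cols % factor == 0
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

def minmax_scale(image):
    """Map pixel values linearly to [0, 1] (simple stand-in for ZScale)."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in image]

def asinh_stretch(image, a=0.1):
    """Apply y = asinh(x / a) / asinh(1 / a) to values in [0, 1]."""
    norm = math.asinh(1.0 / a)
    return [[math.asinh(v / a) / norm for v in row] for row in image]

# Toy 8x8 frame standing in for a 2048x2048 image; factor 4 mirrors 2048 -> 512.
frame = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
small = rebin(frame, 4)
stretched = asinh_stretch(minmax_scale(small))
```

The asinh curve is steep near zero and flattens for bright pixels, which is why it lifts faint structure while compressing bright sources.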

The XAMI dataset consists of 7021 annotated artefacts which can be divided into the following categories (Figure 3):

1. Read-Out-Streaks (ROS) - arising from shutterless camera and continuous Charge-Coupled Device (CCD) photon recording during readout.

2. Smoke rings (SR) - resulting from internal reflections of starlight within the detector.

3. Central ring (CR) - appearing in the centre of the detector, approximately $2^{\prime}$ in diameter, resulting from background light scattering from a chamfer on the detector window mounting ring.

4. Star loops (SL) - elongated scattered light features caused by light from bright stars within a $12^{\prime}-15^{\prime}$ off-axis range, scattered from the chamfer.

5. Other - other types of artefacts which usually represent scattered light spread over large areas.

Filter	$\lambda$ (nm)	Width (nm)	$\#$ images	$\#$ masks
V	543	70	102	880
B	450	105	116	1259
U	344	84	193	1837
UVW1(L)	291	83	403	2127
UVM2(M)	231	48	175	681
UVW2(S)	212	50	63	226
White(W)	406	347	3	11

Table 1: Dataset information per observing filter, together with their central wavelength and width (nm).

Class	Train	Validation
CR	500 (9.43%)	168 (9.75%)
SR	1267 (23.91%)	402 (23.33%)
SL	1377 (25.99%)	467 (27.10%)
ROS	2122 (40.05%)	677 (39.29%)
Other	32 (0.60%)	9 (0.52%)

Table 2: Dataset distribution across splits, given class labels.

We use the stratified k-fold technique to maintain consistent class proportions across dataset splits, which supports a more reliable performance estimate. The resulting class distributions are shown in Table 2.
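To illustrate the idea, the following sketch assigns labelled items to k folds so that each fold receives a near-equal share of every class. It is a simplification under stated assumptions: it stratifies individual annotations, whereas the images here carry several annotations each, and neither the value of k nor the handling of multi-label images is given in the text. The function name and toy labels are illustrative.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Return a fold index (0..k-1) per item, balancing each class across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [None] * len(labels)
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled items of this class round-robin over the folds.
        for position, idx in enumerate(indices):
            folds[idx] = position % k
    return folds

# Toy label list (not the real dataset counts).
labels = ["ROS"] * 40 + ["SL"] * 28 + ["SR"] * 24 + ["CR"] * 8
folds = stratified_folds(labels, k=4)

# Hold out fold 0 for validation and train on the remaining folds.
val = [lab for lab, f in zip(labels, folds) if f == 0]
train = [lab for lab, f in zip(labels, folds) if f != 0]
```

With this toy input, every class appears in the validation fold in the same proportion as in the full list, which is the property Table 2 reports for the real splits.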

2.2 Baseline Model

We propose a class-aware approach for instance segmentation that integrates an object detector, specifically the YOLOv8 model [reis2023realtime], into our SAM prediction logic to facilitate auto-generated input prompts.

Unlike CNNs, which strictly delineate object masks by bounding boxes, transformer-based models like SAM integrate self-attention and can therefore extend masks beyond these initial margins. However, spatial invariance and accurate segmentation of faint objects remain a challenge for ViTs, in contrast with CNN approaches. By using SAM for smooth masks and YOLOv8 for faint objects of certain classes, we aim to overcome these limitations.

3 Results

Our methodology initially involves training SAM with ground-truth annotations using a distilled image encoder from MobileSAM [zhang2023faster]. For SAM, images are resized to $1024\times 1024$ px and have their colours normalized. We use a batch size of 8, a warmup learning rate scheduler ($\mathrm{lr}_{\mathrm{init}}=3\times 10^{-4}$, $\mathrm{lr}_{\mathrm{final}}=6\times 10^{-5}$) for 16 steps, a weight decay of $10^{-5}$ and the AdamW optimizer. We train the Mask Decoder only, while freezing the Image Encoder and Prompt Embedding layers.
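The text gives only the two endpoint rates and the number of scheduler steps, not the shape of the schedule. As a hedged illustration, the sketch below assumes the rate moves linearly from $\mathrm{lr}_{\mathrm{init}}$ to $\mathrm{lr}_{\mathrm{final}}$ over the 16 steps and is then held constant; both the linear shape and the function name are assumptions.

```python
def lr_at_step(step, lr_init=3e-4, lr_final=6e-5, n_steps=16):
    """Linearly interpolate from lr_init to lr_final over n_steps, then hold.

    The linear shape is an assumption; the endpoints and step count
    are the values quoted in the text.
    """
    if step >= n_steps:
        return lr_final
    fraction = step / n_steps
    return lr_init + fraction * (lr_final - lr_init)

# Learning rate for the first 20 optimisation steps.
schedule = [lr_at_step(s) for s in range(20)]
```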

Following recommendations in [kirillov2023segment], we utilise the focal loss and dice loss in a 20:1 weighted scheme. At this stage, predicted and actual masks can be directly compared. Unlike usual SAM implementations, we choose to train the Intersection-over-Union (IoU) head to provide more representative mean Average Precision (mAP) metrics (see Eq. 1). Also, when generating masks, we configure the model to allow three predicted mask outputs and select the final mask based on the highest IoU score. The overall loss calculation integrates both segmentation and IoU loss.
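The weighted segmentation loss and the choice among the three candidate masks can be sketched in plain Python as follows. The 20:1 focal-to-dice weighting comes from the text; the focal parameters (alpha = 0.25, gamma = 2), the dice smoothing term and all function names are assumptions made for illustration, and masks are flattened to lists of per-pixel probabilities.

```python
import math

def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over per-pixel foreground probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
        p_t = p if t == 1 else 1.0 - p           # probability of the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

def dice_loss(probs, targets, smooth=1.0):
    """Soft dice loss: 1 - (2|P.G| + s) / (|P| + |G| + s)."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)

def segmentation_loss(probs, targets, focal_weight=20.0, dice_weight=1.0):
    """Focal and dice losses combined in the 20:1 ratio quoted in the text."""
    return (focal_weight * focal_loss(probs, targets)
            + dice_weight * dice_loss(probs, targets))

def pick_best_mask(masks, iou_scores):
    """Select the candidate mask with the highest predicted IoU score."""
    best = max(range(len(masks)), key=lambda i: iou_scores[i])
    return masks[best]

target = [1, 1, 0, 0]
loss_good = segmentation_loss([0.9, 0.8, 0.1, 0.2], target)
loss_bad = segmentation_loss([0.4, 0.3, 0.6, 0.7], target)
chosen = pick_best_mask(["mask_a", "mask_b", "mask_c"], [0.5, 0.9, 0.7])
```

A prediction closer to the target yields a lower combined loss, and the mask selection reduces to an argmax over the scores produced by the IoU head.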

After training the YOLO and SAM models separately to optimise their individual performance, we freeze the YOLO layers, couple its predicted bounding boxes to the SAM Prompt Encoder and continue training the SAM Mask Decoder for 10 additional epochs to refine the segmentation. Predicted and ground-truth masks are aligned using the Kuhn-Munkres assignment algorithm [https://doi.org/10.1002/nav.3800020109], which minimises the total cost over an IoU-based cost matrix. Due to the higher spatial complexity of certain classes, particularly SL and Other, we select the YOLO masks for faint objects of these classes at the $1\sigma$ background level, as these predictions are more stable for low-intensity artefacts. We provide the segmentation mAP (see Table 3) using a fixed seed for reproducibility. The mAP formula for instance segmentation is given by:

$$\text{mAP}=\frac{1}{Q}\sum_{q=1}^{Q}\text{AP}_{q}\tag{1}$$

where $Q$ is the number of classes, and $\text{AP}_{q}$ is the average precision for the $q$-th class, calculated as the area under the precision-recall curve at different IoU thresholds.
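The alignment of predicted and ground-truth masks mentioned above can be sketched with a compact Kuhn-Munkres implementation. This is an illustration under stated assumptions, not the released code: masks are represented as sets of (row, col) pixels, the cost of a pair is taken to be 1 - IoU, and all function names are hypothetical.

```python
def mask_iou(a, b):
    """IoU of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def kuhn_munkres(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns a sorted list of (row, col) pairs, one per row.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m
    INF = float("inf")
    u = [0.0] * (n + 1)      # row potentials
    v = [0.0] * (m + 1)      # column potentials
    p = [0] * (m + 1)        # p[j]: 1-based row matched to column j (0 = free)
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j])

def match_masks(pred_masks, gt_masks):
    """Pair predictions with ground truth; returns (pred_idx, gt_idx, iou)."""
    cost = [[1.0 - mask_iou(p, g) for g in gt_masks] for p in pred_masks]
    if len(pred_masks) <= len(gt_masks):
        pairs = kuhn_munkres(cost)
    else:
        # More predictions than ground truth: solve the transposed problem.
        transposed = [list(col) for col in zip(*cost)]
        pairs = sorted((r, c) for c, r in kuhn_munkres(transposed))
    return [(r, c, 1.0 - cost[r][c]) for r, c in pairs]

def box(r0, c0, r1, c1):
    """Rectangular toy mask covering rows r0..r1-1 and columns c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}

preds = [box(0, 0, 4, 4), box(10, 10, 14, 14)]
gts = [box(10, 10, 14, 15), box(0, 0, 4, 4)]
matches = match_masks(preds, gts)
```

In the toy example the masks are listed in a different order in the two sets, and the assignment recovers the correct pairing; any prediction left unmatched when predictions outnumber ground-truth masks is simply absent from the result.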

Category	mAP50	mAP75	mAP50-90
Overall	90.1	73.5	55.4
Small	82.6	59.6	48.3
Medium	91.2	75.3	57.1
Large	84.6	63.4	58.0
CR	97.6	94.9	70.2
SR	94.8	77.7	50.6
SL	91.8	75.5	57.3
ROS	82.3	55.2	48.2
Other	83.9	64.1	50.6

Table 3: Mask mAP at different IoU thresholds, obtained from SAM predictions on the validation set. While the lower mAP for the Other class may be caused by its under-representation, the CR class shows the highest scores, which may be attributed to its predictable location. Category pixel areas are: small $[0,32^{2})$, medium $[32^{2},96^{2})$ and large $[96^{2},\infty)$.
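As a worked check of Eq. 1, the short sketch below averages the per-class values reported in Table 3 with equal weight per class. The function name is illustrative; the numbers are copied from the table.

```python
def mean_average_precision(ap_per_class):
    """Eq. (1): the unweighted mean of the per-class average precisions."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Per-class values from Table 3 (percent).
map50 = {"CR": 97.6, "SR": 94.8, "SL": 91.8, "ROS": 82.3, "Other": 83.9}
map75 = {"CR": 94.9, "SR": 77.7, "SL": 75.5, "ROS": 55.2, "Other": 64.1}
map50_90 = {"CR": 70.2, "SR": 50.6, "SL": 57.3, "ROS": 48.2, "Other": 50.6}

overall = [round(mean_average_precision(table), 1)
           for table in (map50, map75, map50_90)]
```

The three averages reproduce the Overall row of Table 3 (90.1, 73.5 and 55.4), which confirms that each class contributes equally regardless of how many instances it has.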

4 Discussion

In our study, we enhanced artefact detection and segmentation in XMM-Newton images by integrating CNNs with ViT-based models, with the aim of improving accuracy and reducing false positives in astronomical analysis. We combined YOLO models for bounding box predictions with SAM models for zero-shot segmentation, demonstrating the benefits of combining diverse neural network strategies for complex image processing challenges. Despite these improvements, the high variation in exposure times and intensity levels in space imagery necessitates further model refinement tailored to astronomical missions. These variations also pose challenges for dataset annotation, making it difficult to establish clear thresholds for distinguishing artefacts from the background. The average XAMI end-to-end inference time per image containing annotations is $100\,\text{ms}$, which is suitable for medium to large image datasets (up to hundreds of thousands of images similar to ours) and for applications that do not require real-time processing. The heavy SAM architecture remains the bottleneck for prediction, at $70-80\,\text{ms}$/image.
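To put the quoted timings in perspective, the following back-of-the-envelope sketch converts the per-image latency into a total processing time. It assumes strictly sequential processing on a single device with no batching, which the text does not specify; the function name and the example dataset size are illustrative.

```python
def total_hours(n_images, ms_per_image=100.0):
    """Sequential processing time in hours for a given per-image latency."""
    return n_images * ms_per_image / 1000.0 / 3600.0

# 100,000 images at the quoted 100 ms end-to-end latency.
hours_100k = total_hours(100_000)

# Share of the end-to-end time spent in SAM, from the quoted 70-80 ms.
sam_share = (70.0 / 100.0, 80.0 / 100.0)
```

Under these assumptions, one hundred thousand images take just under three hours, and SAM accounts for 70 to 80 per cent of that time.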

Acknowledgements. The authors acknowledge the contribution of Inès Perez, Léa Zuili and Simon Astarita to the dataset annotations. This publication uses data generated via the Roboflow.com and Zooniverse.org platforms.
