NEural Engine for Discovering Luminous Events (NEEDLE): identifying rare transient candidates in real time from host galaxy images

Xinyue Sheng,$^{1}$ Matt Nicholl,$^{1}$ Ken W. Smith,$^{1}$ David R. Young,$^{1}$ Roy D. Williams,$^{2}$ Heloise F. Stevance,$^{3}$ Stephen J. Smartt,$^{3,1}$ Shubham Srivastav,$^{3,1}$ Thomas Moore$^{1}$

$^{1}$Astrophysics Research Centre, School of Physics and Astronomy, Queen’s University, Belfast, BT7 1NN, UK
$^{2}$Institute for Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, EH9 3HJ, UK
$^{3}$Department of Physics, University of Oxford, Denys Wilkinson Building, Keble Road, Oxford OX1 3RH, UK
E-mail: [email protected]

(Accepted XXX. Received YYY; in original form ZZZ)

Abstract

Known for their efficiency in analyzing large data sets, machine learning-based classifiers have been widely used in wide-field sky survey pipelines. The upcoming Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) will generate millions of real-time alerts every night, enabling the discovery of large samples of rare events. Identifying such objects soon after explosion will be essential to study their evolution. This requires a machine learning framework that makes use of all the available information, including light curves, host galaxies and other contextual data. Using $\sim 5400$ transients from the ZTF Bright Transient Survey as training and testing data, we develop NEEDLE, a novel hybrid (convolutional neural network $+$ dense neural network) classifier to select two rare classes with strong environmental preferences: superluminous supernovae (SLSNe), which prefer dwarf galaxies, and tidal disruption events (TDEs), which occur in the centres of nucleated galaxies. The input data include (i) cutouts of the detection and reference images, (ii) photometric information contained directly in the alert packets, and (iii) host galaxy magnitudes from Pan-STARRS. Despite having only a few tens of examples of the rare classes, our average (best) completeness on an unseen test set reaches 77% (93%) for SLSNe and 72% (87%) for TDEs. While very encouraging for completeness, this may still result in a large fraction of false positives (relatively low purity) for the rare transients, given the large class imbalance in real surveys. However, the goal of NEEDLE is to find good candidates for spectroscopic classification, rather than to select pure photometric samples. Our network is designed with LSST in mind, and we expect performance to improve further with the higher-resolution images and more accurate transient and host photometry that will be available from Rubin. 
Our system will be deployed as an annotator on the UK alert broker, Lasair, to provide predictions to the community in real time.

keywords:

transient – neural networks – classification

pubyear: 2023

1 Introduction

Thanks to modern time-domain sky surveys, such as the Zwicky Transient Facility (ZTF; Bellm et al. 2018), Asteroid Terrestrial impact Last Alert System (ATLAS; Tonry et al. 2018), Panoramic Survey Telescope and Rapid Response System (Pan-STARRS; Chambers et al. 2019), and All-sky Automated Search for Supernovae (ASAS-SN; Shappee et al. 2014), increasing numbers of transients have been discovered, catalogued and studied. The diversity of their spectra, and of photometric properties such as absolute magnitude, rise and decline rates and duration, has led to the identification of new and rare classes of events. Even more encouraging, the upcoming Legacy Survey of Space and Time (LSST; Ivezić et al. 2019) will significantly increase the transient discovery rate through deeper observations, a wide field of view and colour information from six filters.

In recent years, the rare classes of superluminous supernovae (SLSNe) and tidal disruption events (TDEs) have been studied intensively, although their physical mechanisms remain unclear. SLSNe are $\sim 10$ times brighter than Type Ia supernovae (SNe) and $\sim 100$ times brighter than core-collapse SNe. Their rise times of $\gtrsim 15-30$ days are also longer than those of typical supernovae (Quimby et al., 2011; Gal-Yam, 2019; Nicholl, 2021). The hydrogen-poor events, also termed SLSNe Type I, mostly occur in low-mass dwarf galaxies with high specific star-formation rates and low metallicities (Lunnan et al., 2014; Leloudas et al., 2015; Perley et al., 2016; Angus et al., 2016; Schulze et al., 2017; Chen et al., 2017), which provides important clues for finding and identifying such events. Studying SLSNe helps to fill gaps in our understanding of stellar evolution, particularly of core-collapse supernovae in low-metallicity environments, and to explore the extreme mass loss and rotation of their possible massive progenitor stars.

A tidal disruption event (TDE) occurs when a star passes close enough to the massive black hole (MBH) at the centre of a galaxy to be torn apart by tidal forces, leading to accretion onto the MBH that produces luminous emission and possibly jets (Hills, 1975; Rees, 1988; Gezari, 2021). Such rare events give researchers an opportunity to investigate accretion onto otherwise quiescent black holes (at the low end of the MBH mass distribution), with accretion rates that change by orders of magnitude on human timescales. Compared with SNe and SLSNe, TDEs can be distinguished by their locations in the centres of their host galaxies and by light curves that show a roughly constant temperature. These distinctive properties provide helpful information from which machine learning algorithms can learn.

Modern research on transients relies mainly on their spectra and on their photometric evolution in the time domain. Spectra are essential to reveal their chemical compositions and physical properties (mass, velocity, redshift, etc.). However, because imaging requires much shorter exposure times than spectroscopy, it is favoured by surveys that aim for wide sky coverage and repeated long-term monitoring. As the number of transients discovered in wide-field imaging surveys grows rapidly, it is no longer possible to obtain spectra for most transients because of the long exposure times required.

The Vera C. Rubin Observatory (VRO) is planning to conduct LSST starting in 2025 (Ivezić et al., 2019). LSST will observe the whole Southern sky and part of the Northern sky, including a Wide-Fast-Deep survey ($90\%$ of the observing time) with seasonal cadence and Deep Drilling Fields with denser and deeper observations. Alert brokers, such as the UK alert broker Lasair (Smith et al., 2019), will provide researchers with real-time (within minutes to days) access to transient data. LSST is predicted to issue about 10 million transient alerts (defined as detections of time-varying flux) per night (Kantor, 2014). These alerts will include $\sim 10^{4}$ SLSNe (Villar et al., 2018) and $3,500-8,000$ TDEs (Bricman & Gomboc, 2020) per year. However, the number of conventional SNe detected each year will be $\gtrsim 10^{6}$, meaning that only a small fraction of events will ever be observed spectroscopically. It is therefore essential to identify the most interesting candidates photometrically, in order to prioritise them for spectroscopy.

Machine learning algorithms will play an important role in classifying and filtering these alerts in real time. This project aims to build a hybrid classifier that takes full advantage of various machine learning algorithms and combines different astronomical data sources to identify candidate rare transients, such as SLSNe and TDEs, at or before their peak luminosity. For this reason, we use only the properties available at the time of an early photometric detection: the early light curve, the associated discovery and reference images, and any cataloged host galaxy, but no information (such as redshift) that would require additional observations. We call this classifier the NEural Engine for Discovering Luminous Events (NEEDLE).

The paper is organised as follows. Section 2 reviews existing machine learning classification techniques and explains why SLSNe and TDEs are promising targets. Section 3 describes our data sources (the ZTF Bright Transient Survey and Pan-STARRS) and analyses the correlations between different features and transient classes. Section 4 describes the image and metadata preprocessing methods, including a binary classifier to assess image quality. Section 5 presents the model architecture, the training and test sets, and development details. Section 6 shows the performance of the classifiers through confusion matrices and completeness and purity diagrams, and describes the NEEDLE pipeline that provides public classifications on Lasair. Section 7 discusses transient labelling issues, comparisons with currently popular classifiers, and remaining difficulties and possible improvements. Finally, Section 8 summarises the paper.

2 Contextual and machine learning classification

2.1 The host galaxy matters

The environments of transients show strong correlations with the transients’ properties. For example, the rates of typical Type Ia and core-collapse SNe scale with host galaxy stellar mass (Sullivan et al., 2006; Li et al., 2011). The relative fractions of different SN types vary between galaxies with different masses (Graur et al., 2017) and star-formation rates (Botticella et al., 2017). Locations within the hosts also vary, with some types of SNe showing strong preferences for occurring in the brightest or bluest parts of their hosts (Fruchter et al., 2006; Kelly & Kirshner, 2012; Blanchard et al., 2016). The rare transient classes that we are interested in, SLSNe and TDEs, are prime candidates for selection via their environments, as each shows strong biases in its host galaxies.

SLSNe are very unusual in that they show a strong preference (shared only by long gamma-ray bursts) for dwarf star-forming galaxies. SLSN samples also show a high fraction of irregular or interacting galaxies (Chen et al., 2017; Ørum et al., 2020), but overall occur in low-density environments rather than groups or clusters (Cleland et al., 2023). The locations of SLSNe within their hosts broadly track an exponential disk profile, but many events also occur at large offsets or in regions of low UV flux (Hsu et al., 2023).

TDEs, like active galactic nuclei (AGN), occur at the centres of galaxies hosting MBHs. However, TDEs are rarely observed in galaxies with masses above $\sim{\rm few}\times 10^{10}{\rm M}_{\odot}$ (van Velzen et al., 2021; Ramsden et al., 2022), since for very massive black holes the disruption occurs inside the event horizon. TDE hosts in particular show a large over-representation of recently quenched (French et al., 2016) galaxies with green colours (Hammerstein et al., 2022; Yao et al., 2023). Compared to typical galaxies, their light profiles tend to be strongly peaked towards the nucleus (Law-Smith et al., 2017; Graur et al., 2018).
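The argument that very massive black holes swallow stars whole can be checked with an order-of-magnitude estimate. The Python sketch below compares the classical tidal radius $r_t \approx R_\star (M_{\rm BH}/M_\star)^{1/3}$ with the Schwarzschild radius $r_s = 2GM_{\rm BH}/c^2$, and solves $r_t = r_s$ for the limiting (Hills) mass. It assumes a Sun-like star and a non-spinning black hole; black hole spin and stellar structure change the limit by factors of order unity, so the numbers are indicative only.

```python
import math

# Physical constants (SI units)
G = 6.674e-11      # gravitational constant [m^3 kg^-1 s^-2]
C = 2.998e8        # speed of light [m s^-1]
M_SUN = 1.989e30   # solar mass [kg]
R_SUN = 6.957e8    # solar radius [m]


def tidal_radius(m_bh, m_star=M_SUN, r_star=R_SUN):
    """Classical tidal radius r_t ~ R_* (M_BH / M_*)^(1/3), in metres."""
    return r_star * (m_bh / m_star) ** (1.0 / 3.0)


def schwarzschild_radius(m_bh):
    """Event horizon of a non-spinning black hole, r_s = 2 G M_BH / c^2, in metres."""
    return 2.0 * G * m_bh / C**2


def hills_mass(m_star=M_SUN, r_star=R_SUN):
    """Black hole mass (kg) at which r_t = r_s.

    Solving R_* (M / M_*)^(1/3) = 2 G M / c^2 for M gives
    M = (R_* c^2 / 2G)^(3/2) / M_*^(1/2).
    """
    return (r_star * C**2 / (2.0 * G)) ** 1.5 / math.sqrt(m_star)


print(f"Hills mass for a Sun-like star: {hills_mass() / M_SUN:.1e} M_sun")
for m_solar in (1e6, 1e7, 1e9):
    m_bh = m_solar * M_SUN
    outside = tidal_radius(m_bh) > schwarzschild_radius(m_bh)
    print(f"M_BH = {m_solar:.0e} M_sun -> disruption outside horizon: {outside}")
```

For a Sun-like star this gives a limit of order $10^{8}\,{\rm M}_{\odot}$, above which the star is swallowed before it is disrupted and no luminous flare is expected.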

Some existing codes use the context in which a transient appears to aid classification. For example, Sherlock, deployed on Lasair, is an integrated massive database system that classifies transients by cross-matching the position of a transient with all major astronomical catalogues (Smith et al., 2020). By associating transients with galaxies, galaxy nuclei, known AGN, variables or very bright stars, Sherlock provides a top-level classification of any transient as a likely SN, nuclear transient, AGN, etc. Similarly, the ALeRCE Stamp Classifier uses contextual information, taking the first images and alert metadata for an object to provide a preliminary classification as AGN, SN, variable star, asteroid or bogus (Carrasco-Davis et al., 2021). Baldeschi et al. (2020) present a random forest (RF) classifier for galaxies based on their recent star formation history and morphology, and apply it to the hosts of core-collapse and thermonuclear SNe, showing that host colours and shapes can separate the two classes better than random guessing.

Other codes go further and attempt to predict the spectroscopic subtype of a transient. Foley & Mandel (2013) and Kisley et al. (2022) use purely host galaxy photometry to provide the probabilities of different types of SNe. GHOST (Gagliano et al., 2021) employs a novel gradient ascent method to find the associated host galaxies, and applies an RF to host features and angular offsets to distinguish SLSNe, Type Ia SNe and core-collapse SNe. Gagliano et al. (2023) take host properties and light curves as inputs to classify SNe Ia, SNe II and SNe Ib/c, and obtain increasing accuracy at later phases. In summary, different transients have distinct preferences for where they occur, and these can help reveal their likely nature.

Figure 1: Examples of the science, reference and difference images of three general classes: SLSN, TDE and SN.

2.2 Machine learning architectures for transient classification

Machine learning has been widely applied to classify astrophysical transients, using decision-tree-based methods (BDT; SNGuess, Miranda et al. 2022; Avocado, Boone 2019; Sherlock, Smith et al. 2020), random forests (RF; FLEET, Gomez et al. 2020; Baldeschi et al. 2020; ALeRCE light curve classifier, Sánchez-Sáez et al. 2021) and neural networks. In particular, deep learning algorithms (deep neural networks) have proven powerful at extracting features from data to improve classification without manual feature selection.

Recurrent Neural Networks (RNNs) can learn correlations between both nearby and distant time steps in a time series, and are used for classification, modelling and prediction. RNNs can extract features from the light curves of different classes of transients to distinguish them. Examples of light-curve classifiers include RAPID (Muthukrishna et al., 2019), SuperRAENN (Villar et al., 2020), Superphot (Hosseinzadeh et al., 2020), the Classifier for GOTO (Burhanudin et al., 2021) and the Early-time transient Classifier (Gagliano et al., 2023). Attention mechanisms have also been applied, for example in TimeModAttn (Pimentel et al., 2022).

Convolutional Neural Networks (CNNs), on the other hand, are mainly designed for image classification. During training they generate feature maps from the input data and learn to associate these features with class labels. Image-based classifiers have not yet attained the widespread use of light-curve classifiers, but experiments to date have shown that this approach is a very promising alternative, as it can take into account the transient position and host galaxy morphology, as discussed in Section 2.1. Researchers have applied CNNs to transients in codes such as ALeRCE (Carrasco-Davis et al., 2021) and DELIGHT (Förster et al., 2022), and in recent work on light curves by Burhanudin & Maund (2023), showing that CNNs can achieve high accuracy in identifying various types of transients.
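To make the idea of a feature map concrete, the NumPy sketch below applies a single 3x3 filter to a small image, as one filter of a convolutional layer would (CNN layers compute a cross-correlation, which is what the code does), followed by a ReLU activation. The kernel is hand-picked to respond to point-like sources; in a trained CNN such weights are learned, and many filters are stacked in each layer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def feature_map(image, kernel):
    """One convolutional filter: 'valid' cross-correlation followed by ReLU."""
    # All kernel-sized patches of the image: shape (H-kh+1, W-kw+1, kh, kw)
    windows = sliding_window_view(image, kernel.shape)
    # Dot product of every patch with the kernel
    response = np.einsum("ijkl,kl->ij", windows, kernel)
    return np.maximum(response, 0.0)  # ReLU activation


# Toy 7x7 image with a single point-like source at the centre
image = np.zeros((7, 7))
image[3, 3] = 1.0

# Hand-picked centre-surround kernel; a trained CNN learns such weights itself
kernel = np.array([[-1.0, -1.0, -1.0],
                   [-1.0,  8.0, -1.0],
                   [-1.0, -1.0, -1.0]])

fmap = feature_map(image, kernel)
print(fmap)  # 5x5 map, non-zero only where the kernel is centred on the source
```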

The above architectures have also shown promising performance in classifying rare events like SLSNe and TDEs. For SLSNe, classifiers using light curves achieve completeness $\sim 0.69-0.83$ , and in one case up to 1.00 completeness (Qu & Sako, 2022; Muthukrishna et al., 2019; Sánchez-Sáez et al., 2021). For TDEs, existing codes achieve 0.40 completeness (Gomez et al., 2023) at early phases, or better than 0.80 with full light curves (Stein et al., 2023). A detailed review is shown in Table 3.

However, difficulties and limitations remain. CNNs may struggle with low image quality caused by low signal-to-noise, poor resolution, bright nearby objects or detector cosmetics, resulting in mislabelling. Light curves with sparse cadences and few observations make it difficult for RNNs to extract temporal correlations. In training and test data sets, spectroscopically confirmed SLSNe and TDEs make up only 1-2% of all transients, so these classes may be under-represented during learning. Many classifiers, such as Hložek et al. (2023), have been trained on simulated data sets (e.g. PLAsTiCC; Kessler et al. 2019), which sidesteps the difficulties that must be overcome when dealing with real data. Finally, classifiers that require redshift information or the declining part of a light curve may not be suitable for early-time classification.
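The tension between completeness and purity under class imbalance can be illustrated with simple arithmetic. In the sketch below, completeness is the fraction of true members of a class that are recovered, $TP/(TP+FN)$, and purity is the fraction of selected objects that truly belong to the class, $TP/(TP+FP)$. The survey size, the 1% rare-class fraction and the 5% false-positive rate among ordinary SNe are hypothetical values chosen for illustration, not results from this paper.

```python
def completeness_purity(tp, fn, fp):
    """Completeness = TP / (TP + FN); purity = TP / (TP + FP)."""
    return tp / (tp + fn), tp / (tp + fp)


# Hypothetical survey: 10,000 transients, of which 1% belong to a rare class
n_rare, n_common = 100, 9_900
recall = 0.77                # fraction of rare events recovered
false_positive_rate = 0.05   # hypothetical fraction of ordinary SNe flagged as rare

tp = round(recall * n_rare)                 # rare events correctly flagged
fn = n_rare - tp                            # rare events missed
fp = round(false_positive_rate * n_common)  # ordinary SNe wrongly flagged

c, p = completeness_purity(tp, fn, fp)
print(f"completeness = {c:.2f}, purity = {p:.2f}")
```

Even with 77% completeness, the 495 misclassified ordinary SNe outnumber the 77 recovered rare events, giving a purity of roughly 13%.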

Novel architectures are required to achieve better accuracy. In recent years, hybrid neural networks have become more popular. CNNs combined with artificial neural networks (ANNs, i.e. fully connected neural networks) can use images and metadata (position, redshift, etc.) together to provide high-accuracy predictions (e.g. GaZNets, Li et al. 2022; ALeRCE, Carrasco-Davis et al. 2021). Other architectures, such as transformers like ASTROMER (Donoso-Oliva et al., 2022), use an autoencoder with positional embeddings and self-attention blocks to learn representations of light curves, which can then be applied to classification and modelling.
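A hybrid network of this kind can be sketched as two branches whose outputs are concatenated before the final classification layer. The NumPy forward pass below is a toy illustration with random, untrained weights; the branch sizes (128 image features, 15 metadata features, 32 and 16 hidden units) are arbitrary assumptions and do not describe NEEDLE's architecture, which is given in Section 5.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, w, b, relu=False):
    """Fully connected layer: x @ w + b, optionally followed by ReLU."""
    z = x @ w + b
    return np.maximum(z, 0.0) if relu else z


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def random_layer(n_in, n_out):
    """Untrained random weights and zero biases, for illustration only."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)


# Hypothetical inputs: flattened CNN output for one image, plus a metadata vector
image_features = rng.normal(size=128)
metadata = rng.normal(size=15)

img_branch = dense(image_features, *random_layer(128, 32), relu=True)
meta_branch = dense(metadata, *random_layer(15, 16), relu=True)
merged = np.concatenate([img_branch, meta_branch])    # shape (48,)
probs = softmax(dense(merged, *random_layer(48, 3)))  # one probability per class
print(dict(zip(["SN", "SLSN", "TDE"], probs.round(3))))
```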

In short, future deep learning applications in astronomy will need to ingest multivariate data. This might include magnitudes or fluxes in the time dimension, images (in one or more filters), and contextual information from existing catalogues. Our goal here is to take the first steps towards a hybrid classifier that maximises the information used from images, simple light curve features and host galaxy features, and to apply it to finding SLSNe and TDEs in wide-field surveys.

3 Data Set

For this project we require a training and test set of transients with known classes (based on spectroscopic classifications). In this section we outline the sources of data used to train and validate our code.

3.1 ZTF bright transients database

Although our ultimate goal is to develop a classifier for LSST, for our initial training and test set, before that survey begins, we use the Zwicky Transient Facility (ZTF) Bright Transient Survey (BTS; Bellm et al., 2018; Fremling et al., 2020; Perley et al., 2020). The ZTF public survey covers the entire Northern sky to a depth of $\approx 20-20.5$ mag every 2-3 nights in the $g$ and $r$ filters. The BTS has been spectroscopically classifying all ZTF-detected supernovae brighter than $\approx 19$ mag since June 2018. We choose this dataset as it is the largest homogeneous set of labelled transients available, and the data are comparable to LSST in terms of imaging cadence and the format of the real-time alerts.

We downloaded the entire ZTF BTS sample brighter than 19 mag, up to March 2022, and used this as the basis of our sample. This contains 5703 spectroscopically classified transients. Information such as the ZTF object ID, coordinates, discovery date and spectroscopic type can be found in the ZTF Bright Transient Survey Sample Explorer (https://sites.astro.caltech.edu/ztf/bts/explorer.php?f=s&subsample=trans&classstring=&endpeakmag=19.0&purity=y&quality=y). After removing duplicates and objects missing from the ZTF database, 5388 ZTF objects remain. These include over 5000 SNe, but only 37 SLSNe and 18 TDEs. We therefore supplement the BTS data set with SLSNe and TDEs published in ZTF sample papers: TDEs from Hammerstein et al. (2022) and van Velzen et al. (2021), and SLSNe from Chen et al. (2022). As some of these objects are already in the BTS data, our total numbers of SLSNe and TDEs after removing duplicates are 87 and 64, respectively.
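The merging step can be sketched as follows, assuming each sample is a list of (ZTF ID, label) pairs; the object IDs are hypothetical placeholders. Keying a dictionary on the ZTF ID means that objects appearing in both the BTS and a literature sample are counted only once, with the BTS entry taking precedence.

```python
from collections import Counter


def merge_samples(*samples):
    """Combine lists of (ztf_id, label) pairs, keeping the first entry seen per ZTF ID."""
    merged = {}
    for sample in samples:
        for ztf_id, label in sample:
            merged.setdefault(ztf_id, label)  # later duplicates are ignored
    return merged


# Hypothetical object IDs, for illustration only
bts = [("ZTF18aaaaaaa", "SN"), ("ZTF19bbbbbbb", "SLSN"), ("ZTF20ccccccc", "TDE")]
tde_papers = [("ZTF20ccccccc", "TDE"), ("ZTF21ddddddd", "TDE")]
slsn_papers = [("ZTF19bbbbbbb", "SLSN"), ("ZTF22eeeeeee", "SLSN")]

sample = merge_samples(bts, tde_papers, slsn_papers)  # BTS first, so it takes precedence
print(Counter(sample.values()))
```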

All transients fall into five general categories, shown in Table 4:

• “Common” Supernovae (including all spectroscopic Type Ia, core-collapse, and interacting SNe)
• Superluminous supernovae (considering here only the hydrogen-poor SLSNe Type I)
• Tidal Disruption Events
• Possible SNe/transients of ambiguous nature (calcium-rich, gap transients)
• Non-SN (novae and stellar outbursts).

The latter two categories make up only 1% of the sample, and can generally be filtered out by their fast light curves before being passed to the machine learning classifier. We do not include them in our training or test set, but list them in the Appendix for completeness. The first category is very broad, containing 97.2% of events. However, as the task of NEEDLE is to distinguish among SNe, TDEs and SLSNe, we avoid subdividing the SN class so that more attention can be focused on the rare classes of interest. This is the first version of NEEDLE; we aim for future versions, with improved architectures and more data, to perform more fine-grained classification of the various supernova subtypes. Table 1 provides counts of objects with image and magnitude data from ZTF, as well as the numbers that also have cataloged host data in the $g$ and $r$ bands from deeper surveys.

Class (band)	SN ($g$)	SLSN ($g$)	TDE ($g$)	SN ($r$)	SLSN ($r$)	TDE ($r$)
Objects	5185	80	62	5237	87	64
Objects with host	4959	37	60	5016	41	62

Table 1: The number of objects in each class with images and light curve information from ZTF, as well as those with host galaxy matches in existing catalogs from Sherlock. The information is provided separately for the $g$ and $r$ bands.

3.2 Images

We wrote Python scripts to download ZTF cutout images centred on the transient positions, using the ZTF image database API. For each ZTF object, we downloaded all available $g$- and $r$-band images from its discovery date to 200 days later. These include the science image, taken at each visit and containing the transient flux; the reference image, a stack of images taken before the event’s discovery that provides a template of the host galaxy and surrounding field; and the difference image, the subtraction of the two, which contains only the transient flux. The requested cutout size is 1 arcmin on a side. This is large enough to include most host galaxies, while larger cutouts would include more unrelated sources in the field. Figure 1 shows examples of ZTF images for the three classes of transients we consider.

3.3 Image metadata

We label each image with metadata including the ZTF object ID, class label (SN, SLSN or TDE), RA, Dec, image size and date. We also retrieve, separately for each filter, the first and last Julian dates on which the object is detected. This information is stored in a JSON file for each object. Although we download all images and their associated metadata for these objects, we find better training performance when the network is given only one image per object. When training the model, we therefore use the image metadata to select one image close to the time of light curve peak.
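A minimal sketch of the per-object metadata record and the peak-image selection is shown below. The JSON field names (`ztf_id`, `detected`, `start_jd`, etc.), the example object and the helper functions are hypothetical, and we assume the peak is the brightest (smallest-magnitude) detection in the chosen band.

```python
import json


def build_metadata(ztf_id, label, ra, dec, size_arcmin, detections):
    """Per-object metadata record; `detections` holds (jd, band, mag) tuples."""
    record = {"ztf_id": ztf_id, "label": label, "ra": ra, "dec": dec,
              "size_arcmin": size_arcmin, "detected": {}}
    for band in sorted({b for _, b, _ in detections}):
        jds = [jd for jd, b, _ in detections if b == band]
        record["detected"][band] = {"start_jd": min(jds), "end_jd": max(jds)}
    return record


def image_nearest_peak(image_jds, detections, band):
    """Return the image epoch closest to the brightest detection in `band`."""
    _, peak_jd = min((mag, jd) for jd, b, mag in detections if b == band)
    return min(image_jds, key=lambda jd: abs(jd - peak_jd))


# Hypothetical example object
detections = [(2459001.0, "r", 19.0), (2459010.0, "r", 18.2),
              (2459020.0, "r", 18.9), (2459005.0, "g", 18.8)]
record = build_metadata("ZTF21abcdefg", "SLSN", 150.1, 2.2, 1.0, detections)
print(json.dumps(record, indent=2))  # contents of the per-object JSON file
best = image_nearest_peak([2459000.0, 2459011.0, 2459030.0], detections, "r")
print(best)
```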

3.4 Light curve metadata

For each object, we also retrieve its photometry for each available detection through Lasair. Our aim is to include some simple light curve parameters (features) as additional data for our classifier. We use the Lasair API to query the light curve using the ZTF object ID. After cross-matching with the image data, we found that not every image has a corresponding magnitude, as Lasair contains only the public ZTF photometry. Although more light curve data would be available by querying the ZTF forced photometry, using the data from Lasair ensures consistent formatting between our training data set and future real-time alert classifications we wish to perform with our trained model. This light curve data (detection dates and magnitudes) is appended to the metadata file for each object. The simple features extracted are listed in Table 2.

3.5 Host galaxy metadata

The Sherlock software package (Smith et al., 2020) is integrated into Lasair, and automatically provides a contextual classification by cross-matching with a library of historical and on-going sky survey catalogs. This provides preliminary classifications as transients, variables, artefacts, etc, based on association with nearby galaxies, known cataclysmic variable stars, active galactic nuclei, or bright stars. For this project, we query the Sherlock table on Lasair to find the coordinates of the most likely host galaxy for each of our transients.

We use these coordinates to retrieve host galaxy magnitudes from Data Release 2 of the Pan-STARRS survey. Pan-STARRS is a wide-field imaging system that observes 30,000 deg$^{2}$ of the Northern sky in five broadband filters ($g$, $r$, $i$, $z$ and $y$). The stacked depths reach 23.3, 23.2, 23.1, 22.3 and 21.3 mag, respectively (Chambers et al., 2019; Flewelling et al., 2020). The Pan-STARRS footprint completely overlaps the ZTF coverage, and DR2 contains data taken between 2010 and 2014, prior to the ZTF survey. Therefore, Pan-STARRS should contain all ZTF transient host galaxies brighter than $\sim 23$ mag.

Sherlock found possible hosts for most transients, but missed about half of the SLSN hosts, which are likely fainter than the Pan-STARRS DR2 limiting magnitude. This is unsurprising given that most SLSNe explode in distant dwarf galaxies. We used the Pan-STARRS DR2 API to obtain aperture magnitudes in the $g$, $r$, $i$, $z$ and $y$ bands. The host colours $g-r$ and $r-i$ correlate with the age and star-formation rate of the stellar population.
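Sherlock reports the transient-host separation directly; the sketch below only illustrates the underlying quantities. It computes the great-circle separation with the haversine formula and the host colours from aperture magnitudes. The function names, dictionary keys and the choice of the $r$ band for the host-transient contrast are assumptions for illustration.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees, output in arcsec."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def host_features(host_mags, transient_radec, host_radec, m_peak):
    """Host colours, offset and host-minus-peak contrast (r band assumed for the contrast)."""
    return {
        "g-r": host_mags["g"] - host_mags["r"],
        "r-i": host_mags["r"] - host_mags["i"],
        "separationArcsec": angular_separation_arcsec(*transient_radec, *host_radec),
        "dm_host_peak": host_mags["r"] - m_peak,
    }


feats = host_features({"g": 20.5, "r": 20.0, "i": 19.8},
                      transient_radec=(150.0, 2.0),
                      host_radec=(150.0, 2.0 + 1.5 / 3600),
                      m_peak=18.5)
print(feats)
```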

The full list of metadata used in this study is given in Table 2.

Metadata	Feature	Definition
Transient alerts	$m_{peak}$	Peak magnitude among all existing observations.
	$m_{discovery}$	Magnitude of the discovery observation.
	$\Delta T_{discovery}$	Time difference between current observation and discovery observation
	$\Delta m_{discovery}$	Magnitude difference between current observation and discovery observation
	$\frac{\Delta m_{discovery}}{\Delta T_{discovery}}$	Ratio of the magnitude difference over time difference since discovery
	$\frac{\Delta m_{recent}}{\Delta T_{recent}}$	Ratio of the magnitude difference over time difference since last detection
Host galaxy	$m_{gAp}$	Host magnitude in $g$ band
	$m_{rAp}$	Host magnitude in $r$ band
	$m_{iAp}$	Host magnitude in $i$ band
	$m_{zAp}$	Host magnitude in $z$ band
	$m_{yAp}$	Host magnitude in $y$ band
	$m_{g-r}$	Host magnitude difference between $g$ and $r$ bands
	$m_{r-i}$	Host magnitude difference between $r$ and $i$ bands
	separationArcsec	The distance between the host centre and transient on the image
	$\Delta m_{host-peak}$	Magnitude difference between the host magnitude and $m_{peak}$

Table 2: Summary of light curve and host galaxy features included in our metadata.
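The light-curve features in Table 2 can be computed from the time-ordered detections available at classification time. In the sketch below we assume that the “current observation” is the most recent detection, that there are at least two detections with distinct times, and that magnitudes are in a single band; the dictionary keys are hypothetical names.

```python
def light_curve_features(jds, mags):
    """Table 2 light-curve features from time-ordered single-band detections.

    Assumes the 'current observation' is the latest detection and that there are
    at least two detections with distinct times.
    """
    t_disc, m_disc = jds[0], mags[0]     # discovery (first) detection
    t_now, m_now = jds[-1], mags[-1]     # current (latest) detection
    t_prev, m_prev = jds[-2], mags[-2]   # previous detection
    dt_disc = t_now - t_disc
    dm_disc = m_now - m_disc
    return {
        "m_peak": min(mags),             # brightest = smallest magnitude so far
        "m_discovery": m_disc,
        "dT_discovery": dt_disc,
        "dm_discovery": dm_disc,
        "dm_dT_discovery": dm_disc / dt_disc,
        "dm_dT_recent": (m_now - m_prev) / (t_now - t_prev),
    }


print(light_curve_features([0.0, 10.0, 20.0], [19.0, 18.0, 17.5]))
```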

3.6 Data Analysis

Before building a model, we check for correlations or clusters within our metadata. Figure 2 shows simple features obtained from the ZTF $r$-band light curves. We show the apparent magnitude around the peak, $m_{\rm peak}$, the magnitude contrast between transient and host, the elapsed time between first detection and light curve peak, and a measure of the light curve slope during the rise, the “rising ratio” $(m_{\rm peak}-m_{\rm discovery})/(t_{\rm peak}-t_{\rm discovery})$.

The distributions of SLSN and TDE apparent magnitudes in our sample are skewed fainter than the distribution for other SNe. This is due in part to the lack of nearby events in these rare classes. The need to include examples from outside the magnitude-limited BTS sample may also bias these events towards fainter magnitudes, but is unavoidable given the class size imbalance.

SLSNe show a much larger contrast with their host galaxies at peak, standing out from SNe and TDEs. Their rise times are also longer than those of other transients. Normal SNe show the fastest rise, with TDEs showing a broad distribution peaking in between the other classes. The median SLSN rising ratio is similar to that of TDEs, but with a smaller scatter. To compare some of the key parameters more clearly, Figure 3 presents the cumulative distributions of host galaxy contrast ($m_{\rm peak}-m_{\rm host}$) and approximate rise time ($t_{\rm peak}-t_{\rm discovery}$), where the three classes show clear differences.
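The quantities in Figures 2 and 3, and the cumulative distributions in Figure 3, can be computed as in the sketch below. The function names are hypothetical; the definitions follow the text. Because magnitudes decrease as objects brighten, the rising ratio of a brightening transient is negative, as is the host contrast of a transient brighter than its host.

```python
import numpy as np


def rise_summary(t_disc, m_disc, t_peak, m_peak, m_host):
    """Approximate rise time, rising ratio and host contrast for one transient."""
    rise_time = t_peak - t_disc
    rising_ratio = (m_peak - m_disc) / rise_time
    host_contrast = m_peak - m_host
    return rise_time, rising_ratio, host_contrast


def ecdf(values):
    """Empirical cumulative distribution: sorted values and fraction of samples <= each."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


print(rise_summary(2459000.0, 19.5, 2459040.0, 18.5, 22.0))
x, y = ecdf([3.0, 1.0, 2.0])
print(x, y)
```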

Similarly, Figure 4 shows a corner plot of the host galaxy metadata, including magnitudes and colours in the $g$, $r$ and $i$ bands, and the offset in arcseconds between the transient coordinates and the host galaxy centroid. Again, SLSNe tend to have the faintest hosts, with a slight bias to bluer $g-r$ colours. As expected, the host offsets of most TDEs cluster between $\sim 0$ and 1 arcsec. SLSNe, with their compact hosts, tend to show small offsets of a few arcseconds, whereas the distribution is much broader for typical SNe in extended galaxies.

4 Data Preprocessing

In this section, we illustrate the steps used to clean and prepare our data set before training our model.

4.1 Image Preprocessing

Some of the images we obtained from the ZTF database have irregular sizes and shapes, or missing pixels. Such issues can arise when a transient lies close to the edge of the detector, or near bright stars that are masked out (but can still leave diffraction spikes or subtraction artefacts). Examples are shown in Figure 5. These poor-quality images can severely affect the training process, so we identify them and either repair or remove them before training, as described below.

4.1.1 Image size cutout

Our target cutout size is 60x60 pixels. An image slightly smaller than this (for example, 58x58 pixels) is expanded to 60x60 pixels by repeating the outermost row or column on each side. Images much smaller than 60x60 pixels are removed, while those larger than 60x60 pixels are cut down to 60x60.
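The padding and cropping rules can be sketched with NumPy: `np.pad(..., mode="edge")` repeats the outermost rows and columns, and oversized cutouts are cropped about their centre. Centre-cropping and the minimum accepted size (`min_size=50` pixels) are assumptions, since the text does not specify how large images are reduced or what counts as very small.

```python
import numpy as np

TARGET = 60  # cutout side length in pixels


def standardise_cutout(img, target=TARGET, min_size=50):
    """Return a target x target cutout, or None if the input is too small.

    min_size is a hypothetical threshold for 'very small' cutouts.
    """
    h, w = img.shape
    if min(h, w) < min_size:
        return None
    # Pad short axes by repeating the outermost rows/columns on each side
    pad_h, pad_w = max(target - h, 0), max(target - w, 0)
    img = np.pad(img, ((pad_h // 2, pad_h - pad_h // 2),
                       (pad_w // 2, pad_w - pad_w // 2)), mode="edge")
    # Centre-crop long axes
    h, w = img.shape
    top, left = (h - target) // 2, (w - target) // 2
    return img[top:top + target, left:left + target]


print(standardise_cutout(np.zeros((58, 58))).shape)
```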

4.1.2 Quality check model

Images with missing or unreliable pixels are difficult to deal with, and can greatly harm the training process. One common feature is that such images often have very large pixel standard deviations ($\sigma$), much larger than those of normal images. However, our experiments showed that a quality cut based only on $\sigma$ still fails to remove a small number of problematic images with reasonable standard deviations. We therefore develop a binary convolutional neural network to determine whether an image is good or bad.

First, we label images with $\sigma>1000$ as ‘bad’, and manually select examples of bad images (in the $g$ and $r$ bands); the remaining images are labelled ‘good’. We then feed them into a simple two-layer CNN classifier for training and testing. The output is the probability that an image is good, as shown in Figure 5. Images with a probability greater than 0.5 are passed on for further processing, and the rest are excluded. Figure 5(a) and Figure 5(b) show the confusion matrix and the receiver operating characteristic (ROC) curve; the closer the ROC curve is to the upper left corner, the more accurate the classifier. The model rejects 98.4% of bad images, so we apply it as the first stage of data preprocessing. In our subsequent experiments, this step removes about 12 peak images of ZTF objects, 0.22% of the whole image set.
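The two simple rules around the quality-check CNN, the $\sigma$-based pre-labelling and the 0.5 probability cut, are sketched below. The CNN itself is not reproduced here; `p_good` stands in for its predicted probabilities, and the function names are hypothetical.

```python
import numpy as np

SIGMA_CUT = 1000.0  # pixel standard deviation above which an image is pre-labelled 'bad'
PROB_CUT = 0.5      # minimum predicted probability of being 'good' to keep an image


def sigma_labels(images):
    """Initial labels from pixel scatter: 1 = good, 0 = bad (NaN pixels ignored)."""
    sigmas = np.nanstd(images.reshape(len(images), -1), axis=1)
    return (sigmas <= SIGMA_CUT).astype(int)


def keep_good(images, p_good):
    """Keep images whose predicted P(good) exceeds PROB_CUT; also return the mask."""
    mask = np.asarray(p_good) > PROB_CUT
    return images[mask], mask


images = np.zeros((2, 60, 60))
images[1, 0, 0] = 1e6                        # one extreme pixel inflates sigma
labels = sigma_labels(images)
kept, mask = keep_good(images, [0.9, 0.2])   # p_good would come from the trained CNN
print(labels, kept.shape)
```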

4.1.3 Z-scaling and normalization

Astronomical pixel data can span a large dynamic range within a single image, which can cause problems for classifiers that need to learn faint features. The IRAF z-scale algorithm (https://js9.si.edu/js9/plugins/help/scalecontrols.html), designed for displaying images as pixel intensity maps, is widely used to pick out features close to the background level of the image. The algorithm determines a minimum (z-min) and maximum (z-max) pixel value to display; pixels below z-min are displayed at zero intensity, and pixels above z-max appear saturated.

In our case, we apply the same z-scaling algorithm to replace any NaN values or anomalously faint pixels with the z-min value. We do not apply a mask for z-max, to avoid treating real features (such as a bright transient) as saturated. Min-max normalization is then applied to the scaled data, limiting the values to [0,1].
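A sketch of this scaling step is shown below. It uses `ZScaleInterval` from `astropy.visualization`, an implementation of the IRAF zscale algorithm, to obtain z-min; astropy is imported inside the function so that the clipping and normalisation step can be used on its own. The text does not say which zscale implementation NEEDLE uses, so treat this as one possible implementation.

```python
import numpy as np


def zscale_min(image):
    """z-min from astropy's ZScaleInterval (imported here so the rest works without astropy)."""
    from astropy.visualization import ZScaleInterval
    zmin, _zmax = ZScaleInterval().get_limits(image[np.isfinite(image)])
    return zmin


def clip_and_normalise(image, zmin):
    """Set non-finite and below-z-min pixels to z-min (no upper clip), then scale to [0, 1]."""
    img = np.where(np.isfinite(image), image, zmin)
    img = np.maximum(img, zmin)
    lo, hi = img.min(), img.max()
    if hi == lo:                      # blank image: avoid division by zero
        return np.zeros_like(img, dtype=float)
    return (img - lo) / (hi - lo)


# Typical use on a real cutout array `cutout`:
# scaled = clip_and_normalise(cutout, zscale_min(cutout))
```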

4.1.4 Data augmentation

Data augmentation creates additional artificial samples from those already in a training set. This is particularly helpful for classes with few examples, such as our SLSNe and TDEs. Augmentation techniques for images include resizing, random flipping (horizontal or vertical), and random rotation (between 0 and 360 degrees, with any missing data at the edges filled with neighboring values). For convenience, we implement augmentation as a custom layer placed directly after the input layer, so that the images are randomly modified in every training epoch. Flips and rotations discourage the model from learning spurious location- or orientation-specific features. We do not apply resizing, in order to preserve the pixel scale of the data.
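The flip-and-rotate augmentation can be sketched in NumPy as follows. This is an illustrative stand-in, not the NEEDLE custom layer: rotation uses nearest-neighbour sampling, and output pixels that map outside the original stamp take the value of the nearest edge pixel, approximating the "filled with neighboring values" behaviour described above. Applying one random transform to a stacked (height, width, channels) array keeps the science and reference images aligned. Within Keras itself, the built-in `RandomFlip` and `RandomRotation` layers (the latter with `fill_mode="nearest"`) offer comparable behaviour.

```python
import numpy as np

def random_flip(img, rng):
    """Independently flip horizontally and/or vertically with probability 0.5."""
    if rng.random() < 0.5:
        img = img[:, ::-1]   # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]   # vertical flip
    return img

def random_rotate(img, rng):
    """Rotate by a random angle in [0, 360) degrees about the stamp centre,
    using nearest-neighbour sampling and nearest-edge filling."""
    h, w = img.shape[:2]
    theta = rng.uniform(0.0, 2.0 * np.pi)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, locate its source pixel.
    ys = cy + (yy - cy) * np.cos(theta) - (xx - cx) * np.sin(theta)
    xs = cx + (yy - cy) * np.sin(theta) + (xx - cx) * np.cos(theta)
    # Clipping to the valid range copies the nearest edge pixel.
    ys = np.clip(np.rint(ys), 0, h - 1).astype(int)
    xs = np.clip(np.rint(xs), 0, w - 1).astype(int)
    return img[ys, xs]       # works for (h, w) and (h, w, channels)

def augment(img, rng):
    return random_rotate(random_flip(img, rng), rng)

rng = np.random.default_rng(1)
pair = rng.random((32, 32, 2))   # channel 0: science, channel 1: reference
augmented = augment(pair, rng)
```

No interpolation is performed, so every output value is a copy of an input pixel and the pixel scale is unchanged.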

4.2 Metadata preprocessing

Metadata consists of the light curve features and the host galaxy magnitudes, colours and offsets; details are given in Table 2. Currently, any missing metadata value is replaced with zero. Although a magnitude, time difference or offset of zero does have a physical meaning, we find in our experiments that this zero-filling does not noticeably affect classifier performance. Alternative methods will be considered in the next version of NEEDLE.

We scale the metadata by standardization: each feature is assumed to be approximately Gaussian-distributed across all samples, and each value is shifted by the feature's mean and divided by its standard deviation. This places all features on a comparable scale, making it easier for the model to learn how their distributions differ among the three classes. The scaling parameters are stored with the model so that the same transformation can be applied to new data.
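The zero-filling and standardization steps can be sketched as below, assuming the metadata are held in a NumPy array of shape (n_objects, n_features). The helper names are hypothetical. The means and standard deviations are computed on the training set only and would be saved alongside the model (for example with `np.savez`) so that identical scaling is applied at inference time.

```python
import numpy as np

def fill_missing(meta):
    """Replace missing (NaN) metadata values with zero, as described in the text."""
    return np.nan_to_num(meta, nan=0.0)

def fit_scaler(train_meta):
    """Per-feature mean and standard deviation from the (zero-filled) training set."""
    mean = train_meta.mean(axis=0)
    std = train_meta.std(axis=0)
    std[std == 0] = 1.0   # guard against constant features
    return mean, std

def standardize(meta, mean, std):
    """Zero-fill, then shift by the mean and divide by the standard deviation."""
    return (fill_missing(meta) - mean) / std

rng = np.random.default_rng(0)
train = rng.normal(5.0, 2.0, size=(100, 4))
train[3, 1] = np.nan                          # a missing host magnitude, say
mean, std = fit_scaler(fill_missing(train))   # parameters to store with the model
z_train = standardize(train, mean, std)
```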

4.3 Data compression and indexing

To feed a large amount of preprocessed data into the classifier for training on any computing platform, we store and fetch the data in the HDF5 binary data format (Collette, 2013). This allows data to be transferred easily between facilities and speeds up training during parameter optimization. In addition, we assign a custom index to each sample used in training and testing, which lets users trace its ZTF ID and so assists case studies. We store the image set, metadata set, labels and sample index set in HDF5 format, and separate the training and test sets after loading the HDF5 data.

5 Classifier Architecture and training

In this section, we introduce the design of our NEEDLE code and discuss the details of the model architecture.

We build our model within the TensorFlow Keras framework, implementing a custom class called NEEDLE that inherits from keras.Model. This class includes the basic user-defined model functions (train, test, build, predict, loss function, etc.), as well as functions for plotting and visualizing the model.

5.1 Hybrid neural network

To make full use of the image and metadata, a hybrid model is required. Inspired by Carrasco-Davis et al. (2021), we build a model that combines a block of convolutional layers for the image inputs with a block of fully connected layers for the metadata inputs.

Figure 7 shows the model architecture. The image block consists of a data augmentation layer (random flips and rotations) and two convolutional layers, each followed by a MaxPooling layer. The output of the last pooling layer is flattened into a 1D vector and fed into a fully connected dense layer with 64 neurons. The metadata block consists of two fully connected dense layers with 128 neurons each. The outputs of the two blocks are then concatenated and fed into two dense layers (with 192 and 32 neurons, respectively), followed by the output layer.

Each layer uses a ReLU activation function, except the final output layer, which uses a softmax activation function to give the probability that an object belongs to each class.
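To make the data flow through the fusion part of the network concrete, the NumPy sketch below passes stand-in branch outputs (64 image features and 128 metadata features per object) through the concatenation, the 192- and 32-neuron ReLU layers and a 3-class softmax output. The weights are random, so the probabilities are meaningless; the sketch only illustrates the shapes and activations described above, and is not the NEEDLE Keras model.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dense(x, n_out, rng, activation=relu):
    # Randomly initialised (He-normal style) weights: a shape sketch, not a trained layer.
    w = rng.normal(0.0, np.sqrt(2.0 / x.shape[1]), size=(x.shape[1], n_out))
    b = np.zeros(n_out)
    return activation(x @ w + b)

def fusion_head(image_features, meta_features, rng, n_classes=3):
    h = np.concatenate([image_features, meta_features], axis=1)  # (batch, 64 + 128)
    h = dense(h, 192, rng)
    h = dense(h, 32, rng)
    return dense(h, n_classes, rng, activation=softmax)  # (batch, n_classes)

rng = np.random.default_rng(0)
image_features = relu(rng.normal(size=(4, 64)))   # stand-in for the image branch output
meta_features = relu(rng.normal(size=(4, 128)))   # stand-in for the metadata branch output
probs = fusion_head(image_features, meta_features, rng)
```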

5.2 Training and test sets

Since the samples of SLSNe and TDEs are very small compared with the large number of normal SNe, training becomes difficult when more than $\sim 20\%$ of them are placed in the test set. Our experiments also showed that some objects are easily classified correctly with high probability regardless of whether they are in the training or test set, whereas others are difficult and return poor predictions, so a single fixed split could give misleading results. A fair approach that still leaves a reasonable number of objects for training is to shuffle the data set with a unique random seed, randomly select the test objects, and repeat this process, training the model several times and averaging the results. Each time, we include 15 SLSNe, 15 TDEs and 15 SNe in the test set, and we repeat the process 10 times to calculate the average model performance.
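The repeated test-set selection can be sketched as follows, assuming integer class labels (for example 0 = SN, 1 = SLSN, 2 = TDE) in a NumPy array. Each seed draws 15 objects per class without replacement for the test set and leaves the rest for training; the function names and the class sizes in the example are hypothetical.

```python
import numpy as np

def split_indices(labels, n_test_per_class=15, seed=0):
    """Return (train_idx, test_idx) with n_test_per_class test objects per class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    test = []
    for cls in np.unique(labels):
        members = np.flatnonzero(labels == cls)
        test.extend(rng.choice(members, size=n_test_per_class, replace=False))
    test = np.sort(np.array(test))
    train = np.setdiff1d(np.arange(labels.size), test)
    return train, test

def repeated_splits(labels, n_repeats=10, n_test_per_class=15):
    """One independent split per seed; metrics are averaged over the runs."""
    return [split_indices(labels, n_test_per_class, seed) for seed in range(n_repeats)]

labels = np.array([0] * 200 + [1] * 40 + [2] * 30)   # illustrative class sizes
splits = repeated_splits(labels)
```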

5.3 Weighted loss function

We start from the Keras loss function SparseCategoricalCrossentropy, which is designed for multi-class tasks. Because our training labels are extremely unbalanced, with most objects being SNe, the model would naturally learn SN features first to decrease the loss quickly, resulting in poor predictions for the other classes. One solution is to give more weight to rare classes and less weight to common ones, so that each class contributes comparably to the loss and the model learns the features of all classes. Our weighted loss function is

$$\mathrm{loss}=-\frac{1}{N}\sum_{m=1}^{M}W_{m}\sum_{i=1}^{N_{m}}\left[y_{i}\log\hat{y}_{i}+(1-y_{i})\log(1-\hat{y}_{i})\right],\qquad W_{m}=\frac{1}{n_{m}} \qquad (1)$$

Here $N$ is the number of objects in one batch/epoch. $N_{m}$ and $n_{m}$ are the numbers of objects from the $m$-th class in the batch/epoch and in the whole training set, respectively, and $M$ is the number of classes. $W_{m}$ is the class weight for the $m$-th class. $y_{i}$ and $\hat{y}_{i}$ refer to the true class and the model prediction for one input, respectively.
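Equation (1) can be checked numerically with the sketch below. It assumes that $\hat{y}_i$ is the predicted probability of object $i$'s true class and $y_i=1$, in which case the $(1-y_i)$ term vanishes and Equation (1) becomes a class-weighted categorical cross-entropy. The class counts $n_m$ are passed in explicitly; the function name is hypothetical. In Keras, a similar effect can be obtained by passing a `class_weight` dictionary to `Model.fit`, although its normalization may differ from Equation (1).

```python
import numpy as np

def weighted_loss(labels, probs, class_counts, eps=1e-7):
    """Equation (1) with W_m = 1 / n_m, averaged over the N objects in the batch."""
    labels = np.asarray(labels)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    weights = 1.0 / np.asarray(class_counts, dtype=float)   # W_m = 1 / n_m
    y = 1.0                                                  # indicator of the true class
    y_hat = probs[np.arange(labels.size), labels]            # predicted p(true class)
    per_obj = y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat)
    # Summing W_m over the objects of each class m equals weighting each object by W_{label}.
    return float(-np.sum(weights[labels] * per_obj) / labels.size)

# Batch of three objects (two of class 0, one of class 1); training-set counts n_m = [2, 1, 1].
loss = weighted_loss([0, 0, 1],
                     [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1]],
                     class_counts=[2, 1, 1])
```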

5.4 Training optimization

To set the learning rate, we use the ExponentialDecay schedule, which decreases the learning rate exponentially as training proceeds. Equation 2 shows the schedule: $\mathrm{lr}_{0}$ is the initial learning rate, $\alpha$ is the user-defined decay rate, and $N$ and $n$ are the training step and the user-defined number of decay steps, respectively. Through experimentation, we find acceptable performance with $\mathrm{lr}_{0}=0.0002$, $\alpha=0.95$ and $n=100$.

$$\mathrm{lr}_{N}=\mathrm{lr}_{0}\times\alpha^{N/n} \qquad (2)$$
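Equation (2) reduces to a one-line function; the sketch below evaluates it with the quoted parameters ($\mathrm{lr}_0=0.0002$, $\alpha=0.95$, $n=100$). It corresponds to the non-staircase form of `keras.optimizers.schedules.ExponentialDecay`, which takes `initial_learning_rate`, `decay_steps` and `decay_rate`; the function name here is hypothetical.

```python
def exponential_decay(step, lr0=2e-4, decay_rate=0.95, decay_steps=100):
    """Learning rate after `step` training steps: lr0 * decay_rate ** (step / decay_steps)."""
    return lr0 * decay_rate ** (step / decay_steps)

schedule = [exponential_decay(step) for step in (0, 100, 200)]
```

With these values the rate is 0.00019 after 100 steps and 0.0001805 after 200 steps, so it decays smoothly rather than in discrete jumps.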

An optimizer is the strategy for updating the weights and biases of a neural network so as to reduce the loss function towards its minimum. For this model, we apply Adaptive Moment Estimation (Adam; Kingma & Ba, 2014), a variant of stochastic gradient descent that scales each weight update using exponentially decaying moving averages of the gradient and of its square, controlled by two decay rates.
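For reference, the Adam update rule can be written in a few lines of NumPy. The sketch below uses the default decay rates $\beta_1=0.9$ and $\beta_2=0.999$ from Kingma & Ba (2014) and minimizes a toy one-dimensional quadratic; it is a plain illustration of the algorithm, not the Keras optimizer, and the step size of 0.1 is chosen only for the toy problem.

```python
import numpy as np

def adam_minimize(grad_fn, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    w = np.asarray(w0, dtype=float).copy()
    m = np.zeros_like(w)   # decaying average of gradients (first moment)
    v = np.zeros_like(w)   # decaying average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)   # bias corrections for the zero initialisation
        v_hat = v / (1.0 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Minimise f(w) = (w - 3)^2, whose gradient is 2 (w - 3).
w_opt = adam_minimize(lambda w: 2.0 * (w - 3.0), [0.0])
```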

Overfitting is a non-negligible problem during training: the model learns noise rather than real features from the training data (a particularly important problem for astronomical data, which are inherently noisy), resulting in high accuracy on the training set but low accuracy on the validation and test sets. To avoid this, we use tensorflow.keras.callbacks.EarlyStopping to monitor the training process: training stops when the validation loss has not improved on its best value for 3 consecutive epochs.
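The stopping rule can be expressed as a small function over the sequence of validation losses. This sketch assumes it matches Keras `EarlyStopping(monitor="val_loss", patience=3)`, i.e. training stops once three consecutive epochs pass without a new best validation loss; the function name is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (epoch at which training stops, epoch with the best validation loss)."""
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0   # new best: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch            # ran out of epochs

stop, best = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6])
```

Here the loss stops improving after epoch 2, so training halts at epoch 5 and the later improvement at epoch 6 is never seen; this is the trade-off a small patience makes.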

5.5 Optimal network architecture

We experimented with a variety of architectures and parameters. Given the size of the training set and the limited information in the images, a deep network is not suitable, as it would likely lead to rapid overfitting and large fluctuations in the loss function. We therefore experiment with networks with only a few CNN layers.

KerasTuner (O’Malley et al., 2019) is an easy-to-use, scalable hyperparameter optimization framework that lets users define search ranges for quantities such as the number of neurons, the activation function and the learning rate. It automatically trains models across these configurations and searches for the best-performing one. We use it to tune the architecture and hyperparameters of NEEDLE.
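As a simplified illustration of what a hyperparameter search does, the sketch below exhaustively evaluates every combination in a small search space with `itertools.product`. The search space and the scoring function are hypothetical stand-ins: in practice each score would come from training and validating a model, and KerasTuner also offers strategies that avoid evaluating a full grid.

```python
import itertools

def grid_search(space, score_fn):
    """Evaluate every combination in `space` (dict: name -> list of values)
    and return the highest-scoring configuration and its score."""
    names = list(space)
    best_cfg, best_score = None, float("-inf")
    for values in itertools.product(*(space[name] for name in names)):
        cfg = dict(zip(names, values))
        score = score_fn(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical search space loosely modelled on the quantities tuned in the text.
space = {
    "conv_filters": [32, 64, 128],
    "meta_units": [64, 128],
    "learning_rate": [1e-4, 3e-5],
}

def toy_score(cfg):
    # Stand-in for a validation metric; a real search would train a model here.
    return (-abs(cfg["conv_filters"] - 128) - abs(cfg["meta_units"] - 64)
            - 1e4 * abs(cfg["learning_rate"] - 3e-5))

best_cfg, best_score = grid_search(space, toy_score)
```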

The results show that a model with two convolutional layers (each with 128 kernels of size $3\times3$) for the image inputs and two fully connected layers (with 64 and 128 neurons) for the metadata inputs gives the best predictions. The detailed architecture is shown in Figure 7. The learning rate, batch size and number of epochs are $3\times10^{-5}$, 128 and 300, respectively.

6 Experiments and results

In this section, we investigate model performance on the ZTF BTS sample. In particular, we aim to determine which metadata are important to include in our training and test sets, the expected purity and completeness, and how confidently we can predict the type of an object at early phases. We also present the NEEDLE pipeline that we are implementing on Lasair.

6.1 Classifier performance with & without host metadata

Initially, we train only with information available from the real-time transient alerts: the science image (here assumed close to the time of maximum light), the reference image, and the transient metadata such as magnitude and time since discovery. For convenience we call this version NEEDLE-T (for transient). We then retrain the model, this time including the cataloged host galaxy properties obtained from Sherlock and Pan-STARRS, and we label this version NEEDLE-TH (transient+host).

Figure 8 shows the confusion matrices giving the completeness of the three models on the test set. Each object is assigned the class with the highest predicted probability. The values in each confusion matrix are averages over 10 model realisations with randomly shuffled test sets (each containing 15 objects per class, with the remaining objects in the training set). The initial NEEDLE-T classifier (Figure 8(a)) recognizes on average 79% of normal SNe and 76% of SLSNe in the test set. Recall that more than half of the SLSNe in our sample do not have cataloged hosts, so only 41 of the 87 SLSNe can be included in the full NEEDLE-TH model. If we train NEEDLE-T only on the objects with detected hosts (enabling a fair comparison with NEEDLE-TH), the average true positive rate for SLSNe decreases to 55%, and the large range shows that the predictions are less stable with the smaller sample (Figure 8(b)). Adding the host metadata in NEEDLE-TH (Figure 8(c)) improves the performance for SLSNe to 77% on average, despite the smaller sample size, showing the importance of including host galaxy information. Even in the worst-performing model, at least 53% of SLSNe are correctly identified with the help of host magnitudes and colour information, and the highest completeness reaches 93%.
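A minimal sketch of how such a confusion matrix can be computed, assuming integer labels and an array of predicted class probabilities: each object is assigned its highest-probability class, and each row is normalized by the number of objects of that true class, so the diagonal gives the completeness. Averaging over realisations is then `np.mean` over the per-run matrices. The function names and toy numbers are hypothetical.

```python
import numpy as np

def confusion_matrix(true, probs, n_classes=3):
    """Rows: true class; columns: predicted (highest-probability) class."""
    pred = np.argmax(probs, axis=1)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true, pred):
        cm[t, p] += 1
    return cm

def completeness_matrix(cm):
    """Row-normalise so each row sums to 1; the diagonal is the completeness."""
    return cm / cm.sum(axis=1, keepdims=True)

true = [0, 0, 1, 1, 2, 2]   # 0 = SN, 1 = SLSN, 2 = TDE
probs = np.array([[0.7, 0.2, 0.1], [0.3, 0.6, 0.1],
                  [0.1, 0.8, 0.1], [0.5, 0.4, 0.1],
                  [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]])
cm = confusion_matrix(true, probs)
# Averaging over 10 realisations would be:
# np.mean([completeness_matrix(c) for c in per_run_matrices], axis=0)
```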

This effect is even more pronounced for TDEs: with the addition of host information in NEEDLE-TH, the average true positive rate for TDEs grows from 57% to 72%. While galaxy colours do differ for TDEs compared with the other transient types, this improvement more likely reflects the fact that all TDEs in the sample have a small offset, because they occur in the nuclei of their hosts.

6.2 Completeness and Purity

Figure 9 shows the completeness and purity trends on the unseen test set with increasing probability thresholds for classification, for both NEEDLE-TH (corresponding to Figure 8(c)) and NEEDLE-T (for only those objects having cataloged hosts, corresponding to Figure 8(b)). Here a class is only assigned if $p({\rm class})>x$ for the most probable class. In each case we show the average and standard deviation of 10 trained models.

On average, for SLSNe we attain a completeness of 76% (52%) and a purity of 85% (92%) for a threshold of $p({\rm SLSN})\geq0.5$ (0.75). For TDEs, we obtain a completeness of 66% (38%) and a purity of 80% (93%). These results are fairly competitive with other popular classifiers (shown in Table 3), especially considering that we use only single images and limited light curve information.
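The thresholded completeness and purity for one class can be sketched as follows (NumPy; hypothetical names and toy numbers). An object is counted as classified only if the class is its most probable one and its probability meets the threshold; the inclusive comparison follows the $p\geq0.5$ and $p\geq0.75$ thresholds quoted above.

```python
import numpy as np

def completeness_purity(true, probs, cls, threshold):
    true = np.asarray(true)
    probs = np.asarray(probs, dtype=float)
    pred = np.argmax(probs, axis=1)
    # Assign `cls` only if it is the most probable class AND p(cls) >= threshold.
    assigned = (pred == cls) & (probs[:, cls] >= threshold)
    is_cls = true == cls
    tp = np.sum(assigned & is_cls)
    completeness = tp / is_cls.sum()
    purity = tp / assigned.sum() if assigned.any() else float("nan")
    return float(completeness), float(purity)

true = [1, 1, 1, 0, 0, 2]   # 0 = SN, 1 = SLSN, 2 = TDE
probs = [[0.1, 0.8, 0.1], [0.3, 0.6, 0.1], [0.2, 0.4, 0.4],
         [0.2, 0.7, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
low = completeness_purity(true, probs, cls=1, threshold=0.5)
high = completeness_purity(true, probs, cls=1, threshold=0.75)
```

Raising the threshold from 0.5 to 0.75 here trades completeness (2/3 to 1/3) for purity (2/3 to 1), the same trend described above.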

We note that the purity achieved with our balanced test set will likely not reflect the purity obtainable in a real survey, due to the large imbalance in rates between SLSNe/TDEs and normal SNe. Therefore, when selecting objects in real time, one may wish to choose a high probability threshold to minimise the absolute numbers of normal SNe mis-classified as SLSNe or TDEs.

Figure 10 shows confusion matrices for transients classified with probability $p({\rm class})\geq0.75$. We show the completeness for NEEDLE-TH on the balanced, unseen test sets (Figure 10(a)) and the completeness and purity matrices for the full data set (Figures 10(b), 10(c)). With $p({\rm class})\geq0.75$, NEEDLE-TH correctly classifies 95% of TDEs and 97% of SLSNe-I in the full data set. However, even a contamination of a few per cent from the far more numerous normal SNe results in a real-world purity of around 20% for the rare classes, showing the importance of choosing the probability threshold carefully. NEEDLE is designed to select young SLSN and TDE candidates for spectroscopic follow-up, rather than to produce large photometric samples. Therefore, a purity of a few $\times 10\%$ is an acceptable price for the high completeness.
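The drop in purity follows from simple arithmetic. If a fraction $c$ of a rare class and a fraction $f$ of normal SNe are assigned to that class, and normal SNe outnumber the rare class by a factor $R=N_{\rm SN}/N_{\rm rare}$, the purity is

$$P=\frac{c\,N_{\rm rare}}{c\,N_{\rm rare}+f\,N_{\rm SN}}=\frac{c}{c+fR}.$$

With illustrative values (not taken from our results) of $c=0.95$, $f=0.04$ and $R=100$, this gives $P=0.95/(0.95+4)\approx0.19$, whereas a balanced test set ($R=1$) with the same rates gives $P=0.95/0.99\approx0.96$.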

We also investigate the importance of including host galaxy metadata. The diagrams show that SLSNe and TDEs essentially always gain higher completeness and purity when host metadata is included.

6.3 Classification from early detections

As NEEDLE is designed to provide a probability for each label after only a few early detections, we also test the average performance of NEEDLE-TH over time since explosion by attempting to classify a time series of pre-peak detections of 30 randomly selected objects in each class. We show the predicted $p({\rm SLSN})$ against time before peak for 30 SLSNe, and $p({\rm TDE})$ for 30 TDEs, in Figure 11.

For most SLSNe (Figure 11(a)) and TDEs (Figure 11(b)), the probability assigned to the correct class grows as the events approach peak. This is likely due to the longer baseline over which the light curve features can be evaluated, indicating that properties such as the light curve rise time and slope are important features in NEEDLE. This is particularly apparent for SLSNe, for which the magnitude contrast with the host galaxy (maximised at light curve peak) is also an important feature.

6.4 Real-time annotation on Lasair

We aim to provide NEEDLE classifications in close to real time via the LSST:UK alert broker, Lasair. Our classifier will digest incoming transients from a pre-filtered Kafka stream produced by a simple Lasair query, using data from ZTF (or LSST in the future), and provide the probabilities of the different classes for each object. To return our classifications to the broker, we make use of the Lasair annotator feature (https://lasair.readthedocs.io/en/main/concepts/annotations.html), which allows verified users to add information to the transient database in a format that other users can query.

Figure 12 shows the process in detail. NEEDLE is trained and tested using ZTF alerts from Lasair. New alerts are filtered by a custom SQL query that selects only young, reliable, extragalactic, non-repeating transients. Specifically, we retain events:

•

discovered within the last 60 days
•

with more than 3 confident detections (to reduce the chance of bad subtractions)
•

predicted to be a Supernova or Nuclear Transient by Sherlock (i.e. not a known AGN or Galactic variable).

NEEDLE then selects the brightest available detection as the input image, provided it passes the image quality checker. It collects the host coordinates and photometry from Sherlock and Pan-STARRS, computes the predicted probabilities from the trained network, and sends them back to Lasair as annotations.
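The filtering and image-selection logic can be sketched in plain Python as below. The record fields and class labels (`days_since_discovery`, `n_detections`, `sherlock_class`, with Sherlock labels "SN" and "NT") are assumptions for illustration, not the Lasair schema, and in the real pipeline these cuts are applied by the Lasair SQL query rather than in Python. The image quality check is passed in as a function.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    object_id: str
    days_since_discovery: float
    n_detections: int
    sherlock_class: str   # hypothetical labels: "SN" supernova, "NT" nuclear transient

def passes_filter(c, max_age_days=60, min_detections=3):
    """Young (within 60 days), more than 3 detections, and classified SN or NT."""
    return (c.days_since_discovery <= max_age_days
            and c.n_detections > min_detections
            and c.sherlock_class in {"SN", "NT"})

def brightest_good_detection(detections, is_good_image):
    """detections: list of (magnitude, image); a smaller magnitude is brighter.
    Return the brightest detection whose image passes the quality check, or None."""
    for mag, image in sorted(detections, key=lambda d: d[0]):
        if is_good_image(image):
            return mag, image
    return None

candidates = [Candidate("obj_a", 10, 5, "SN"), Candidate("obj_b", 90, 5, "SN"),
              Candidate("obj_c", 5, 2, "NT"), Candidate("obj_d", 5, 6, "AGN")]
selected = [c.object_id for c in candidates if passes_filter(c)]
choice = brightest_good_detection([(18.5, "a"), (17.2, "bad"), (17.9, "b")],
                                  lambda img: img != "bad")
```

In this toy example the brightest detection fails the quality check, so the next-brightest image is used instead.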

We have tested this process end-to-end with a preliminary version of NEEDLE. Our goal is to run the fully trained NEEDLE model automatically on all ZTF alerts passing our SQL filter, and release the results as a public stream on Lasair, beginning in early 2024.

7 Discussion

7.1 Individual discrepancy among rare transients

As mentioned in Section 5.2, the difficulty of classifying each individual object in our data set varies. One reason for this may be issues with the host galaxy metadata. In the Pan-STARRS survey, very nearby resolved galaxies may be broken into multiple sources by the survey photometry pipeline, resulting in underestimated host magnitudes. Failed host association may also cause issues, leading to the wrong photometry being retrieved. This is a particular problem for SLSNe, where many of the true hosts are not detected.

We also identify several real features of our objects that influence the ease of classification. For SLSNe, we found that those that are easily classified often have relatively high $m_{\rm discovery}$ and $\Delta T_{\rm discovery}$ and low $\Delta m_{\rm discovery}$, and lie in slightly bluer and fainter host galaxies (low ${g-r}$ and ${r-i}$); i.e. they are bright with a slow rise and a star-forming host, consistent with classic SLSNe in the literature. SLSNe in slightly more massive galaxies, or with short rise times, are more difficult to separate from normal SNe.

For TDEs, objects are most easily classified if they have a bright $m_{\rm discovery}$ and a shorter $\Delta T_{\rm discovery}$ than typical SLSNe. This may be because these events occur in the nuclei of galaxies, and so tend to be found closer to peak unless their flux contrast with the host is large. In future work we will investigate in more detail how to optimise the training process to account for these variations.

7.2 Comparisons to previous classifiers

In recent years, several transient classifiers have been designed that can recognize TDEs and, in particular, SLSNe. Some of them achieve excellent accuracy for SLSNe by making use of their uniquely slow light curves. For the same reason, many of these classifiers perform better when more light curve data are available at later phases. Table 3 compares these classifiers with NEEDLE.

An advantage of NEEDLE is that it requires neither multiple detections nor host redshifts as input; the latter matters because at LSST depth few galaxies will have spectroscopic redshifts. We use only single-stamp images, alert photometry and cataloged host magnitudes (when available), enabling an informed real-time prediction from as little as one detection. Furthermore, all training data come from real survey detections rather than simulations. We could likely achieve even better performance by making use of more detailed light curve information, and this is the aim of future development. However, the goal of NEEDLE is not to produce pure samples of photometrically classified events, but to provide probabilities of potential SLSNe and TDEs at an early stage to guide spectroscopic follow-up. From this perspective, completeness may be more important than purity.

| Code | Paper | Model | Data sources | Inputs | Performance for SLSNe & TDEs |
|---|---|---|---|---|---|
| tdescore | Stein et al. (2023) | XGBoost | ZTF alerts, Pan-STARRS hosts | 10 features of full light curves, 5 contextual features | In balanced test sets: TDE completeness of 77.0%, purity of 80.3% |
| FLEET-SLSN | Gomez et al. (2020) | Random forests | Supernovae: Open Supernova Catalog (OSC; Guillochon et al., 2017), ZTF; hosts: SDSS, PS1/3$\pi$ | Light curve features: light curve width, phase offsets, peak magnitudes in the $g$ and $r$ bands; host features: apparent magnitudes, half-light radius in the $r$ band, offset (same as NEEDLE), offset normalized by galaxy radius, apparent magnitude difference between the transient at peak and the host | In unbalanced/observed sets: SLSN purity of about 85% and completeness of 20% |
| FLEET-TDE | Gomez et al. (2023) | Random forests | Transients: spectroscopically classified transients from TNS, with light curves from ZTF; hosts: SDSS, PS1/3$\pi$ | Similar to FLEET-SLSN | In unbalanced/observed sets: 20 days after discovery, about 40% completeness and about 30% purity; 40 days after discovery, about 40% completeness and about 50% purity |
| ALeRCE light curve classifier | Sánchez-Sáez et al. (2021) | Balanced random forests | ZTF light curves, with labels from a variety of catalogues | Detection features: 56 features per band plus 12 features computed using the $g$ and $r$ bands, 124 detection features in total; non-detection features: 9 features per band, defined using all non-detections associated with a given source | In balanced test sets: 100% accuracy with 26% deviation; high accuracy, though from only 24 SLSN samples |
| SCONE | Qu & Sako (2022) | CNN | LSST deep drilling field simulations | 2D Gaussian process flux heatmaps as a function of time and filter (wavelength) | In balanced test sets: without redshift, SLSN accuracy of 0.69, 0.76 and 0.92 at 0, 5 and 50 days after discovery; with redshift, 0.91, 0.93 and 0.97 |
| SN classifier | Burhanudin & Maund (2023) | CNN & transfer learning | Light curves from the Open Supernova Catalog; the Photometric LSST Astronomical Time Series Classification Challenge (PLAsTiCC) | As in Qu & Sako (2022) | In unbalanced test sets: for PLAsTiCC-simulated SLSNe, the accuracies with and without redshift are 0.61 and 0.65, respectively |
| RAPID | Muthukrishna et al. (2019) | RNN | Simulated data from PLAsTiCC transient models | A matrix in which each row contains the imputed light curve fluxes in each band, repeated values of the host galaxy redshift, and the Milky Way dust reddening | In (nearly) balanced test sets: for SLSNe, accuracy from 0.83 (2 days after trigger) to 0.85 (40 days); for TDEs, 0.59 and 0.86 |
| SuperRAENN | Villar et al. (2020) | Recurrent autoencoder (RAE) & random forests | Light curves from the Pan-STARRS1 Medium Deep Survey (PS1 MDS) with known redshifts | Gaussian-process light curves as RAE inputs; RAE latent features then used as inputs to a random forest | In unbalanced test sets: SLSN completeness and purity of 0.76 and 0.81, rising to 0.83 and 0.91 for a threshold above 0.7; host redshift is used |
| Superphot | Hosseinzadeh et al. (2020) | Random forests | Light curves from PS1 MDS | 6 principal component analysis coefficients of modelled light curves, with known redshifts | In unbalanced sets: SLSN completeness and purity of 0.82 and 0.67, respectively |
| NEEDLE | This work | CNN+DNN | ZTF Bright Transient Survey, Sherlock-predicted hosts and Pan-STARRS catalogs | Science and reference images in a single band, simple light curve features and most galaxy metadata | For SLSNe-I, average completeness 0.77 and average purity 0.82 in the test sets; for TDEs, 0.72 and 0.79 |

Table 3: Comparisons among various transient classifiers for SLSNe and TDEs.

7.3 Remaining difficulties and future improvement

While the NEEDLE algorithm is performing well on the ZTF data set, we are continuing to develop the code and plan a number of future improvements to deal with current limitations, including:

•

Unbalanced classes. The rare transients we focus on, SLSNe and TDEs, each have fewer than 100 samples. After splitting into training and test sets, even fewer samples are available for model training. Weighted loss functions mitigate this problem to some extent, but extracting features of rare classes requires more samples and smarter algorithms, such as few-shot (small-sample) learning.
•

$NaN$ value replacement and padding. Replacing $NaN$ values with zero poses difficulties for classification, since zero has a physical meaning for magnitudes and image pixels. However, given the input requirements of neural networks, some form of padding is inevitable. A possible solution is to fill in the missing values based on context and modelling.
•

A large fraction of SLSNe lack cataloged hosts, and in the early years of LSST this fraction will increase at higher redshifts and will affect the other classes too. To mitigate this, we will continue to develop and apply the NEEDLE-T version of the code in parallel for such cases.
•

Including more contaminants in our training set. Currently we assume that contaminants such as AGN and variable stars can be rejected by simple Lasair filters before they reach NEEDLE. This may not be the case in future, deeper surveys like LSST.
•

LSST alert cutouts in real-time are much smaller than for ZTF, and full-size images will only be available after 80 hours. To achieve real-time prediction, older images might need to be included in classification and training, rather than just the most recent detection.

Additionally, we have further plans for new features, and analyses to improve our training process. These include:

•

A detailed study of mis-classified objects. The next step will be to visualize the model behaviour for such objects individually, and try to understand the reasons for mis-classification.
•

Including more time-domain information. Rather than one image and a set of simple light curve features, more advanced inputs, such as the light curve itself or even a time series of images, may improve performance. For the next version of the classifier, Conv3D layers and other relevant architectures, such as recurrent neural networks, will be considered.
•

Early stage classification. The ultimate goal of our classifier is to identify rare events in their early stages, even before their peak. With the addition of images at multiple epochs, we will analyze the trends in accuracy as more observations are added.

8 Conclusion

This paper introduces a novel context-based hybrid neural network capable of providing probabilistic classifications of transients as SLSNe, TDEs or SNe at early stages in their evolution. The literature suggests that SLSNe are typically found in faint, star-forming dwarf galaxies, and that TDEs occur at the centres of host galaxies that are often green and centrally concentrated. Building on these characteristics, the NEEDLE classifier is designed to identify these sources using only single science and reference image stamps of a transient and its environment, simple photometric information from ZTF (and in future LSST) alert packets, and cataloged host galaxy magnitudes.

Since about half of SLSN hosts are not cataloged, we develop two versions of NEEDLE, with and without host information. Results show that even without a cataloged host galaxy, we are able to identify 79% of SNe, 76% of SLSNe and 62% of TDEs, averaged over 10 test sets. When host information is added, the true positive rate for TDEs increases to approximately 72%, and the highest true positive rate for SLSNe increases from 87% to 93%. To mitigate contamination from common SNe, we recommend a probability threshold of $p\gtrsim 0.75$ before assigning a classification. Under these conditions, we achieve over 95% completeness for SLSNe and TDEs (on the full data set), at the cost of a purity of around 20%.

Furthermore, photometric information has a greater impact on the predictions for SLSNe and TDEs than for ordinary SNe, in particular because of their longer rise times. Our experiments show that the fraction of SLSNe and TDEs classified correctly increases as they rise towards light curve peak.

Currently, NEEDLE is being implemented on the Lasair alert broker, and is able to process ZTF streaming alerts and submit an annotation containing the probabilities for three classes back to the broker. These public classifications will help to inform spectroscopic follow-up for these rare events. We are continuing to develop NEEDLE, and expect that image-based classification of transients will be a powerful tool in the era of LSST.

Acknowledgements

We thank members of the QUB transients discovery team, the QUB Virtual Institute for Data Intensive Research, and the Turing Institute for many helpful conversations. In particular, we thank Aleksandar Novakovic, Richard Gault and Miguel Arana for their advice on neural networks. We also thank Sean McGee and Sebastian Gomez for helpful feedback on the project.

MN and XS are supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 948381). MN also acknowledges support from an Alan Turing Fellowship and UK Space Agency Grant No. ST/Y000692/1. Lasair is supported by the UKRI Science and Technology Facilities Council and is a collaboration between the University of Edinburgh (grant ST/N002512/1) and Queen’s University Belfast (grant ST/N002520/1) within the LSST:UK Science Consortium.

Data Availability

This paper is based on publicly available data. We are making all results from this work publicly available via the LSST:UK Lasair broker, and the data repository including the trained models, scripts and HDF5 format data will be made available via Github.

References

Angus et al. (2016) Angus C. R., Levan A. J., Perley D. A., Tanvir N. R., Lyman J. D., Stanway E. R., Fruchter A. S., 2016, MNRAS, 458, 84
Baldeschi et al. (2020) Baldeschi A., Miller A., Stroh M., Margutti R., Coppejans D. L., 2020, ApJ, 902, 60
Bellm et al. (2018) Bellm E. C., et al., 2018, PASP, 131, 018002
Blanchard et al. (2016) Blanchard P. K., Berger E., Fong W.-f., 2016, ApJ, 817, 144
Boone (2019) Boone K., 2019, AJ, 158, 257
Botticella et al. (2017) Botticella M. T., et al., 2017, A&A, 598, A50
Bricman & Gomboc (2020) Bricman K., Gomboc A., 2020, ApJ, 890, 73
Burhanudin & Maund (2023) Burhanudin U. F., Maund J. R., 2023, MNRAS, 521, 1601
Burhanudin et al. (2021) Burhanudin U. F., et al., 2021, MNRAS, 505, 4345
Carrasco-Davis et al. (2021) Carrasco-Davis R., et al., 2021, AJ, 162, 231
Chambers et al. (2019) Chambers K. C., et al., 2019, The Pan-STARRS1 Surveys (arXiv:1612.05560)
Chen et al. (2017) Chen T.-W., Smartt S. J., Yates R. M., Nicholl M., Krühler T., Schady P., Dennefeld M., Inserra C., 2017, MNRAS, 470, 3566
Chen et al. (2022) Chen Z. H., et al., 2022, arXiv e-prints, p. arXiv:2202.02059
Cleland et al. (2023) Cleland C., McGee S. L., Nicholl M., 2023, MNRAS, 524, 3559
Collette (2013) Collette A., 2013, Python and HDF5. O’Reilly
Donoso-Oliva et al. (2022) Donoso-Oliva C., Becker I., Protopapas P., Cabrera-Vives G., Vishnu M., Vardhan H., 2022, arXiv e-prints, p. arXiv:2205.01677
Flewelling et al. (2020) Flewelling H. A., et al., 2020, ApJS, 251, 7
Foley & Mandel (2013) Foley R. J., Mandel K., 2013, ApJ, 778, 167
Förster et al. (2022) Förster F., et al., 2022, arXiv e-prints, p. arXiv:2208.04310
Fremling et al. (2020) Fremling C., et al., 2020, ApJ, 895, 32
French et al. (2016) French K. D., Arcavi I., Zabludoff A., 2016, ApJ, 818, L21
Fruchter et al. (2006) Fruchter A. S., et al., 2006, Nature, 441, 463
Gagliano et al. (2021) Gagliano A., Narayan G., Engel A., Carrasco Kind M., LSST Dark Energy Science Collaboration 2021, ApJ, 908, 170
Gagliano et al. (2023) Gagliano A., Contardo G., Foreman Mackey D., Malz A. I., Aleo P. D., 2023, arXiv e-prints, p. arXiv:2305.08894
Gal-Yam (2019) Gal-Yam A., 2019, ARA&A, 57, 305
Gezari (2021) Gezari S., 2021, ARA&A, 59, 21
Gomez et al. (2020) Gomez S., Berger E., Blanchard P. K., Hosseinzadeh G., Nicholl M., Villar V. A., Yin Y., 2020, ApJ, 904, 74
Gomez et al. (2023) Gomez S., Villar V. A., Berger E., Gezari S., van Velzen S., Nicholl M., Blanchard P. K., Alexander K. D., 2023, ApJ, 949, 113
Graur et al. (2017) Graur O., Bianco F. B., Modjaz M., Shivvers I., Filippenko A. V., Li W., Smith N., 2017, ApJ, 837, 121
Graur et al. (2018) Graur O., French K. D., Zahid H. J., Guillochon J., Mandel K. S., Auchettl K., Zabludoff A. I., 2018, ApJ, 853, 39
Guillochon et al. (2017) Guillochon J., Parrent J., Kelley L. Z., Margutti R., 2017, ApJ, 835, 64
Hammerstein et al. (2022) Hammerstein E., et al., 2022, arXiv e-prints, p. arXiv:2203.01461
Hills (1975) Hills J. G., 1975, Nature, 254, 295
Hložek et al. (2023) Hložek R., et al., 2023, ApJS, 267, 25
Hosseinzadeh et al. (2020) Hosseinzadeh G., et al., 2020, ApJ, 905, 93
Hsu et al. (2023) Hsu C.-J., Tan J. C., Holdship J., Xu D., Viti S., Wu B., Gaches B., 2023, arXiv e-prints, p. arXiv:2308.11803
Ivezić et al. (2019) Ivezić Ž., et al., 2019, ApJ, 873, 111
Kantor (2014) Kantor J., 2014, in Wozniak P. R., Graham M. J., Mahabal A. A., Seaman R., eds, The Third Hot-wiring the Transient Universe Workshop. pp 19–26
Kelly & Kirshner (2012) Kelly P. L., Kirshner R. P., 2012, ApJ, 759, 107
Kessler et al. (2019) Kessler R., et al., 2019, PASP, 131, 094501
Kingma & Ba (2014) Kingma D. P., Ba J., 2014, arXiv e-prints, p. arXiv:1412.6980
Kisley et al. (2022) Kisley M., Qin Y.-J., Zabludoff A., Barnard K., Ko C.-L., 2022, Classifying Astronomical Transients Using Only Host Galaxy Photometry, doi:10.48550/ARXIV.2209.02784, https://arxiv.org/abs/2209.02784
Law-Smith et al. (2017) Law-Smith J., Ramirez-Ruiz E., Ellison S. L., Foley R. J., 2017, ApJ, 850, 22
Leloudas et al. (2015) Leloudas G., et al., 2015, MNRAS, 449, 917
Li et al. (2011) Li W., Chornock R., Leaman J., Filippenko A. V., Poznanski D., Wang X., Ganeshalingam M., Mannucci F., 2011, MNRAS, 412, 1473
Li et al. (2022) Li R., et al., 2022, arXiv e-prints, p. arXiv:2205.10720
Lunnan et al. (2014) Lunnan R., et al., 2014, ApJ, 787, 138
Miranda et al. (2022) Miranda N., et al., 2022, arXiv e-prints, p. arXiv:2208.06534
Muthukrishna et al. (2019) Muthukrishna D., Narayan G., Mandel K. S., Biswas R., Hložek R., 2019, PASP, 131, 118002
Nicholl (2021) Nicholl M., 2021, Astronomy and Geophysics, 62, 5.34
O’Malley et al. (2019) O’Malley T., Bursztein E., Long J., Chollet F., Jin H., Invernizzi L., et al., 2019, KerasTuner, https://github.com/keras-team/keras-tuner
Ørum et al. (2020) Ørum S. V., Ivens D. L., Strandberg P., Leloudas G., Man A. W. S., Schulze S., 2020, A&A, 643, A47
Perley et al. (2016) Perley D. A., et al., 2016, ApJ, 830, 13
Perley et al. (2020) Perley D. A., et al., 2020, ApJ, 904, 35
Pimentel et al. (2022) Pimentel Ó., Estévez P. A., Förster F., 2022, AJ, 165, 18
Qu & Sako (2022) Qu H., Sako M., 2022, AJ, 163, 57
Quimby et al. (2011) Quimby R. M., et al., 2011, Nature, 474, 487
Ramsden et al. (2022) Ramsden P., Lanning D., Nicholl M., McGee S. L., 2022, MNRAS, 515, 1146
Rees (1988) Rees M. J., 1988, Nature, 333, 523
Sánchez-Sáez et al. (2021) Sánchez-Sáez P., et al., 2021, AJ, 161, 141
Schulze et al. (2017) Schulze S., et al., 2017, MNRAS, 473, 1258
Shappee et al. (2014) Shappee B., et al., 2014, in American Astronomical Society Meeting Abstracts #223. p. 236.03
Smith et al. (2019) Smith K. W., et al., 2019, Research Notes of the American Astronomical Society, 3, 26
Smith et al. (2020) Smith K. W., et al., 2020, PASP, 132, 085002
Stein et al. (2023) Stein R., et al., 2023, arXiv e-prints, p. arXiv:2312.00139
Sullivan et al. (2006) Sullivan M., et al., 2006, ApJ, 648, 868
Tonry et al. (2018) Tonry J. L., et al., 2018, PASP, 130, 064505
Villar et al. (2018) Villar V. A., Nicholl M., Berger E., 2018, ApJ, 869, 166
Villar et al. (2020) Villar V. A., et al., 2020, ApJ, 905, 94
Yao et al. (2023) Yao Y., et al., 2023, ApJ, 955, L6
van Velzen et al. (2021) van Velzen S., et al., 2021, ApJ, 908, 4

Appendix A ZTF object types

| Label | Type | Feature | Number |
|---|---|---|---|
| SN Ia (4113) | Ia | Thermonuclear explosion of a white dwarf; spectrum lacks hydrogen and helium | 4095 |
| | Iax | A faint and fast sub-class of SNe Ia | |
| | Ia-CSM | SN Ia interacting with nearby circumstellar material (CSM) | |
| SN II (899) | II | Core-collapse explosion of a red supergiant ($\gtrsim 8\,\mathrm{M}_{\odot}$) | 899 |
| Stripped Envelope SN (363) | Ib/c | Massive stars that have lost their hydrogen (Ib) or their hydrogen and helium (Ic) envelopes | 216 |
| | IIb | Incomplete envelope stripping; spectra initially show hydrogen lines but quickly evolve to resemble a SN Ib | |
| | Ic-BL | Broad spectral lines due to high ejecta velocities; large nickel masses; the only SN type associated with gamma-ray bursts | |
| | Ibn | Supernova interacting with a helium-rich CSM | |
| Interacting SN (211) | IIn | Hydrogen emission lines with narrow Doppler widths, indicating low-velocity CSM that has been shock-excited by collision with the supernova ejecta | 183 |
| | SLSN II | SN IIn brighter than $-21$ mag | |
| SLSN (87) | SLSN | 10–100 times brighter than normal SNe; no hydrogen and usually no helium; late-time spectra resemble SNe Ic; prefer dwarf galaxies | |
| TDE (64) | TDE | A star passing close to a supermassive black hole is pulled apart by tidal forces, leading to fallback and accretion | |
| Other (18) | Gap | Transient with luminosity intermediate between typical SNe and classical novae | |
| | Ca-rich | Faint and fast transients of ambiguous nature, with strong calcium lines in the spectrum | |
| | Other | | |
| Non-SN (40) | Novae | Outburst on the surface of an accreting white dwarf | |
| | ILRT | Intermediate Luminosity Red Transient | |
| | LBV | Luminous Blue Variable: very massive star undergoing eruptive mass loss | |
| | LRN | Luminous Red Nova: merger of a low-mass stellar binary | |
| Sum | | | 5795 |

Table 4: Summary of ZTF transients with their types, features and numbers. Our “SN” class comprises the first four labels in this table (SN Ia, SN II, Stripped Envelope SN and Interacting SN); in future versions we aim to resolve these SN sub-types.
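The grouping described in the caption can be illustrated with a short Python sketch that collapses the label-level counts of Table 4 into the three classes targeted by NEEDLE. The dictionary names and the `class_counts` helper are hypothetical. Leaving "Other" and "Non-SN" unmapped is an assumption made only for illustration, because the table does not state how these labels are treated. These are the counts for the full sample listed in the table (5795 objects), not necessarily the final training and test split of $\sim 5400$ transients.

```python
from collections import Counter

# Label-level counts copied from Table 4.
LABEL_COUNTS = {
    "SN Ia": 4113,
    "SN II": 899,
    "Stripped Envelope SN": 363,
    "Interacting SN": 211,
    "SLSN": 87,
    "TDE": 64,
    "Other": 18,
    "Non-SN": 40,
}

# Hypothetical mapping: per the caption, the first four labels form "SN".
# ASSUMPTION: "Other" and "Non-SN" are deliberately left unmapped here.
LABEL_TO_CLASS = {
    "SN Ia": "SN",
    "SN II": "SN",
    "Stripped Envelope SN": "SN",
    "Interacting SN": "SN",
    "SLSN": "SLSN",
    "TDE": "TDE",
}


def class_counts(label_counts, mapping):
    """Sum label counts into classes; labels missing from `mapping` are skipped."""
    counts = Counter()
    for label, n in label_counts.items():
        cls = mapping.get(label)
        if cls is not None:
            counts[cls] += n
    return counts


counts = class_counts(LABEL_COUNTS, LABEL_TO_CLASS)
print(dict(counts))                # {'SN': 5586, 'SLSN': 87, 'TDE': 64}
print(sum(LABEL_COUNTS.values()))  # 5795, matching the "Sum" row

# Ratio of the majority "SN" class to each rare class.
for rare in ("SLSN", "TDE"):
    print(rare, round(counts["SN"] / counts[rare], 1))  # SLSN 64.2, then TDE 87.3
```

Even within this spectroscopically classified sample, the resulting ratios of roughly 64:1 (SN to SLSN) and 87:1 (SN to TDE) show how heavily the rare classes are outnumbered.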