\receiveddate

XX Month, XXXX \reviseddateXX Month, XXXX \accepteddateXX Month, XXXX \publisheddateXX Month, XXXX \currentdateXX Month, XXXX \doiinfoXXXX.2022.1234567

\corresp

Corresponding author: Stefan Glüge (email: [email protected]).

Robust Low-Cost Drone Detection and Classification in Low SNR Environments

Stefan Glüge11{}^{\textbf{1}}start_FLOATSUPERSCRIPT 1 end_FLOATSUPERSCRIPT    Matthias Nyfeler11{}^{\textbf{1}}start_FLOATSUPERSCRIPT 1 end_FLOATSUPERSCRIPT    Ahmad Aghaebrahimian11{}^{\textbf{1}}start_FLOATSUPERSCRIPT 1 end_FLOATSUPERSCRIPT   
Nicola Ramagnano22{}^{\textbf{2}}start_FLOATSUPERSCRIPT 2 end_FLOATSUPERSCRIPT and Christof Schüpbach33{}^{\textbf{3}}start_FLOATSUPERSCRIPT 3 end_FLOATSUPERSCRIPT
   Fellow    IEEE Institute of Computational Life Sciences, Zurich University of Applied Sciences, 8820 Wädenswil, Switzerland Institute for Communication Systems, Eastern Switzerland University of Applied Sciences, 8640 Rapperswil-Jona, Switzerland Armasuisse Science + Technology, 3602 Thun, Switzerland
Abstract

The proliferation of drones, or unmanned aerial vehicles (UAVs), has raised significant safety concerns due to their potential misuse in activities such as espionage, smuggling, and infrastructure disruption. This paper addresses the critical need for effective drone detection and classification systems that operate independently of UAV cooperation. We evaluate various convolutional neural networks (CNNs) for their ability to detect and classify drones using spectrogram data derived from consecutive Fourier transforms of signal components. The focus is on model robustness in low signal-to-noise ratio (SNR) environments, which is critical for real-world applications. A comprehensive dataset is provided to support future model development. In addition, we demonstrate a low-cost drone detection system using a standard computer, software-defined radio (SDR) and antenna, validated through real-world field testing. On our development dataset, all models consistently achieved an average balanced classification accuracy of 85%absentpercent85\geq 85\%≥ 85 % at SNR >12absent12>-12> - 12 dB. In the field test, these models achieved an average balance accuracy of >80%absentpercent80>80\%> 80 %, depending on transmitter distance and antenna direction. Our contributions include: a publicly available dataset for model development, a comparative analysis of CNN for drone detection under low SNR conditions, and the deployment and field evaluation of a practical, low-cost detection system.

{IEEEkeywords}

Deep neural networks, Robustness, Signal detection, Unmanned aerial vehicles

1 INTRODUCTION

\IEEEPARstart

Drones, or civil UAVs, have evolved from hobby toys to commercial systems with many applications. In particular, mini/amateur drones have become ubiquitous. With the proliferation of these low-cost, small and easy-to-fly drones, safety issues have became more pressing (e.g. spying, transfer of illegal or dangerous goods, disruption of infrastructure, assault). Although regulations and technical solutions (such as transponder systems) are in place to safely integrate UAVs into the airspace, detection and classification systems that do not rely on the cooperation of the UAV are necessary. Various technologies such as audio, video, radar, or radio frequency (RF) scanners have been proposed for this task [1].

In this paper, we evaluate different CNNs for drone detection and classification using the spectrogram data computed with consecutive Fourier transforms for the real and imaginary parts of the signal. To facilitate future model development, we make the dataset publicly available. In terms of performance, we focus on the robustness of the models to low SNRs, as this is the most relevant aspect for a real-world application of the system. Furthermore, we evaluate a low-cost drone detection system consisting of a standard computer, SDR, and antenna in a real-world field test.

Our contributions can therefore be summarised as follows:

  • We provide the dataset used to develop the model. Together with the code to load and transform the data, it can be easily used for future model development.

  • We compare different CNNs using 2222D spectrogram data for detection and classification of drones based on their RF signals under challenging conditions, i.e. low SNRs down to 2020-20- 20 dB.

  • We visualise the model embeddings to understand how the model clusters and separates different classes, to identify potential overlaps or ambiguities, and to examine the hierarchical relationships within the learned features.

  • We implement the models in a low-cost detection system and evaluate them in a field test.

1.1 RELATED WORK

A literature review on drone detection methods based on deep learning (DL) is given in [1] and [2]. Both works reflect the state of the art in 2024202420242024. Different DL algorithms are discussed with respect to the techniques used to detect drones based on visual, radar, acoustic, and RF signals. Given these general overviews, we briefly summarise recent work based on RF data, with a particular focus on the data side of the problem to motivate our work.

With the advent of DL-based methods, the data used to train models became the cornerstone of any detection system. Table 1 provides an overview of openly available datasets of RF drone signals. The DroneRF dataset [3] is one of the first openly available datasets. It contains RF time series data from three drones in four flight modes (i.e. on, hovering, flying, video recording) recorded by two universal software radio peripheral (USRP) SDR transceivers [4]. The dataset is widely used and enabled follow-up work with different approaches to classification systems, i.e. DL-based [5, 6], focused on pre-processing and combining signals from two frequency bands [7], genetic algorithm-based heterogeneous integrated k-nearest neighbour [8], and hierarchical reinforcement learning-based [9]. In general, the classification accuracies reported in the papers on the DroneRF dataset are close to 100%percent100100\%100 %. Specifically, [4], [5], and [6] report an average accuracy of 99.7%percent99.799.7\%99.7 %, 100%percent100100\%100 %, and 99.98%percent99.9899.98\%99.98 %, respectively, to detect the presence of a drone. There is therefore an obvious need for a harder, more realistic dataset.

Consequently, [10] investigate the detection and classification of drones in the presence of Bluetooth and Wi-Fi signals. Their system used a multi-stage detector to distinguish drone signals from the background noise and interfering signals. Once a signal was identified as a drone signal, it was classified using machine learning (ML) techniques. The detection performance of the proposed system was evaluated for different SNRs. The corresponding recordings (17171717 drone controls from eight different manufacturers) are openly available [11]. Unfortunately, the Bluetooth/Wi-Fi noise is not part of the dataset. Ozturk et al. [12] used the dataset to further investigate the classification of RF fingerprints at low SNRs by adding white Gaussian noise to the raw data. Using a CNN, they achieved classification accuracies ranging from 92%percent9292\%92 % to 100%percent100100\%100 % for SNR [10,30]absent1030\in[-10,30]∈ [ - 10 , 30 ]dB.

The openly available DroneDetect dataset [13] was created by Swinney and Woods [14]. It contains raw in-phase and quadrature (IQ) data recorded with a BladeRF SDR. Seven drone models were recorded in three different flight modes (on, hovering, flying). Measurements were also repeated with different types of noise, such as interference from a Bluetooth speaker, a Wi-Fi hotspot, and simultaneous Bluetooth and Wi-Fi interference. The dataset does not include measurements without drones, which would be necessary to evaluate a drone detection system. The results in [14] show that Bluetooth signals are more likely to interfere with detection and classification accuracy than Wi-Fi signals. Overall, frequency domain features extracted from a CNN were shown to be more robust than time domain features in the presence of interference. In [15] the drone signals from the DroneDetect dataset were augmented with Gaussian noise and SDR recorded background noise. Hence, the proposed approach could be evaluated regrading its capability to detect drones. They trained a CNN end-to-end on the raw IQ data and report an accuracy of 99%percent9999\%99 % for detection and between 72%percent7272\%72 % and 94%percent9494\%94 % for classification.

The Cardinal RF dataset [16] consists of the raw time series data from six drones + controller, two Wi-Fi and two Bluetooth devices. Based on this dataset, Medaiyese et al. [17] proposed a semi-supervised framework for UAV detection using wavelet analysis. Accuracy between 86%percent8686\%86 % and 97%percent9797\%97 % was achieved at SNRs of 30303030 dB and 18181818 dB, while it dropped to chance level for SNRs below 10101010 dB to 6666 dB. In addition, [18] investigated different wavelet transforms for the feature extraction from the RF signals. Using the wavelet scattering transform from the steady state of the RF signals at 30303030 dB SNR to train SqueezeNet [19], they achieved an accuracy of 98.9%percent98.998.9\%98.9 % at 10101010 dB SNR.

In our previous work [20], we created the noisy drone RF signals dataset111https://www.kaggle.com/datasets/sgluege/noisy-drone-rf-signal-classification from six drones and four remote controllers. It consists of non-overlap** signal vectors of 16384163841638416384 samples, corresponding to 1.2absent1.2\approx 1.2≈ 1.2 ms at 14141414 MHz. We added Labnoise (Bluetooth, Wi-Fi, Amplifier) and Gaussian noise to the dataset and mixed it with the drone signals with SNR [20,30]absent2030\in[-20,30]∈ [ - 20 , 30 ] dB. Using IQ data and spectrogram data to train different CNNs, we found an advantage in favour of the 2222D spectrogram representation of the data. There was no performance difference at SNR 0absent0\geq 0≥ 0 dB but a major improvement in the balanced accuracy at low SNR levels, i.e. 84.284.284.284.2% on the spectrogram data compared to 41.341.341.341.3% on the IQ data at 1212-12- 12 dB SNR.

Recently, [21] proposed an anchor-free object detector based on keypoints for drone RF signal spectograms. They also proposed an adversarial learning-based data adaptation method to generate domain independent and domain aligned features. Given five different types of drones, they report a mean average precision of 97.36%percent97.3697.36\%97.36 %, which drops to 55%absentpercent55\approx 55\%≈ 55 % when adding Gaussian noise with 2525-25- 25 dB SNR. The raw data used in their work is available222https://www.kaggle.com/datasets/zhaoericry/drone-rf-dataset, but yet, unfortunately not usable without any further documentation.

Table 1: Overview on openly available drone RF datasets.
Dataset Year Datatype UAV Noise Size
DroneRF [3] 2019 Raw Amplitude 3333 drones + 3333 controller Background RF activities 3.753.753.753.75 GB
Drone remote controller RF signal dataset [11] 2020 Raw Amplitude 17171717 controller none 124124124124 GB
DroneDetect dataset [13] 2020 Raw IQ 7777 drones + 7777 controller Bluetooth, Wi-Fi devices 66666666 GB
Cardinal RF [16] 2022 Raw Amplitude 6666 drones + 6666 controller Bluetooth, Wi-Fi 65656565 GB
Noisy drone RF signals [20] 2023 Pre-processed IQ and Spectrogram 6666 drones + 4444 controller Bluetooth, Wi-Fi, Gauss 23232323 GB

1.2 MOTIVATION

As we have seen in other fields, such as computer vision, the success of DL can be attributed to: (a) high-capacity models; (b) increased computational power; and (c) the availability of large amounts of labelled data [22]. Thus, given the large amount of available raw RF signals (cf. Tab. 1) we promote the idea of open and reusable data, to facilitate model development and model comparison.

With the noisy drone RF signals dataset [20], we have provided a first ready-to-use dataset to enable rapid model development, without the need for any data preparation. Furthermore, the dataset contains samples that can be considered as “hard” in terms of noise, i.e. Bluetooth + Wi-Fi + Gaussian noise at very low SNRs, and allows a direct comparison with the published results.

While the models proposed in [20] performed reasonably well in the training/lab setting, we found it difficult to transfer their performance to practical application. The reason was the choice of rather short signal vectors of 16384163841638416384 samples, corresponding to 1.2absent1.2\approx 1.2≈ 1.2 ms at 14141414 MHz. Since the drone signals occur in short bursts of 1.32absent1.32\approx 1.3\text{--}2≈ 1.3 – 2 ms with a repetition period of 60600absent60600\approx 60\text{--}600≈ 60 – 600 ms, our continuously running classifier predicts a drone whenever a burst occurs and noise during the repetition period of the signal. Therefore, in order to provide a stable and reliable classification per every second, one would need an additional “layer” to pool the classifier outputs given every 1.21.21.21.2 ms.

In the present work, we follow a data-centric approach and simply increase the length of the input signal to 75absent75\approx 75≈ 75 ms to train a classifier in an end-to-end manner. Again, we provide the data used for model development in the hope that it will inspire others to develop better models.

In the next section, we briefly describe the data collection and preprocessing procedure. Section 3 describes the model architectures and their training/validation method. In addition, we describe the setup of a low-cost drone detection system and of the field test. The resulting performance metrics are presented in Section 4 and are further discussed in Section 5.

2 MATERIALS

We used the raw RF signals from the drones that were collected in [20]. Nevertheless, we briefly describe the data acquisition process again to provide a complete picture of the development from the raw RF signal to the deployment of a detection system within a single manuscript.

2.1 DATA ACQUISITION

The drone’s remote control and, if present, the drone itself were placed in an anechoic chamber to record the raw RF signal without interference for at least one minute. The signals were received by a log-periodic antenna and sampled and stored by an Ettus Research USRP B210, see Fig. 1. In the static measurement, the respective signals of the remote control (TX) alone or with the drone (RX) were measured. In the dynamic measurement, one person at a time was inside the anechoic chamber and operated the remote control (TX) to generate a signal that is as close to reality as possible. All signals were recorded at a sampling frequency of 56565656 MHz (highest possible real-time bandwidth). All drone models and recording parameters are listed in Tab. 2, including both uplink and downlink signals.

Refer to caption
Figure 1: Recording of drone signals in the anechoic chamber. A DJI Phantom 4 Pro drone with the DJI Phantom GL300F remote control.
Table 2: Transmitters and receivers recorded in the development dataset and their respective class labels. Additionally, we show the center frequency (GHz), the channel spacing (MHz), the burst duration (ms), and the repetition period of the respective signals (ms).
Transmitter Receiver Label Center Freq. (GHz) Spacing (MHz) Duration (ms) Repetition (ms)
DJI Phantom GL300F DJI Phantom 4 Pro DJI 2.441752.441752.441752.44175 1.71.71.71.7 2.182.182.182.18 630630630630
Futaba T7C - FutabaT7 2.441752.441752.441752.44175 2222 1.71.71.71.7 288288288288
Futaba T14SG Futaba R7008SB FutabeT14 2.441752.441752.441752.44175 3.13.13.13.1 1.41.41.41.4 330330330330
Graupner mx-16 Graupner GR-16 Graupner 2.441752.441752.441752.44175 1111 1.9/3.71.93.71.9/3.71.9 / 3.7 750750750750
Bluetooth/Wi-Fi Noise - Noise 2.441752.441752.441752.44175
Taranis ACCST X8R Receiver Taranis 2.4402.4402.4402.440 1.51.51.51.5 3.1/4.43.14.43.1/4.43.1 / 4.4 420420420420
Turnigy 9X - Turnigy 2.4452.4452.4452.445 2222 1.31.31.31.3 61616161, 120120120120-2900290029002900 a
a The repetition period of the Turnigy transmitter is not static. First bursts were observed after 61616161 ms, the following signal bursts were observed in the interval [120,2900]1202900[120,2900][ 120 , 2900 ] ms

We also recorded three types of noise and interference. First, Bluetooth/Wi-Fi noise was recorded using the hardware setup described above. Measurements were taken in a public and busy university building. In this open recording setup, we had no control over the exact number or types of active Bluetooth/Wi-Fi devices and the actual traffic in progress.

Second, artificial white Gaussian noise was used, and third, receiver noise was recorded for 30303030 seconds from the USRP at various gain settings ([30,70]3070[30,70][ 30 , 70 ] db in steps of 10101010 dB) without the antenna attached. This should prevent the final model from misclassifying quantisation noise in the absence of a signal, especially at low gain settings.

2.2 DATA PREPARATION

To reduce memory consumption and computational effort, we reduced the bandwidth of the signals by downsampling from 56565656 MHz to 14141414 MHz using the SciPy [23] signal.decimate function with an 8888th order Chebyshev type I filter.

The drone signals occur in short bursts with some low power gain or background noise in between (cf. Tab. 2). We divided the signals into non-overlap** vectors of 1048576104857610485761048576 samples (74.974.974.974.9 ms) and only vectors containing a burst, or at least a partial burst, were used for the development dataset. This was achieved by applying an energy threshold. As the recordings were made in an echo-free chamber, the signal burst is always clearly visible. Hence, we only used vectors that contained a portion of the signal whose energy was above the threshold, which was arbitrarily set at 0.0010.0010.0010.001 of the average energy of the entire recording.

The selected drone signal vectors x𝑥xitalic_x with i{1,k}𝑖1𝑘i\in\{1,\ldots k\}italic_i ∈ { 1 , … italic_k } were normalised to a carrier power of 1111 per sample, i.e. only the part of the signal vector containing drone bursts was considered for the power calculation (m𝑚mitalic_m samples out of k𝑘kitalic_k). This was achieved by identifying the bursts as those samples where a smoothed energy was above a threshold. The signal vectors x𝑥xitalic_x are thus normalised by

x^(i)=x(i)/1mi|x(i)|2.^𝑥𝑖𝑥𝑖1𝑚subscript𝑖superscript𝑥𝑖2\hat{x}(i)=x(i)\ /\ \sqrt{\frac{1}{m}\sum_{i}|x(i)|^{2}}.over^ start_ARG italic_x end_ARG ( italic_i ) = italic_x ( italic_i ) / square-root start_ARG divide start_ARG 1 end_ARG start_ARG italic_m end_ARG ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | italic_x ( italic_i ) | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG . (1)

Noise vectors (Bluetooth, Wi-Fi, Amplifier, Gauss) n𝑛nitalic_n with samples i{1,k}𝑖1𝑘i\in\{1,\ldots k\}italic_i ∈ { 1 , … italic_k } were normalised to a mean power of 1111 with

n^(i)=n(i)/1ki|n(i)|2.^𝑛𝑖𝑛𝑖1𝑘subscript𝑖superscript𝑛𝑖2\hat{n}(i)=n(i)/\sqrt{\frac{1}{k}\sum_{i}|n(i)|^{2}}.over^ start_ARG italic_n end_ARG ( italic_i ) = italic_n ( italic_i ) / square-root start_ARG divide start_ARG 1 end_ARG start_ARG italic_k end_ARG ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | italic_n ( italic_i ) | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG . (2)

Finally, the normalised drone signal vectors were mixed with the normalised noise vectors by

y^(i)=(kx^(i)+n^(i))k+1, with k=10SNR10,formulae-sequence^𝑦𝑖𝑘^𝑥𝑖^𝑛𝑖𝑘1 with 𝑘superscript10SNR10\hat{y}(i)=\frac{\left(\sqrt{k}\cdot\hat{x}(i)+\hat{n}(i)\right)}{\sqrt{k+1}},% \text{ with }k=10^{\frac{\text{SNR}}{10}},over^ start_ARG italic_y end_ARG ( italic_i ) = divide start_ARG ( square-root start_ARG italic_k end_ARG ⋅ over^ start_ARG italic_x end_ARG ( italic_i ) + over^ start_ARG italic_n end_ARG ( italic_i ) ) end_ARG start_ARG square-root start_ARG italic_k + 1 end_ARG end_ARG , with italic_k = 10 start_POSTSUPERSCRIPT divide start_ARG SNR end_ARG start_ARG 10 end_ARG end_POSTSUPERSCRIPT , (3)

to generate the noisy drone signal vectors y^^𝑦\hat{y}over^ start_ARG italic_y end_ARG at different SNRs.

2.3 DEVELOPMENT DATASET

To facilitate future model development, we provide our resulting dataset333https://www.kaggle.com/datasets/sgluege/noisy-drone-rf-signal-classification-v2 along with a code example444https://github.com/sgluege/noisy-drone-rf-signal-classification-v2 to load and inspect the data. The dataset consists of the non-overlap** signal vectors of 220superscript2202^{20}2 start_POSTSUPERSCRIPT 20 end_POSTSUPERSCRIPT samples, corresponding to 74.9absent74.9\approx 74.9≈ 74.9 ms at 14141414 MHz.

As described in Sec. 2.2, the drone signals were mixed with noise. More specifically, 50%percent5050\%50 % of the drone signals were mixed with Labnoise (Bluetooth + Wi-Fi + Amplifier) and 50%percent5050\%50 % with Gaussian noise. In addition, we created a separate noise class by mixing Labnoise and Gaussian noise in all possible combinations (i.e., Labnoise +++ Labnoise, Labnoise +++ Gaussian noise, Gaussian noise +++ Labnoise, and Gaussian noise +++ Gaussian noise). For the drone signal classes, as for the noise class, the number of samples for each SNR level was evenly distributed over the interval of SNRs [20,30]absent2030\in[-20,30]∈ [ - 20 , 30 ] dB in steps of 2222 dB, i.e., 679679679679-685685685685 samples per SNR level. The resulting number of samples per class is given in Tab. 3.

Table 3: Number of samples in the different classes in the development dataset.
Class DJI FutabaT14 FutabaT7 Graupner Taranis Turnigy Noise
#samples 1280128012801280 3472347234723472 801801801801 801801801801 1663166316631663 855855855855 8872887288728872

In our previous work [20] we found an advantage in using the spectrogram representation of the data compared to the IQ representation, especially at low SNRs levels. Therefore, we transform the raw IQ signals by computing the spectrum of each sample with consecutive Fourier transforms with non-overlap** segments of length 1024102410241024 for the real and imaginary parts of the signal. That is, the two IQ signal vectors ([2×220]delimited-[]2superscript220[2\times 2^{20}][ 2 × 2 start_POSTSUPERSCRIPT 20 end_POSTSUPERSCRIPT ]) are represented as two matrices ([2×1024×1024]delimited-[]210241024[2\times 1024\times 1024][ 2 × 1024 × 1024 ]). Fig. 2 shows four samples of the dataset at different SNRs. Note that we have plotted the log power spectrogram of the complex spectrum y^fftsubscript^𝑦fft\hat{y}_{\mathrm{fft}}over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT roman_fft end_POSTSUBSCRIPT as

log10|y^fft|=log10(Re(y^fft)2+Im(y^fft)2)subscript10subscript^𝑦fftsubscript10Resuperscriptsubscript^𝑦fft2Imsuperscriptsubscript^𝑦fft2\log_{10}|\hat{y}_{\mathrm{fft}}|=\log_{10}(\sqrt{\mathrm{Re}(\hat{y}_{\mathrm% {fft}})^{2}+\mathrm{Im}(\hat{y}_{\mathrm{fft}})^{2}})roman_log start_POSTSUBSCRIPT 10 end_POSTSUBSCRIPT | over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT roman_fft end_POSTSUBSCRIPT | = roman_log start_POSTSUBSCRIPT 10 end_POSTSUBSCRIPT ( square-root start_ARG roman_Re ( over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT roman_fft end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + roman_Im ( over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT roman_fft end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ) (4)
Refer to caption
(a) FutabaT14 at SNR 26262626 dB
Refer to caption
(b) DJI at SNR 6666 dB
Refer to caption
(c) Taranais at SNR 44-4- 4 dB
Refer to caption
(d) Noise at SNR 22222222 dB
Figure 2: Log power spectrogram and IQ data samples from the development dataset at different SNRs (2(a)-2(d))

2.4 DETECTION SYSTEM PROTOTYPE

For field use, a system based on a mobile computer was used as shown in Fig. 3 and illustrated in Fig. 4. The RF signals were received using a directional left-hand circularly polarised antenna (H&S SPA 2400/70/9/0/CP). The antenna gain of 8.58.58.58.5 dBi and the front-to-back ratio of 20202020 dB helped to increase the detection range and to attenuate the unwanted interferers in the opposite direction. Circular polarisation has been chosen to eliminate the alignment problem as the transmitting antennas have a linear polarisation. The USRP B210 was used to down-convert and digitise the RF signal at a sampling rate of 14141414 Msps. On the mobile computer, the GNU Radio program collected the baseband IQ samples in batches of one second and send one batch at a time to our PyTorch model, which classified the signal. To speed up the computations in the model we utilised an Nvidia GPU in computer. The classification results were then visualised in real time in a dedicated GUI.

GNU RadioGUIPyTorchGPUUSRPB2102.4-2.48 GHzMobile computerUSB3IQ DataClassificationResults
Figure 3: Block diagram of the mobile drone detection system.
Refer to caption
Figure 4: Detection prototype at the Zurich Lake in Rapperswil.

3 METHODS

3.1 MODEL ARCHITECTURE AND TRAINING

As in [20] we chose the Visual Geometry Group (VGG) CNN architecture [24]. The main idea of this architecture is to use multiple layers of small (3×3333\times 33 × 3) convolutional filters instead of larger ones. This is intended to increase the depth and expressiveness of the network, while reducing the number of parameters. There are several variants of this architecture, which differ in the number of convolutional layers (11111111 and 19191919, respectively). We used a variant with a batch normalisation [25] layer after the convolutions, denoted as VGG11_BN to VGG19_BN. For the dense classification layer, we used 256256256256 linear units followed by 7777 linear units at the output (one unit per class).

A stratified 5555-fold train-validation-test split was used as follows. In each fold, we trained a network using 80%percent8080\%80 % and 20%percent2020\%20 % of the available samples of each class for training and testing, respectively. Repeating the stratified split five times ensures that each sample was in the test set once in each experiment. Within the training set, 20%percent2020\%20 % of the samples were used as the validation set during training.

Model training was performed for 200200200200 epochs with a batch size of 8888. The PyTorch [26] implementation of the Adam algorithm [27] was used with a learning rate of 0.0050.0050.0050.005, betas (0.9,0.999)0.90.999(0.9,0.999)( 0.9 , 0.999 ) and weight decay of 00.

3.2 MODEL EVALUATION

During training, the model was evaluated on the validation set after each epoch. If the balanced accuracy on the validation set increased, it was saved. After training, the model with the highest balanced accuracy on the validation set was evaluated on the withheld test data. The performance of the models on the test data was accessed in terms of classification accuracy and balanced accuracy.

As accuracy simply measures the proportion of correct predictions out of the total number of observations, it can be misleading for unbalanced datasets. In our case, the noise class is over-represented in the dataset (cf. Tab. 3). Therefor, we also report the balanced accuracy, which is defined as the average of the recall obtained for each class, i.e. it gives equal weight to each class regardless of how frequent or rare it is.

3.3 VISUALISATION OF MODEL EMBEDDINGS

Despite their effectiveness, CNNs are often criticised for being “black boxes”. Understanding the feature representations, or embeddings, learned by the CNN helps to demystify these models and provide some understanding of their capabilities and limitations. In general, embeddings are high-dimensional vectors generated by the intermediate layers that capture essential patterns from the input data.

In our case, we chose the least complex VGG11_BN model to visualise its embeddings. When inferencing the test data, we collected the activations at the last dense classification layer, which consists of 256256256256 units. Given 3549354935493549 test samples, this results in a 256×35492563549256\times 3549256 × 3549 matrix. Using t-distributed Stochastic Neighbor Embedding (t-SNE) [28] and Uniform Manifold Approximation and Projection (UMAP) [29] as dimensionality reduction techniques, we project these high-dimensional embeddings into a lower-dimensional space, creating interpretable visualisations that reveal the model’s internal data representations.

Our goals were to understand how the model clusters and separates different classes, to identify potential overlaps or ambiguities, and to examine the hierarchical relationships within the learned features.

3.4 DETECTION SYSTEM FIELD TEST

We conducted a field test of the detection system in Rapperswil at the Zurich Lake. The drone detection prototype was placed on the shore (cf. Fig. 4) in line of sight of a wooden boardwalk across the lake, with no buildings to interfere with the signals. The transmitters were mounted on a 2.52.52.52.5 m long wooden pole. The signals from the transmitters were recorded (and classified in real time) at four positions along the walkway at approximately 110110110110 m, 340340340340 m, 560560560560 m and 670670670670 m from the detection system. Figure 5 shows an overview of the experimental setup. At each recording position, we measured with the directional antenna at three different angles, i.e. at 0superscript00^{\circ}0 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT – facing the drones and/or remote controls, at 90superscript9090^{\circ}90 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT – perpendicular to the direction of the transmitters, and at 180superscript180180^{\circ}180 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT – in the opposite direction. Directing the antenna in the opposite direction should result in 20absent20\approx 20≈ 20 dB attenuation of the radio signals.

Refer to caption
Figure 5: Experimental measurement setup at the Zurich Lake in Rapperswil. One can see the four recording positions along the wooden walkway and the detection system positioned at the lake side. Further, recordings were done at different angels of the directional antenna indicated by the arrows at the detection system.

Table 4 lists the drones and/or remote controls used in the field test. Note that the Graupner drone and remote control are part of the development dataset (cf. Tab. 2), but were not measured in the field experiment. We assume that no other drones were present during the measurements, so recordings where none of our transmitters were used are labelled as “Noise”.

Table 4: Drones and/or remotes used in the field test
Class Drone/remote control
DJI DJI Phantom Pro 4 drone and remote
FutabaT14 Futaba T14 remote control
FutabaT7 Futaba T14 remote control
Taranis FrySky Taranis Q X7 remote control
Turnigy Turnigy Evolution remote control

For each transmitter, distance, and angle, 20202020 to 30303030 s, or approximately 300300300300 spectrograms were live classified and recorded. The resulting number of samples for each class, distance, and angle are shown in Tab. 5.

Table 5: Number of samples (#samples) for each class, distance and antenna direction (angle) recorded in the field test. Recordings at 00 m distance have no active transmitter and were therefore labelled “Noise”
class #samples Distance [m] #samples Angle [] #samples
DJI 4900490049004900 00 2597259725972597 00 9110911091109110
FutabaT14 5701570157015701 110110110110 6208620862086208 90909090 9076907690769076
FutabaT7 5086508650865086 340340340340 6305630563056305 180180180180 9226922692269226
Noise 2597259725972597 560560560560 6358635863586358
Taranis 5094509450945094 670670670670 5942594259425942
Turnigy 4032403240324032

4 RESULTS

4.1 CLASSIFICATION PERFORMANCE ON THE DEVELOPMENT DATASET

Table 6 shows the general mean ±plus-or-minus\pm± standard deviation of accuracy and balanced accuracy on the test data of the development dataset (cf. Sec. 2.3), obtained in the 5555-fold cross-validation of the different models.

There is no meaningful difference in performance between the models, even when the model complexity increases from VGG11_BN to VGG19_BN. The number of epochs for training (#epochs) shows when the highest balanced accuracy was reached on the validation set. It can be seen that the least complex model, VGG11_BN, required the least number of epochs compared to the more complex models. However, the resulting classification performance is the same.

Table 6: Mean ±plus-or-minus\pm± standard deviation of the accuracy (Acc.) and the balanced accuracy (balanced Acc.) obtained in 5555-fold cross-validation of the different models on the test data of the development dataset. An indication of the model training time is given with the mean ±plus-or-minus\pm± standard deviation of the number of training epochs (#epochs), i.e. when the highest balanced accuracy on the validation set was reached. The number of trainable parameters (#params) indicates the complexity of the model.
Model Acc. balanced Acc. #epochs #params
VGG11_BN 0.944±0.005plus-or-minus0.9440.0050.944\pm 0.0050.944 ± 0.005 0.932±0.002plus-or-minus0.9320.0020.932\pm 0.0020.932 ± 0.002 66.4±24.4plus-or-minus66.424.466.4\pm 24.466.4 ± 24.4 9.361069.36superscript1069.36\cdot 10^{6}9.36 ⋅ 10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT
VGG13_BN 0.947±0.003plus-or-minus0.9470.0030.947\pm 0.0030.947 ± 0.003 0.935±0.003plus-or-minus0.9350.0030.935\pm 0.0030.935 ± 0.003 138.6±46.4plus-or-minus138.646.4138.6\pm 46.4138.6 ± 46.4 9.541069.54superscript1069.54\cdot 10^{6}9.54 ⋅ 10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT
VGG16_BN 0.947±0.006plus-or-minus0.9470.0060.947\pm 0.0060.947 ± 0.006 0.937±0.005plus-or-minus0.9370.0050.937\pm 0.0050.937 ± 0.005 101.8±41.5plus-or-minus101.841.5101.8\pm 41.5101.8 ± 41.5 14.8610614.86superscript10614.86\cdot 10^{6}14.86 ⋅ 10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT
VGG19_BN 0.952±0.006plus-or-minus0.9520.0060.952\pm 0.0060.952 ± 0.006 0.939±0.008plus-or-minus0.9390.0080.939\pm 0.0080.939 ± 0.008 98.2±45.8plus-or-minus98.245.898.2\pm 45.898.2 ± 45.8 20.1710620.17superscript10620.17\cdot 10^{6}20.17 ⋅ 10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT

Figure 6 shows the resulting 5555-fold mean balanced accuracy over SNRs [20,30]absent2030\in[-20,30]\,∈ [ - 20 , 30 ]dB in 2222 dB steps. Note that we do not show the standard deviation to keep the plot readable. In general, we observe a drastic degradation in performance from 1212-12- 12 dB down to near chance level at 2020-20- 20 dB.

The vast majority of misclassifications occurred between noise and drones and not between different types of drones. Figure 7 illustrates this fact. It shows the confusion matrix for the VGGG11_BN model for a single validation on the test data for the samples with 1414-14- 14 dB SNR.

Refer to caption
Figure 6: Mean balanced accuracy obtained in the 5555-fold cross-validation of the different models on the test set of the development dataset over the SNRs levels.
Refer to caption
Figure 7: Confusion matrix of the outputs of the VGG11_BN model on a single fold for the samples at 1414-14- 14 dB SNR from the test data. The average balanced accuracy is 0.710.710.710.71.

4.2 EMBEDDING SPACE VISUALISATION

Figure 8 shows the 2D t-SNE visualisation of the VGG11_BN embeddings of 3549354935493549 test samples from the development dataset. It can be seen that each class forms a separate cluster. While the different drone signal clusters are rather small and dense, the noise cluster takes up most of the embedding space and even forms several sub-clusters. This is most likely due to the variety of the signals used in the noise class, i.e. Bluetooth and Wi-Fi signals plus Gaussian noise.

We used t-SNE for dimensionality reduction because of its ability to preserve local structure within the high-dimensional embedding space. Furthermore, t-SNE has been widely adopted in the ML community and has a well-established track record for high-dimensional data visualisation. However, it is sensitive to hyperparameters such as perplexity and requires some tuning, i.e. different parameters can lead to considerable different results.

It can be argued that UMAP would be a better choice due to its balanced preservation of local and global structure together with its robustness to hyperparameters. Therefore, we created a web application555https://visvgg11bndronerfembeddings.streamlit.app that allows users to test and compare both approaches with different hyperparameters.

Refer to caption
Figure 8: 2D t-SNE visualisation of the VGG11_BN embeddings of 3549354935493549 test samples from the development dataset. The hyperparameters for t-SNE were: metric “eucleidean”, number of iterations 1000100010001000, perplexity 30303030 and method for gradient approximation “barnes_hut”

4.3 CLASSIFICATION PERFORMANCE IN THE FIELD TEST

For each model architecture, we performed 5555-fold cross-validation on the development dataset (cf. Sec. 3.1), resulting in five trained models per architecture. Thus, we also evaluated all five trained models on the field test data. We report the balanced accuracy ±plus-or-minus\pm± standard deviation for each model architecture for the complete field test dataset averaged over all directions and distances in Tab. LABEL:tab:field_test_acc.

Table 7: Mean ±plus-or-minus\pm± standard deviation of the balanced accuracy (balanced Acc.) of the complete field test recordings for the different models.
Model balanced Acc.
VGG11_BN 0.792±0.022plus-or-minus0.7920.0220.792\pm 0.0220.792 ± 0.022
VGG13_BN 0.807±0.011plus-or-minus0.8070.0110.807\pm 0.0110.807 ± 0.011
VGG16_BN 0.811±0.009plus-or-minus0.8110.0090.811\pm 0.0090.811 ± 0.009
VGG19_BN 0.806±0.016plus-or-minus0.8060.0160.806\pm 0.0160.806 ± 0.016

As observed on the development dataset (cf. Tab. 6), there is no meaningful difference in performance between the model architectures. We therefore focus on VGG11_BN, the simplest model trained, in the more detailed analysis of the field test results.

A live system should trigger an alarm when a drone is present. Therefore, the question of whether the signal is from a drone at all is more important than predicting the correct type of drone. Therefore, we also evaluated the models in terms of a binary problem with two classes “Drone” (for all six classes of drones in the development dataset) and “Noise”.

Table LABEL:tab:field_test_acc_type_direction shows that the accuracies were highly depend on the class. Our models generalise well to the drones in the dataset, with the exception of the DJI. The dependence on direction is not as strong as expected. Orienting the antenna 180 away from the transmitter reduces the signal power by about 20202020 dB, resulting in lower SNR and lower classification accuracy. However, as the transmitters were still quite close to the antenna, the effect is not pronounced. As we have seen on the development dataset in Fig. 6, there is a clear drop in accuracy once the SNR is below 1212-12- 12 dB. Apparently we were still above this threshold, regardless of the direction of the antenna.

Table 8: Mean balanced accuracy ±plus-or-minus\pm± standard deviation of the VGG11_BN models on the field test recordings for the different classes for each direction (0superscript00^{\circ}0 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT, 90superscript9090^{\circ}90 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT and 180superscript180180^{\circ}180 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT). The upper part shows the accuracies for the classification problem (seven classes) and the lower part the accuracies for the detection problem “Drone” or “Noise”
Class 0superscript00^{\circ}0 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT 90superscript9090^{\circ}90 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT 180superscript180180^{\circ}180 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT
DJI 0.623±0.080plus-or-minus0.6230.0800.623\pm 0.0800.623 ± 0.080 0.624±0.035plus-or-minus0.6240.0350.624\pm 0.0350.624 ± 0.035 0.540±0.051plus-or-minus0.5400.0510.540\pm 0.0510.540 ± 0.051
FutabaT14 0.716±0.101plus-or-minus0.7160.1010.716\pm 0.1010.716 ± 0.101 0.984±0.011plus-or-minus0.9840.0110.984\pm 0.0110.984 ± 0.011 0.911±0.042plus-or-minus0.9110.0420.911\pm 0.0420.911 ± 0.042
FutabaT7 0.724±0.041plus-or-minus0.7240.0410.724\pm 0.0410.724 ± 0.041 0.737±0.034plus-or-minus0.7370.0340.737\pm 0.0340.737 ± 0.034 0.698±0.059plus-or-minus0.6980.0590.698\pm 0.0590.698 ± 0.059
Noise 0.554±0.038plus-or-minus0.5540.0380.554\pm 0.0380.554 ± 0.038 0.924±0.026plus-or-minus0.9240.0260.924\pm 0.0260.924 ± 0.026 0.833±0.068plus-or-minus0.8330.0680.833\pm 0.0680.833 ± 0.068
Taranis 0.936±0.008plus-or-minus0.9360.0080.936\pm 0.0080.936 ± 0.008 0.879±0.002plus-or-minus0.8790.0020.879\pm 0.0020.879 ± 0.002 0.858±0.002plus-or-minus0.8580.0020.858\pm 0.0020.858 ± 0.002
Turnigy 0.899±0.129plus-or-minus0.8990.1290.899\pm 0.1290.899 ± 0.129 0.958±0.027plus-or-minus0.9580.0270.958\pm 0.0270.958 ± 0.027 0.962±0.024plus-or-minus0.9620.0240.962\pm 0.0240.962 ± 0.024
 Drone 0.958±0.011plus-or-minus0.9580.0110.958\pm 0.0110.958 ± 0.011 0.859±0.014plus-or-minus0.8590.0140.859\pm 0.0140.859 ± 0.014 0.847±0.017plus-or-minus0.8470.0170.847\pm 0.0170.847 ± 0.017
Noise 0.554±0.038plus-or-minus0.5540.0380.554\pm 0.0380.554 ± 0.038 0.924±0.026plus-or-minus0.9240.0260.924\pm 0.0260.924 ± 0.026 0.833±0.068plus-or-minus0.8330.0680.833\pm 0.0680.833 ± 0.068

What may be surprising is the low accuracy on the signals with no active transmitter, labelled as “Noise”, in the direction of the lake (0superscript00^{\circ}0 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT). Given the uncontrolled nature of a field test, it could well be that there a drone was actually flying on the other side of the 2.32.32.32.3 km wide lake. This could explain the false positives we observed in that direction.

Table LABEL:tab:field_test_acc_distance shows the average balanced accuracy of the VGG11_BN models on the field test data collected at different distances for each antenna direction. There is a slight decrease in accuracy with distance. However, the longest distance of 670670670670 m appears to be too short to be a problem for the system. Unfortunately, this was the longest distance within line-of-sight that could be recorded at this location.

Table 9: Mean balanced accuracy ±plus-or-minus\pm± standard deviation of the VGG11_BN models on the field test data with active transmitters collected at different distances for each antenna direction (0superscript00^{\circ}0 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT, 90superscript9090^{\circ}90 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT and 180superscript180180^{\circ}180 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT). The upper part shows the accuracies for the classification problem (seven classes) and the lower part the accuracies for the detection problem “Drone” or “Noise”
Classification
Distance (m) 0superscript00^{\circ}0 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT 90superscript9090^{\circ}90 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT 180superscript180180^{\circ}180 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT
110 0.852±0.044plus-or-minus0.8520.0440.852\pm 0.0440.852 ± 0.044 0.838±0.024plus-or-minus0.8380.0240.838\pm 0.0240.838 ± 0.024 0.786±0.044plus-or-minus0.7860.0440.786\pm 0.0440.786 ± 0.044
340 0.815±0.078plus-or-minus0.8150.0780.815\pm 0.0780.815 ± 0.078 0.916±0.017plus-or-minus0.9160.0170.916\pm 0.0170.916 ± 0.017 0.828±0.031plus-or-minus0.8280.0310.828\pm 0.0310.828 ± 0.031
560 0.730±0.063plus-or-minus0.7300.0630.730\pm 0.0630.730 ± 0.063 0.796±0.007plus-or-minus0.7960.0070.796\pm 0.0070.796 ± 0.007 0.764±0.017plus-or-minus0.7640.0170.764\pm 0.0170.764 ± 0.017
670 0.708±0.068plus-or-minus0.7080.0680.708\pm 0.0680.708 ± 0.068 0.805±0.008plus-or-minus0.8050.0080.805\pm 0.0080.805 ± 0.008 0.777±0.007plus-or-minus0.7770.0070.777\pm 0.0070.777 ± 0.007
Detection
Distance (m) 0superscript00^{\circ}0 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT 90superscript9090^{\circ}90 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT 180superscript180180^{\circ}180 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT
110 0.955±0.007plus-or-minus0.9550.0070.955\pm 0.0070.955 ± 0.007 0.851±0.025plus-or-minus0.8510.0250.851\pm 0.0250.851 ± 0.025 0.849±0.018plus-or-minus0.8490.0180.849\pm 0.0180.849 ± 0.018
340 0.986±0.008plus-or-minus0.9860.0080.986\pm 0.0080.986 ± 0.008 0.940±0.010plus-or-minus0.9400.0100.940\pm 0.0100.940 ± 0.010 0.913±0.018plus-or-minus0.9130.0180.913\pm 0.0180.913 ± 0.018
560 0.960±0.011plus-or-minus0.9600.0110.960\pm 0.0110.960 ± 0.011 0.820±0.013plus-or-minus0.8200.0130.820\pm 0.0130.820 ± 0.013 0.807±0.022plus-or-minus0.8070.0220.807\pm 0.0220.807 ± 0.022
670 0.925±0.021plus-or-minus0.9250.0210.925\pm 0.0210.925 ± 0.021 0.826±0.013plus-or-minus0.8260.0130.826\pm 0.0130.826 ± 0.013 0.823±0.014plus-or-minus0.8230.0140.823\pm 0.0140.823 ± 0.014

Figure 9 shows the confusion matrix for the outputs of the VGG11_BN model of a single fold on the field test data. As with the development dataset (cf. Fig. 7), most of the confusion is between noise and drones rather than between different types of drones.

Refer to caption
Figure 9: Confusion matrix of the outputs of the VGG11_BN model on a single fold for the samples from the field test data. The average balanced accuracy is 0.800.800.800.80.

5 DISCUSSION

We were able to show that a standard CNN, trained on drone RF signals recorded in a controlled laboratory environment and artificially augmented with noise, generalised well to the more challenging conditions of a real-world field test.

The drone detection system consisted of rather simple and low budget hardware (consumer grade notebook with GPU + SDR). Recording parameters such as sampling frequency, length of input vectors, etc. were set to enable real-time detection with the limited amount of memory and computing power. This means that data acquisition, pre-processing and model inference did not take longer than the signal being processed (74.9absent74.9\approx 74.9≈ 74.9 ms per sample in our case).

Obviously, the VGG models were able to learn the relevant features for the drone classification from the complex spectrograms of the RF signal. In this respect, we did not find any advantage for the use of more complex models, such as VGG19_BN, over the least complex model, VGG11_BN (cf. Tabs. 6 and LABEL:tab:field_test_acc).

Furthermore, we have seen that the misclassifications mainly occur between the noise class and the drones, and not between the different drones themselves (cf. Figs. 7 and 9). This is particularly relevant for the application of drone detection systems in security sensitive areas. The first priority is to detect any kind of UAV, regardless of its type.

Based on our experience and results, we see the following limitations of our work. The field test showed that the models can be used and work reliably (cf. Tab. LABEL:tab:field_test_acc_type_direction). However, it is the nature of a field test that the level of interference from WiFi/Bluetooth noise and the possible presence of other drones cannot be fully controlled. Furthermore, due to the limited space/distance between the transmitter and receiver in our field test setup, we were not able to clearly demonstrate the effect of free space attenuation on detection performance (cf. Tab. LABEL:tab:field_test_acc_distance).

Regarding the use of simple CNNs as classifiers, it is not possible to reliably predict whether multiple transmitters are present. In that case, an object detection approach on the spectrogams could provide a more fine-grained prediction, see for example the works [30, 31] and [21]. Nevertheless, the current approach will still detect a drone if one or more are present.

We have only tested a limited set of VGG architectures. It remains to be seen whether more recent architectures, such as the pre-trained Vision Transformer [32], generalise as well or better. We hope that our development dataset will inspire others to further optimise the model side of the problem and perhaps find a model architecture with better performance.

Another issue to consider is the occurrence of unknown drones, i.e. drones that are not part of the train set. Examining the embedding space (cf. 4.2) gives a first idea of whether a signal is clearly part of a known dense drone cluster or rather falls into the larger, less dense, noise cluster. We believe that a combination of an unsupervised deep autoencoder approach [33, 34] with an additional classification part (cf. [35]) would allow, first, to provide a stable classification of known samples and, second, to indicate whether a sample is known or rather an anomaly.

References

  • [1] N. Al-lQubaydhi, A. Alenezi, T. Alanazi, A. Senyor, N. Alanezi, B. Alotaibi, M. Alotaibi, A. Razaque, and S. Hariri, “Deep learning for unmanned aerial vehicles detection: A review,” Computer Science Review, vol. 51, p. 100614, 2 2024. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S1574013723000813
  • [2] M. H. Rahman, M. A. S. Sejan, M. A. Aziz, R. Tabassum, J.-I. Baik, and H.-K. Song, “A comprehensive survey of unmanned aerial vehicles detection and classification using machine learning approach: Challenges, solutions, and future directions,” Remote Sensing, vol. 16, p. 879, 3 2024. [Online]. Available: https://www.mdpi.com/2072-4292/16/5/879
  • [3] M. S. Allahham, M. F. Al-Sa’d, A. Al-Ali, A. Mohamed, T. Khattab, and A. Erbad, “Dronerf dataset: A dataset of drones for rf-based detection, classification and identification,” Data in Brief, vol. 26, p. 104313, 10 2019. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S2352340919306675
  • [4] M. F. Al-Sa’d, A. Al-Ali, A. Mohamed, T. Khattab, and A. Erbad, “Rf-based drone detection and identification using deep learning approaches: An initiative towards a large open source drone database,” Future Generation Computer Systems, vol. 100, pp. 86–97, 11 2019.
  • [5] C. J. Swinney and J. C. Woods, “Unmanned aerial vehicle flight mode classification using convolutional neural network and transfer learning,” in 2020 16th International Computer Engineering Conference (ICENCO), 2020, pp. 83–87.
  • [6] Y. Zhang, “Rf-based drone detection using machine learning,” in 2021 2nd International Conference on Computing and Data Science (CDS), 2021, pp. 425–428.
  • [7] C. Ge, S. Yang, W. Sun, Y. Luo, and C. Luo, “For rf signal-based uav states recognition, is pre-processing still important at the era of deep learning?” in 2021 7th International Conference on Computer and Communications (ICCC), 2021, pp. 2292–2296.
  • [8] Y. Xue, Y. Chang, Y. Zhang, J. Sun, Z. Ji, H. Li, Y. Peng, and J. Zuo, “Uav signal recognition of heterogeneous integrated knn based on genetic algorithm,” Telecommunication Systems, vol. 85, pp. 591–599, 4 2024. [Online]. Available: https://link.springer.com/10.1007/s11235-023-01099-x
  • [9] A. AlKhonaini, T. Sheltami, A. Mahmoud, and M. Imam, “Uav detection using reinforcement learning,” Sensors, vol. 24, no. 6, 2024. [Online]. Available: https://www.mdpi.com/1424-8220/24/6/1870
  • [10] M. Ezuma, F. Erden, C. K. An**appa, O. Ozdemir, and I. Guvenc, “Detection and classification of uavs using rf fingerprints in the presence of wi-fi and bluetooth interference,” IEEE Open Journal of the Communications Society, vol. 1, pp. 60–76, 2020. [Online]. Available: https://ieeexplore.ieee.org/document/8913640/
  • [11] ——, “Drone remote controller rf signal dataset,” 2020. [Online]. Available: https://dx.doi.org/10.21227/ss99-8d56
  • [12] E. Ozturk, F. Erden, and I. Guvenc, “Rf-based low-snr classification of uavs using convolutional neural networks,” ITU Journal on Future and Evolving Technologies, vol. 2, pp. 39–52, 7 2021. [Online]. Available: https://www.itu.int/pub/S-JNL-VOL2.ISSUE5-2021-A04
  • [13] C. J. Swinney and J. C. Woods, “Dronedetect dataset: A radio frequency dataset of unmanned aerial system (uas) signals for machine learning detection & classification,” 2021. [Online]. Available: https://dx.doi.org/10.21227/5jjj-1m32
  • [14] ——, “Rf detection and classification of unmanned aerial vehicles in environments with wireless interference,” in 2021 International Conference on Unmanned Aircraft Systems (ICUAS), 2021, pp. 1494–1498.
  • [15] S. Kunze and B. Saha, “Drone classification with a convolutional neural network applied to raw iq data,” in 2022 3rd URSI Atlantic and Asia Pacific Radio Science Meeting (AT-AP-RASC), May 2022, pp. 1–4. [Online]. Available: https://ieeexplore.ieee.org/document/9814170/
  • [16] O. Medaiyese, M. Ezuma, A. Lauf, and A. Adeniran, “Cardinal rf (cardrf): An outdoor uav/uas/drone rf signals with bluetooth and wifi signals dataset,” 2022. [Online]. Available: https://dx.doi.org/10.21227/1xp7-ge95
  • [17] O. O. Medaiyese, M. Ezuma, A. P. Lauf, and A. A. Adeniran, “Hierarchical learning framework for uav detection and identification,” IEEE Journal of Radio Frequency Identification, vol. 6, pp. 176–188, 2022.
  • [18] O. O. Medaiyese, M. Ezuma, A. P. Lauf, and I. Guvenc, “Wavelet transform analytics for rf-based uav detection and identification system using machine learning,” Pervasive and Mobile Computing, vol. 82, p. 101569, 6 2022. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S1574119222000219
  • [19] F. N. Iandola, M. W. Moskewicz, K. Ashraf, S. Han, W. J. Dally, and K. Keutzer, “Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <1mb model size,” CoRR, vol. abs/1602.07360, 2016. [Online]. Available: http://arxiv.longhoe.net/abs/1602.07360
  • [20] S. Glüge., M. Nyfeler., N. Ramagnano., C. Horn., and C. Schüpbach., “Robust drone detection and classification from radio frequency signals using convolutional neural networks,” in Proceedings of the 15th International Joint Conference on Computational Intelligence - NCTA, INSTICC.   SciTePress, 2023, pp. 496–504.
  • [21] R. Zhao, T. Li, Y. Li, Y. Ruan, and R. Zhang, “Anchor-free multi-uav detection and classification using spectrogram,” IEEE Internet of Things Journal, vol. 11, pp. 5259–5272, 2 2024. [Online]. Available: https://ieeexplore.ieee.org/document/10221859/
  • [22] C. Sun, A. Shrivastava, S. Singh, and A. Gupta, “Revisiting unreasonable effectiveness of data in deep learning era,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 843–852.
  • [23] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, and SciPy 1.0 Contributors, “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python,” Nature Methods, vol. 17, pp. 261–272, 2020.
  • [24] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015. [Online]. Available: http://arxiv.longhoe.net/abs/1409.1556
  • [25] S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, ser. ICML’15.   JMLR.org, 2015, p. 448–456.
  • [26] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “Pytorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds.   Curran Associates, Inc., 2019, pp. 8024–8035.
  • [27] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015. [Online]. Available: http://arxiv.longhoe.net/abs/1412.6980
  • [28] L. van der Maaten and G. Hinton, “Visualizing data using t-sne,” Journal of Machine Learning Research, vol. 9, no. 86, pp. 2579–2605, 2008. [Online]. Available: http://jmlr.org/papers/v9/vandermaaten08a.html
  • [29] L. McInnes, J. Healy, N. Saul, and L. Großberger, “Umap: Uniform manifold approximation and projection,” Journal of Open Source Software, vol. 3, no. 29, p. 861, 2018. [Online]. Available: https://doi.org/10.21105/joss.00861
  • [30] K. N. R. Surya Vara Prasad and V. K. Bhargava, “A classification algorithm for blind uav detection in wideband rf systems,” in 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall), 2020, pp. 1–7.
  • [31] S. Basak, S. Rajendran, S. Pollin, and B. Scheers, “Combined rf-based drone detection and classification,” IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 1, pp. 111–120, 2022.
  • [32] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” ICLR, 2021.
  • [33] S. Lu and R. Li, DAC–Deep Autoencoder-Based Clustering: A General Deep Learning Framework of Representation Learning.   Springer Science and Business Media Deutschland GmbH, 2 2022, vol. 294, pp. 205–216. [Online]. Available: https://link.springer.com/10.1007/978-3-030-82193-7_13
  • [34] H. Zhou, J. Bai, Y. Wang, J. Ren, X. Yang, and L. Jiao, “Deep radio signal clustering with interpretability analysis based on saliency map,” Digital Communications and Networks, 1 2023. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S2352864823000238
  • [35] E. Pintelas, I. E. Livieris, and P. E. Pintelas, “A convolutional autoencoder topology for classification in high-dimensional noisy image datasets,” Sensors, vol. 21, p. 7731, 11 2021. [Online]. Available: https://www.mdpi.com/1424-8220/21/22/7731