Motion-based video compression for resource-constrained camera traps

Malika Nisal Ratnayake Lex Gallon Adel N Toosi Alan Dorin
Department of Data Science and AI
Monash University, Victoria, Australia
{malika.ratnayake, adel.N.Toosi, alan.dorin}@monash.edu, [email protected]

Abstract

Field-captured video allows for detailed studies of spatiotemporal aspects of animal locomotion, decision-making, and environmental interactions. However, despite the affordability of data capture with mass-produced hardware, storage, processing, and transmission overheads pose a significant hurdle to acquiring high-resolution video from field-deployed camera traps. Therefore, efficient compression algorithms are crucial for monitoring with camera traps that have limited access to power, storage, and bandwidth. In this article, we introduce a new motion analysis-based video compression algorithm designed to run on camera trap devices. We implemented and tested this algorithm using a case study of insect-pollinator motion tracking. The algorithm identifies and stores only image regions depicting motion relevant to pollination monitoring, reducing the overall data size by an average of 84% across a diverse set of test datasets while retaining the information necessary for relevant behavioural analysis. The methods outlined in this paper facilitate the broader application of computer vision-enabled, low-powered camera trap devices for remote, in-situ video-based animal motion monitoring.

1 Introduction

Environmental change and other anthropogenic factors are increasingly affecting wildlife, making effective fauna monitoring crucial to help manage or mitigate impacts. Camera traps placed in the wild have emerged as indispensable tools for ecologists to study animal behaviour [3], habitat utilisation [11, 15, 5], and species abundance and distributions [26, 8].

Wildlife monitoring using camera traps presents a unique set of challenges that differs from human surveillance. Natural environments and the animal species that live in them are more complex and diverse than typical built environments and humans. Wildlife monitoring data collected through a camera trap can be either in the form of video or still images. Still image traps capture single or short-burst sequences of a location at predetermined intervals, or when triggered by animal motion [4]. Video camera traps may record continuous, sometimes lengthy, image sequences at high temporal resolution (e.g. $30+$ frames/sec). They too may collect data at preset intervals or when triggered, to allow analysis of dynamic interactions that are not fully captured in still images, such as bird courtship displays [12], predator ambush strategies [20], and animal pollination [14, 25, 16]. However, a video camera trap’s rich data comes at a cost: video storage, processing, and transmission are challenging for remote devices with limited access to power, storage capacity, and transmission bandwidth. Some systems, therefore, run only for a few hours or days at a stretch, avoid onboard video processing, and require manual data transfer [6]. This inconvenience can reduce the value of a video camera trap. Ideally, it would be capable of remote autonomous operation with infrequent service visits, and all data would be streamed conveniently to wildlife management and research offices. It is this need for increased autonomy of video camera traps that drives our research. We address the challenge of limited storage and transmission bandwidth by reducing video camera data file size with a new compression algorithm tailored specifically for wildlife monitoring. We use insect pollinator monitoring as our case study to demonstrate algorithm performance.

Monitoring insect pollinators is valuable to support them in sustaining natural ecosystems [9] and global food production [10]. However, observing insects outdoors is difficult due to the complexities of their environment and the varied appearance and behaviour of the insects themselves [18]. Video capture is valuable in such studies to provide high spatio-temporal resolution movement data enabling fast-moving insect interaction monitoring.

Current video-based insect camera traps employ continuous recording [6, 25, 23], time-lapse [17], or motion-triggers [29, 17]. Continuous video recordings require high storage capacity and transmission bandwidth. Time-lapse videos require less storage and bandwidth, but are prone to missing insect activity detail during non-recording periods. Motion-triggering aims to reduce storage and transmission requirements by recording video only when insects are in view. However, widely used hardware triggers, such as PIR sensors, have proven ineffective for detecting small insects [19, 17]. Software-based motion triggers have also been implemented in insect monitoring, utilising techniques such as foreground-background segmentation [29] and deep learning [28]. Foreground-background segmentation-based approaches are susceptible to false positive detections caused by wind moving foliage or illumination changes. In comparison, deep learning triggers can minimise false positives by recording videos only when an insect is present in the camera frame. However, to be effective these models usually require substantial computational resources, specialised hardware, and detection models trained on a wide variety of species. This can limit their value for autonomous applications and makes them prone to mis-detections.

In this paper, we introduce a novel, effective approach that substantially reduces video size without compromising insect data. Our method processes camera data frame-wise and pixel-wise to identify image regions with motion, storing only the information relevant for animal monitoring and reconstruction of motion paths and animal-habitat (in our case, flower) interaction. The algorithm architecture is designed to accommodate limited power, storage, and bandwidth resources of camera traps installed in remote locations while improving processing throughput. We conducted a case study applying our algorithm to a set of four datasets representing diverse application environments. We demonstrate the practicality of our method for animal monitoring by extracting insect behavioural data from compressed videos.

2 Methods

In this section we describe the algorithm architecture (Figure 1) and its multi-threading approach designed to improve data throughput. The software is published open-access on GitHub as EcoMotionZip (https://github.com/malikaratnayake/EcoMotionZip).

Figure 1: Overview of the algorithm architecture. The proposed algorithm has three main components: (1) Reader, (2) Motion Analysis, and (3) Writer, implemented with three separate threads. Footage from van der Voort et al. [29] illustrates the algorithm.

2.1 Reader component

The Reader captures video frames from a camera stream or pre-recorded video file and adds them to a queue for processing. Once all frames from the input stream have been passed to the Motion Analysis component for processing, the Reader terminates.

2.2 Motion Analysis component

The algorithm’s Motion Analysis component processes frames captured by the Reader thread to identify regions with motion. This process is designed to extract only information from video frames critical for animal behaviour analysis. It discards the remaining information to save storage space. Maintained data includes information for algorithms (and human viewers) to gauge (1) animal type / species, (2) movement paths / gaits, (3) observation time, and (4) an overview of the environment within the camera view.

Motion analysis begins by down-scaling each captured frame and converting it to greyscale to reduce computational load and improve processing efficiency [22, 1]. Subsequently, inter-frame changes are detected by calculating the absolute intensity difference between pixels in adjacent frames. Pixel regions displaying an absolute intensity difference greater than a user-set threshold are preserved along with a surrounding buffer region. Users can adjust the sensitivity of the motion capture by modifying this threshold value to suit the application environment and target species. The size of the buffer region can also be customised by the user for the application. The buffer allows the algorithm to maintain detailed information about the animal and its immediate surroundings, facilitating subsequent behavioural analysis. Subsequently, the frame containing motion regions is converted into a binary image, where pixels of non-zero intensity are retained while the rest are set to zero. Frames with no regions of detected motion are discarded completely. For frames with regions of detected motion, the binary image is upscaled to the original frame size and a bitwise product is generated between the upscaled binary image and the original frame. The resultant frame is then passed to the Writer component for storage. In addition, the algorithm records full frames at user-specified intervals (regardless of whether or not motion was detected in that frame) to capture a scene overview and changes in the environment that occur gradually during the recording period. The Motion Analysis component terminates once all frames from the Reader have been processed.
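The per-frame steps described above can be illustrated with a minimal NumPy sketch. It is not the EcoMotionZip implementation: the function names (`to_small_grey`, `grow`, `mask_motion`), the default values for `scale`, `threshold` and `buffer`, the use of block averaging for down-scaling, and the square buffer shape are all assumptions chosen for illustration.

```python
import numpy as np


def to_small_grey(frame, scale):
    """Greyscale an H x W x 3 frame and down-scale it by averaging scale x scale blocks."""
    grey = frame.astype(np.float32).mean(axis=2)
    h = grey.shape[0] - grey.shape[0] % scale  # crop so that blocks tile evenly
    w = grey.shape[1] - grey.shape[1] % scale
    blocks = grey[:h, :w].reshape(h // scale, scale, w // scale, scale)
    return blocks.mean(axis=(1, 3))


def grow(mask, buffer):
    """Expand True regions by `buffer` pixels in every direction (hypothetical buffer shape)."""
    padded = np.pad(mask, buffer, mode="constant", constant_values=False)
    grown = np.zeros_like(mask)
    rows, cols = mask.shape
    for dy in range(2 * buffer + 1):
        for dx in range(2 * buffer + 1):
            grown |= padded[dy:dy + rows, dx:dx + cols]
    return grown


def mask_motion(prev_frame, frame, scale=4, threshold=25.0, buffer=1):
    """Return `frame` with non-motion pixels zeroed, or None if no motion is found."""
    # Absolute intensity difference between adjacent down-scaled greyscale frames.
    diff = np.abs(to_small_grey(frame, scale) - to_small_grey(prev_frame, scale))
    motion = diff > threshold
    if not motion.any():
        return None  # frames without detected motion are discarded
    small_mask = grow(motion, buffer)
    # Upscale the binary mask back to the original frame size.
    upscaled = np.repeat(np.repeat(small_mask, scale, axis=0), scale, axis=1)
    full_mask = np.zeros(frame.shape[:2], dtype=bool)
    full_mask[:upscaled.shape[0], :upscaled.shape[1]] = upscaled
    # Keep original pixel values inside the mask and zero everything else.
    return np.where(full_mask[..., None], frame, 0).astype(frame.dtype)
```

Because every pixel outside the mask becomes zero, the retained frames contain large uniform regions, which a standard video codec can encode cheaply. Periodic full-frame capture is omitted from this sketch.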

Alongside processed motion frames sent to the Writer, the Motion Analysis component transmits the frame numbers of the input and output videos, and whether or not a full image frame has been saved. This information is stored in a CSV file, for later reconstruction of animal motion and behaviour.
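As an illustration of how such a log could support later reconstruction, the sketch below writes one record per retained frame and reads the log back to recover the original frame index and elapsed time of each output frame. The column names (`input_frame`, `output_frame`, `full_frame`) and the helper functions are hypothetical. The actual CSV layout used by EcoMotionZip may differ.

```python
import csv
import io

# Hypothetical column names; the real log format may differ.
FIELDS = ["input_frame", "output_frame", "full_frame"]


def write_log(records, handle):
    """Write (input_frame, output_frame, full_frame) tuples as CSV rows."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for input_frame, output_frame, full_frame in records:
        writer.writerow({
            "input_frame": input_frame,
            "output_frame": output_frame,
            "full_frame": int(full_frame),
        })


def load_frame_map(handle, fps):
    """Map each output frame to (input frame, seconds from start, is full frame)."""
    mapping = {}
    for row in csv.DictReader(handle):
        input_frame = int(row["input_frame"])
        mapping[int(row["output_frame"])] = (
            input_frame,
            input_frame / fps,
            row["full_frame"] == "1",
        )
    return mapping


# Usage: an in-memory buffer stands in for the CSV file on disk.
buffer = io.StringIO()
write_log([(0, 0, True), (90, 1, False), (91, 2, False)], buffer)
buffer.seek(0)
frame_map = load_frame_map(buffer, fps=30)
```

With a mapping of this kind, a detection in output frame 1 can be traced back to input frame 90, that is, 3 seconds into a 30 frames/sec recording.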

2.3 Writer component

The Writer component receives processed frames from the Motion Analysis component and re-assembles them into a video file. The Writer extracts the frame rate and resolution of the output video from the input video file. Additionally, the Writer stores a CSV file alongside the video file containing the supplementary data sent by the Motion Analysis component.
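Taken together, the Reader, Motion Analysis and Writer components form a producer-consumer pipeline. The following sketch shows one way to connect three threads with bounded queues using only the Python standard library. The sentinel value, the queue size, the `has_motion` callback and the in-memory `sink` list (standing in for the output video file) are assumptions made for illustration rather than details taken from EcoMotionZip.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker passed between threads


def reader(frames, out_q):
    """Push (index, frame) pairs onto the queue, then signal the end of the stream."""
    for item in enumerate(frames):
        out_q.put(item)
    out_q.put(SENTINEL)


def motion_analysis(in_q, out_q, has_motion):
    """Forward only frames that differ from their predecessor according to has_motion."""
    prev = None
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        index, frame = item
        if prev is not None and has_motion(prev, frame):
            out_q.put((index, frame))
        prev = frame
    out_q.put(SENTINEL)


def writer(in_q, sink):
    """Collect processed frames; a real writer would append them to a video file."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        sink.append(item)


def run_pipeline(frames, has_motion, max_queue=32):
    """Run the three stages in separate threads and return the retained frames."""
    raw_q = queue.Queue(max_queue)
    processed_q = queue.Queue(max_queue)
    sink = []
    threads = [
        threading.Thread(target=reader, args=(frames, raw_q)),
        threading.Thread(target=motion_analysis, args=(raw_q, processed_q, has_motion)),
        threading.Thread(target=writer, args=(processed_q, sink)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sink
```

Bounded queues keep memory use predictable on a small device: if the Writer falls behind, the upstream stages block instead of buffering an unbounded number of frames.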

2.4 Test datasets

To assess our algorithm we used four public real-world datasets. These comprise multiple videos encompassing a range of typical insect monitoring scenarios, application contexts, scene complexities, and recording modes (Table 1 and Figure 2).

Table 1: Test Video Dataset Information. “Application Environment” describes the application environment. “Recording Method” presents the method of video capture: MT = Motion Triggered, TL = Time Lapse, Cont. = Continuous. “Camera” describes the camera model used for recording videos. “No. Videos” shows the number of videos in each dataset and “Video Codec” shows the codec used to record the videos. “Video Resol.” presents the video resolution of recorded videos. “FPS” shows the recorded frame rate in frames per second.

| Dataset | Application Environment | Recording Method | Camera | No. Videos | Video Codec | Video Resol. | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Naqvi et al. [17] | Urban garden | MT / TL | Bushnell NatureView HD | 3 | H264 | 1920 × 1080 | 30 |
| Ratnayake et al. [24] | Rural farm | Cont. | Raspberry Pi V2 | 10 | DIVX | 1920 × 1080 | 30 |
| van der Voort et al. [29] | Controlled env. | MT | Raspberry Pi V2 | 1 | H264 | 1920 × 1080 | 24 |
| Droissart et al. [6] | Multiple env. | Cont. | Raspberry Pi V2 | 3 | H264 | 1296 × 972 | - |

3 Results

3.1 Video compression

We evaluated video compression performance by comparing file sizes and frame counts (Table 2). To assess the effectiveness of our algorithm and ensure data needed for behavioural analysis is preserved, we maintained the original video resolution, frame rate and video codec. Test data was processed using a Raspberry Pi 5 (8 GB) single-board computer common in insect camera traps [13, 6, 29].

Table 2: Video compression results. “No. Frames” and “File Size (MB)” data for raw and processed videos show dataset total frame counts and file sizes. Reported file sizes for processed videos are the totals of compressed video and CSV files storing video supporting data. “Frame Reduc. (%)” and “File Size Reduc. (%)” show reduction in total frame count and test video file size.

| Dataset | Raw Videos: No. Frames | Raw Videos: File Size (MB) | Processed Videos: No. Frames | Processed Videos: File Size (MB) | Frame Reduc. (%) | File Size Reduc. (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Naqvi et al. [17] | 5445 | 327 | 4147 | 70 | 23.84 | 78.51 |
| Ratnayake et al. [24] | 179912 | 10895 | 12093 | 260 | 93.28 | 97.61 |
| van der Voort et al. [29] | 790 | 24 | 772 | 2 | 2.28 | 89.61 |
| Droissart et al. [6] | 5471 | 73 | 3843 | 21 | 29.76 | 71.49 |
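The reduction percentages in Table 2 follow from the relation reduction = (1 − processed / raw) × 100. The short sketch below recomputes the frame reductions from the tabulated frame counts and averages the reported file size reductions across the four datasets. The variable and function names are illustrative. File sizes in Table 2 are rounded to whole megabytes, so the file size percentages are taken as reported rather than recomputed.

```python
def reduction_percent(raw, processed):
    """Percentage reduction from a raw value to a processed value."""
    return (1 - processed / raw) * 100


# (raw frame count, processed frame count) per dataset, from Table 2.
FRAME_COUNTS = {
    "Naqvi et al.": (5445, 4147),
    "Ratnayake et al.": (179912, 12093),
    "van der Voort et al.": (790, 772),
    "Droissart et al.": (5471, 3843),
}

# File size reductions (%) as reported in Table 2.
REPORTED_FILE_SIZE_REDUCTION = [78.51, 97.61, 89.61, 71.49]

frame_reduction = {
    name: round(reduction_percent(raw, processed), 2)
    for name, (raw, processed) in FRAME_COUNTS.items()
}
mean_file_size_reduction = sum(REPORTED_FILE_SIZE_REDUCTION) / len(REPORTED_FILE_SIZE_REDUCTION)

print(frame_reduction)
print(round(mean_file_size_reduction, 1))
```

The unweighted mean of the four reported file size reductions is about 84.3%, consistent with the 84% average quoted in the abstract.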

3.2 Information retention

We evaluated our algorithm’s ability to preserve relevant information for animal behaviour analysis for our case study by comparing the number of insect appearances detected in raw and compressed videos using both manual and automated techniques. We used the dataset Ratnayake et al. [24] for this experiment as it has the highest video compression and hence, plausibly, the highest likelihood of information loss. This dataset contains video of four insect types: honeybees, Syrphid flies, Lepidopterans, and Vespids. We followed the procedure in [25] to manually record insect events in this dataset. Results are shown in Table 3 and discussed in Section 4.

To evaluate the suitability of the compressed videos for automated insect tracking, we used Polytrack [21, 25] to extract insect trajectories and flower positions from the compressed videos. For this experiment, we utilised the pre-trained YOLOv8 object detection model with default software configurations [21, 25]. The results are plotted as insect trajectories in Figure 3. An example video showing insect trajectories extracted by the Polytrack software is included in the supplementary materials.

Table 3: Insect count validation. Comparison of manual insect counts in raw / compressed videos. “Raw Video” presents observations from raw videos of [24, 25]. “Comp. video” presents observations from our compressed videos. “Insects Missed” counts insect appearances unobserved in compressed videos, but counted from raw video observations. “New Insects Observed” shows insect appearances observed in compressed videos unrecorded in observations made on the raw video dataset. Counts related to “New Insects Observed” are included in “Comp. video”.

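| Insect Type | Insect Counts: Raw video | Insect Counts: Comp. video | Insects Missed | New Insects Observed |
| --- | --- | --- | --- | --- |
| Honeybee | | | | |
| Syrphidae | | | | |
| Lepidoptera | | | | |
| Vespidae | | | | |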

4 Discussion

Our compression algorithm analyses pixel-wise motion within video frames to remove frames and individual pixels devoid of animal motion information while retaining critical data. It achieved an average compression of 84% for insect monitoring videos in diverse environments (Table 2), while preserving key data for behaviour analysis (Table 3, Figure 3). This file size reduction lowers storage and bandwidth needs, which benefits resource-constrained, remote edge-camera traps.

The proposed algorithm consistently achieved percentage file size reductions exceeding the frame reduction percentages across all datasets, especially in videos of environments with few background changes (e.g. Naqvi et al. [17], van der Voort et al. [29], Droissart et al. [6]; Table 2). In these cases, our pixel-wise motion analysis selectively eliminated data from non-moving pixels but retained data from pixels with motion for analysis. This demonstrates the adaptability of our approach for different environmental conditions, abundances of animals in a frame, or recording trigger type.

In our experiments, the algorithm retained all data necessary for insect abundance estimations (Table 3). Also, manual observations of compressed videos revealed more fast-moving Vespids, relatively small Syrphids, and Lepidoptera, compared to raw video observations. This is probably because our compressed videos simplify the task by focusing attention on key parts of the image, reducing user fatigue [27, 30, 7]. This added focus can potentially reduce the cost and improve the accuracy of ecological video analysis [2].

We validated the compressed videos’ value for automated tracking and analysis with existing insect tracking software to extract insect trajectories (Figure 3). Notably, the Polytrack software [21] was not optimised for processing compressed videos, reducing its performance. Future work to tailor software for processing compressed video would certainly be valuable.

We used foreground changes as motion triggers without assessing events or objects causing the motion on the device. This records animal appearances with few false negatives (Table 3) and therefore preserves essential data. But compressed videos contained false positives caused by wind and illumination changes that add unnecessarily to the compressed video file-size in dynamic environments. Future methods to reduce false positives would improve the compression performance, possibly at the expense of onboard processing and resource costs. We note that whether or not this matters depends on the cost/benefit analysis of a particular researcher working with specific animals in specific environmental conditions. It is certainly worthy of future research if the file compression ratios we achieved with our method were found to be insufficient for a particular application.

5 Conclusions

This paper presented an algorithm to compress videos captured by resource-constrained camera traps that employs motion analysis to remove unwanted pixels and image frames. By analysing video frames pixel-by-pixel, our algorithm achieved an average compression of 84% on the test data while preserving all information crucial for animal behaviour analysis in our insect pollinator monitoring case study. This substantial file-size reduction significantly enhances the monitoring capabilities of remote camera traps by increasing monitoring duration, minimising storage requirements and bandwidth demands, and facilitating efficient data transfer from these resource-limited devices. Deploying this algorithm on camera traps has the potential to significantly advance ecological monitoring and conservation efforts by optimising the capabilities of existing systems.

References

Bjerge et al. [2023] Kim Bjerge, Carsten Eie Frigaard, and Henrik Karstoft. Object detection of small insects in time-lapse camera recordings. Sensors, 23(16):7242, 2023.
Breeze et al. [2021] Tom D Breeze, Alison P Bailey, Kelvin G Balcombe, Tom Brereton, Richard Comont, Mike Edwards, Michael P Garratt, Martin Harvey, Cathy Hawes, Nick Isaac, et al. Pollinator monitoring more than pays for itself. Journal of Applied Ecology, 58(1):44–57, 2021.
Caravaggi et al. [2017] Anthony Caravaggi, Peter B Banks, A Cole Burton, Caroline MV Finlay, Peter M Haswell, Matt W Hayward, Marcus J Rowcliffe, and Mike D Wood. A review of camera trapping for conservation behaviour research. Remote Sensing in Ecology and Conservation, 3(3):109–122, 2017.
Collett and Fisher [2017] Rachael A Collett and Diana O Fisher. Time-lapse camera trapping as an alternative to pitfall trapping for estimating activity of leaf litter arthropods. Ecology and Evolution, 7(18):7527–7533, 2017.
Dharmarathne et al. [2022] Sanjaya Chathuranga Dharmarathne, EGDP Jayasekara, Dharshani Mahaulpatha, and Kusal de Silva. Camera trap data reveals the habitat use and activity patterns of a secretive forest bird, sri lanka spurfowl galloperdix bicalcarata. Journal of Wildlife and Biodiversity, 6(Suppl. 1):100–118, 2022.
Droissart et al. [2021] Vincent Droissart, Laura Azandi, Eric Rostand Onguene, Marie Savignac, Thomas B Smith, and Vincent Deblauwe. Pict: A low-cost, modular, open-source camera trap system to study plant–insect interactions. Methods in Ecology and Evolution, 12(8):1389–1396, 2021.
Faber et al. [2012] Léon G Faber, Natasha M Maurits, and Monicque M Lorist. Mental fatigue affects visual selective attention. PloS one, 7(10):e48073, 2012.
Feng et al. [2021] Jiawei Feng, Yifei Sun, Hailong Li, Yuqi ** Ge, and Tianming Wang. Assessing mammal species richness and occupancy in a northeast asian temperate forest shared by cattle. Diversity and Distributions, 27(5):857–872, 2021.
Food & Agriculture Organization [2019] Food & Agriculture Organization. Global action on pollination services for sustainable agriculture. 2019.
Gazzea et al. [2023] Elena Gazzea, Péter Batáry, and Lorenzo Marini. Global meta-analysis shows reduced quality of food crops under inadequate animal pollination. Nature communications, 14(1):4463, 2023.
Head et al. [2012] Josephine S Head, Martha M Robbins, Roger Mundry, Loïc Makaga, and Christophe Boesch. Remote video-camera traps measure habitat use and competitive exclusion among sympatric chimpanzee, gorilla and elephant in loango national park, gabon. Journal of Tropical Ecology, 28(6):571–583, 2012.
Janisch et al. [2021] Judith Janisch, Clementine Mitoyen, Elisa Perinot, Giovanni Spezie, Leonida Fusani, and Cliodhna Quigley. Video recording and analysis of avian movements and behavior: insights from courtship case studies. Integrative and Comparative Biology, 61(4):1378–1393, 2021.
Jolles [2021] Jolle W Jolles. Broad-scale applications of the raspberry pi: A review and guide for biologists. Methods in Ecology and Evolution, 12(9):1562–1579, 2021.
Krauss et al. [2018] Siegfried L Krauss, David G Roberts, Ryan D Phillips, and Caroline Edwards. Effectiveness of camera traps for quantifying daytime and nighttime visitation by vertebrate pollinators. Ecology and Evolution, 8(18):9304–9314, 2018.
Lovell et al. [2022] Connor Lovell, Shiya Li, Jessica Turner, and Chris Carbone. The effect of habitat and human disturbance on the spatiotemporal activity of two urban carnivores: The results of an intensive camera trap study. Ecology and evolution, 12(3):e8746, 2022.
Melidonis and Peter [2015] Caitlin A Melidonis and Craig I Peter. Diurnal pollination, primarily by a single species of rodent, documented in protea foliosa using modified camera traps. South African Journal of Botany, 97:9–15, 2015.
Naqvi et al. [2022] Qaim Naqvi, Patrick J Wolff, Brenda Molano-Flores, and Jinelle H Sperry. Camera traps are an effective tool for monitoring insect–plant interactions. Ecology and Evolution, 12(6):e8962, 2022.
Nykänen et al. [2023] Milaja Nykänen, Hannu Pöysä, Juho Matala, and Mervi Kunnasranta. Motion detection or time lapse? a comparison of camera trap triggers in the monitoring of elusive ground dwelling birds. 2023.
Ortmann and Johnson [2021] CR Ortmann and SD Johnson. How reliable are motion-triggered camera traps for detecting small mammals and birds in ecological studies? Journal of Zoology, 313(3):202–207, 2021.
Rampim et al. [2020] Lilian E Rampim, Leonardo R Sartorello, Carlos E Fragoso, Mario Haberfeld, and Allison L Devlin. Antagonistic interactions between predator and prey: mobbing of jaguars (panthera onca) by white-lipped peccaries (tayassu pecari). acta ethologica, 23(1):45–48, 2020.
[21] Malika Nisal Ratnayake. Polytrack. GitHub.
Ratnayake et al. [2021a] Malika Nisal Ratnayake, Adrian G Dyer, and Alan Dorin. Towards computer vision and deep learning facilitated pollination monitoring for agriculture. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2921–2930, 2021a.
Ratnayake et al. [2021b] Malika Nisal Ratnayake, Adrian G Dyer, and Alan Dorin. Tracking individual honeybees among wildflower clusters with computer vision-facilitated pollinator monitoring. Plos one, 16(2):e0239504, 2021b.
Ratnayake et al. [2022] Malika Nisal Ratnayake, Don Chathurika Amarathunga, Asaduz Zaman, A G Dyer, and Alan Dorin. Spatial Monitoring and Insect Behavioural Analysis Dataset, 2022.
Ratnayake et al. [2023] Malika Nisal Ratnayake, Don Chathurika Amarathunga, Asaduz Zaman, Adrian G Dyer, and Alan Dorin. Spatial monitoring and insect behavioural analysis using computer vision for precision pollination. International Journal of Computer Vision, 131(3):591–606, 2023.
Reece et al. [2021] Sally J Reece, Frans GT Radloff, Alison J Leslie, Rajan Amin, and Craig J Tambling. A camera trap appraisal of species richness and community composition of medium and large mammals in a miombo woodland reserve. African Journal of Ecology, 59(4):898–911, 2021.
Simons and Chabris [1999] Daniel J Simons and Christopher F Chabris. Gorillas in our midst: Sustained inattentional blindness for dynamic events. perception, 28(9):1059–1074, 1999.
Sittinger et al. [2024] Maximilian Sittinger, Johannes Uhler, Maximilian Pink, and Annette Herz. Insect detect: An open-source diy camera trap for automated insect monitoring. Plos one, 19(4):e0295474, 2024.
van der Voort et al. [2022] Genevieve E van der Voort, Scott R Gilmore, Jamieson C Gorrell, and Jasmine K Janes. Continuous video capture, and pollinia tracking, in platanthera (orchidaceae) reveal new insect visitors and potential pollinators. PeerJ, 10:e13191, 2022.
Zett et al. [2022] Theresa Zett, Ken J Stratford, and Florian J Weise. Inter-observer variance and agreement of wildlife information extracted from camera trap images. Biodiversity and Conservation, 31(12):3019–3037, 2022.