-
Deep Learning-based Epicenter Localization using Single-Station Strong Motion Records
Authors:
Melek Türkmen,
Sanem Meral,
Baris Yilmaz,
Melis Cikis,
Erdem Akagündüz,
Salih Tileylioglu
Abstract:
This paper explores the application of deep learning (DL) techniques to strong motion records for single-station epicenter localization. Often underutilized in seismology-related studies, strong motion records offer a potential wealth of information about seismic events. We investigate whether DL-based methods can effectively leverage this data for accurate epicenter localization. Our study introd…
▽ More
This paper explores the application of deep learning (DL) techniques to strong motion records for single-station epicenter localization. Often underutilized in seismology-related studies, strong motion records offer a potential wealth of information about seismic events. We investigate whether DL-based methods can effectively leverage this data for accurate epicenter localization. Our study introduces AFAD-1218, a collection comprising more than 36,000 strong motion records sourced from Turkey. To utilize the strong motion records represented in either the time or the frequency domain, we propose two neural network architectures: deep residual network and temporal convolutional networks. Through extensive experimentation, we demonstrate the efficacy of DL approaches in extracting meaningful insights from these records, showcasing their potential for enhancing seismic event analysis and localization accuracy. Notably, our findings highlight significant reductions in prediction error achieved through the exclusion of low signal-to-noise ratio records, both in nationwide experiments and regional transfer-learning scenarios. Overall, this research underscores the promise of DL techniques in harnessing strong motion records for improved seismic event characterization and localization.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
Near-Infrared and Low-Rank Adaptation of Vision Transformers in Remote Sensing
Authors:
Irem Ulku,
O. Ozgur Tanriover,
Erdem Akagündüz
Abstract:
Plant health can be monitored dynamically using multispectral sensors that measure Near-Infrared reflectance (NIR). Despite this potential, obtaining and annotating high-resolution NIR images poses a significant challenge for training deep neural networks. Typically, large networks pre-trained on the RGB domain are utilized to fine-tune infrared images. This practice introduces a domain shift issu…
▽ More
Plant health can be monitored dynamically using multispectral sensors that measure Near-Infrared reflectance (NIR). Despite this potential, obtaining and annotating high-resolution NIR images poses a significant challenge for training deep neural networks. Typically, large networks pre-trained on the RGB domain are utilized to fine-tune infrared images. This practice introduces a domain shift issue because of the differing visual traits between RGB and NIR images.As an alternative to fine-tuning, a method called low-rank adaptation (LoRA) enables more efficient training by optimizing rank-decomposition matrices while kee** the original network weights frozen. However, existing parameter-efficient adaptation strategies for remote sensing images focus on RGB images and overlook domain shift issues in the NIR domain. Therefore, this study investigates the potential benefits of using vision transformer (ViT) backbones pre-trained in the RGB domain, with low-rank adaptation for downstream tasks in the NIR domain. Extensive experiments demonstrate that employing LoRA with pre-trained ViT backbones yields the best performance for downstream tasks applied to NIR images.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
How to Augment for Atmospheric Turbulence Effects on Thermal Adapted Object Detection Models?
Authors:
Engin Uzun,
Erdem Akagunduz
Abstract:
Atmospheric turbulence poses a significant challenge to the performance of object detection models. Turbulence causes distortions, blurring, and noise in images by bending and scattering light rays due to variations in the refractive index of air. This results in non-rigid geometric distortions and temporal fluctuations in the electromagnetic radiation received by optical systems. This paper explo…
▽ More
Atmospheric turbulence poses a significant challenge to the performance of object detection models. Turbulence causes distortions, blurring, and noise in images by bending and scattering light rays due to variations in the refractive index of air. This results in non-rigid geometric distortions and temporal fluctuations in the electromagnetic radiation received by optical systems. This paper explores the effectiveness of turbulence image augmentation techniques in improving the accuracy and robustness of thermal-adapted and deep learning-based object detection models under atmospheric turbulence. Three distinct approximation-based turbulence simulators (geometric, Zernike-based, and P2S) are employed to generate turbulent training and test datasets. The performance of three state-of-the-art deep learning-based object detection models: RTMDet-x, DINO-4scale, and YOLOv8-x, is employed on these turbulent datasets with and without turbulence augmentation during training. The results demonstrate that utilizing turbulence-specific augmentations during model training can significantly improve detection accuracy and robustness against distorted turbulent images. Turbulence augmentation enhances performance even for a non-turbulent test set.
△ Less
Submitted 10 May, 2024;
originally announced May 2024.
-
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Authors:
Övgü Özdemir,
Erdem Akagündüz
Abstract:
Visual question answering (VQA) is known as an AI-complete task as it requires understanding, reasoning, and inferring about the vision and the language content. Over the past few years, numerous neural architectures have been suggested for the VQA problem. However, achieving success in zero-shot VQA remains a challenge due to its requirement for advanced generalization and reasoning skills. This…
▽ More
Visual question answering (VQA) is known as an AI-complete task as it requires understanding, reasoning, and inferring about the vision and the language content. Over the past few years, numerous neural architectures have been suggested for the VQA problem. However, achieving success in zero-shot VQA remains a challenge due to its requirement for advanced generalization and reasoning skills. This study explores the impact of incorporating image captioning as an intermediary process within the VQA pipeline. Specifically, we explore the efficacy of utilizing image captions instead of images and leveraging large language models (LLMs) to establish a zero-shot setting. Since image captioning is the most crucial step in this process, we compare the impact of state-of-the-art image captioning models on VQA performance across various question types in terms of structure and semantics. We propose a straightforward and efficient question-driven image captioning approach within this pipeline to transfer contextual information into the question-answering (QA) model. This method involves extracting keywords from the question, generating a caption for each image-question pair using the keywords, and incorporating the question-driven caption into the LLM prompt. We evaluate the efficacy of using general-purpose and question-driven image captions in the VQA pipeline. Our study highlights the potential of employing image captions and harnessing the capabilities of LLMs to achieve competitive performance on GQA under the zero-shot setting. Our code is available at \url{https://github.com/ovguyo/captions-in-VQA}.
△ Less
Submitted 12 April, 2024;
originally announced April 2024.
-
Exploring Challenges in Deep Learning of Single-Station Ground Motion Records
Authors:
Ümit Mert Çağlar,
Baris Yilmaz,
Melek Türkmen,
Erdem Akagündüz,
Salih Tileylioglu
Abstract:
Contemporary deep learning models have demonstrated promising results across various applications within seismology and earthquake engineering. These models rely primarily on utilizing ground motion records for tasks such as earthquake event classification, localization, earthquake early warning systems, and structural health monitoring. However, the extent to which these models effectively learn…
▽ More
Contemporary deep learning models have demonstrated promising results across various applications within seismology and earthquake engineering. These models rely primarily on utilizing ground motion records for tasks such as earthquake event classification, localization, earthquake early warning systems, and structural health monitoring. However, the extent to which these models effectively learn from these complex time-series signals has not been thoroughly analyzed. In this study, our objective is to evaluate the degree to which auxiliary information, such as seismic phase arrival times or seismic station distribution within a network, dominates the process of deep learning from ground motion records, potentially hindering its effectiveness. We perform a hyperparameter search on two deep learning models to assess their effectiveness in deep learning from ground motion records while also examining the impact of auxiliary information on model performance. Experimental results reveal a strong reliance on the highly correlated P and S phase arrival information. Our observations highlight a potential gap in the field, indicating an absence of robust methodologies for deep learning of single-station ground motion recordings independent of any auxiliary information.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
FuseFormer: A Transformer for Visual and Thermal Image Fusion
Authors:
Aytekin Erdogan,
Erdem Akagündüz
Abstract:
Due to the lack of a definitive ground truth for the image fusion problem, the loss functions are structured based on evaluation metrics, such as the structural similarity index measure (SSIM). However, in doing so, a bias is introduced toward the SSIM and, consequently, the input visual band image. The objective of this study is to propose a novel methodology for the image fusion problem that mit…
▽ More
Due to the lack of a definitive ground truth for the image fusion problem, the loss functions are structured based on evaluation metrics, such as the structural similarity index measure (SSIM). However, in doing so, a bias is introduced toward the SSIM and, consequently, the input visual band image. The objective of this study is to propose a novel methodology for the image fusion problem that mitigates the limitations associated with using classical evaluation metrics as loss functions. Our approach integrates a transformer-based multi-scale fusion strategy that adeptly addresses local and global context information. This integration not only refines the individual components of the image fusion process but also significantly enhances the overall efficacy of the method. Our proposed method follows a two-stage training approach, where an auto-encoder is initially trained to extract deep features at multiple scales in the first stage. For the second stage, we integrate our fusion block and change the loss function as mentioned. The multi-scale features are fused using a combination of Convolutional Neural Networks (CNNs) and Transformers. The CNNs are utilized to capture local features, while the Transformer handles the integration of general context features. Through extensive experiments on various benchmark datasets, our proposed method, along with the novel loss function definition, demonstrates superior performance compared to other competitive fusion algorithms.
△ Less
Submitted 24 April, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Local Masking Meets Progressive Freezing: Crafting Efficient Vision Transformers for Self-Supervised Learning
Authors:
Utku Mert Topcuoglu,
Erdem Akagündüz
Abstract:
In this paper, we present an innovative approach to self-supervised learning for Vision Transformers (ViTs), integrating local masked image modeling with progressive layer freezing. This method focuses on enhancing the efficiency and speed of initial layer training in ViTs. By systematically freezing specific layers at strategic points during training, we reduce computational demands while maintai…
▽ More
In this paper, we present an innovative approach to self-supervised learning for Vision Transformers (ViTs), integrating local masked image modeling with progressive layer freezing. This method focuses on enhancing the efficiency and speed of initial layer training in ViTs. By systematically freezing specific layers at strategic points during training, we reduce computational demands while maintaining or improving learning capabilities. Our approach employs a novel multi-scale reconstruction process that fosters efficient learning in initial layers and enhances semantic comprehension across scales. The results demonstrate a substantial reduction in training time (~12.5\%) with a minimal impact on model accuracy (decrease in top-1 accuracy by 0.6\%). Our method achieves top-1 and top-5 accuracies of 82.6\% and 96.2\%, respectively, underscoring its potential in scenarios where computational resources and time are critical. This work marks an advancement in the field of self-supervised learning for computer vision. The implementation of our approach is available at our project's GitHub repository: github.com/utkutpcgl/ViTFreeze.
△ Less
Submitted 2 December, 2023;
originally announced December 2023.
-
EANet: Enhanced Attribute-based RGBT Tracker Network
Authors:
Abbas Türkoğlu,
Erdem Akagündüz
Abstract:
Tracking objects can be a difficult task in computer vision, especially when faced with challenges such as occlusion, changes in lighting, and motion blur. Recent advances in deep learning have shown promise in challenging these conditions. However, most deep learning-based object trackers only use visible band (RGB) images. Thermal infrared electromagnetic waves (TIR) can provide additional infor…
▽ More
Tracking objects can be a difficult task in computer vision, especially when faced with challenges such as occlusion, changes in lighting, and motion blur. Recent advances in deep learning have shown promise in challenging these conditions. However, most deep learning-based object trackers only use visible band (RGB) images. Thermal infrared electromagnetic waves (TIR) can provide additional information about an object, including its temperature, when faced with challenging conditions. We propose a deep learning-based image tracking approach that fuses RGB and thermal images (RGBT). The proposed model consists of two main components: a feature extractor and a tracker. The feature extractor encodes deep features from both the RGB and the TIR images. The tracker then uses these features to track the object using an enhanced attribute-based architecture. We propose a fusion of attribute-specific feature selection with an aggregation module. The proposed methods are evaluated on the RGBT234 \cite{LiCLiang2018} and LasHeR \cite{LiLasher2021} datasets, which are the most widely used RGBT object-tracking datasets in the literature. The results show that the proposed system outperforms state-of-the-art RGBT object trackers on these datasets, with a relatively smaller number of parameters.
△ Less
Submitted 4 July, 2023;
originally announced July 2023.
-
Detecting Driver Drowsiness as an Anomaly Using LSTM Autoencoders
Authors:
Gülin Tüfekci,
Alper Kayabaşi,
Erdem Akagündüz,
İlkay Ulusoy
Abstract:
In this paper, an LSTM autoencoder-based architecture is utilized for drowsiness detection with ResNet-34 as feature extractor. The problem is considered as anomaly detection for a single subject; therefore, only the normal driving representations are learned and it is expected that drowsiness representations, yielding higher reconstruction losses, are to be distinguished according to the knowledg…
▽ More
In this paper, an LSTM autoencoder-based architecture is utilized for drowsiness detection with ResNet-34 as feature extractor. The problem is considered as anomaly detection for a single subject; therefore, only the normal driving representations are learned and it is expected that drowsiness representations, yielding higher reconstruction losses, are to be distinguished according to the knowledge of the network. In our study, the confidence levels of normal and anomaly clips are investigated through the methodology of label assignment such that training performance of LSTM autoencoder and interpretation of anomalies encountered during testing are analyzed under varying confidence rates. Our method is experimented on NTHU-DDD and benchmarked with a state-of-the-art anomaly detection method for driver drowsiness. Results show that the proposed model achieves detection rate of 0.8740 area under curve (AUC) and is able to provide significant improvements on certain scenarios.
△ Less
Submitted 12 September, 2022;
originally announced September 2022.
-
Sequence Models for Drone vs Bird Classification
Authors:
Fatih Cagatay Akyon,
Erdem Akagunduz,
Sinan Onur Altinuc,
Alptekin Temizel
Abstract:
Drone detection has become an essential task in object detection as drone costs have decreased and drone technology has improved. It is, however, difficult to detect distant drones when there is weak contrast, long range, and low visibility. In this work, we propose several sequence classification architectures to reduce the detected false-positive ratio of drone tracks. Moreover, we propose a new…
▽ More
Drone detection has become an essential task in object detection as drone costs have decreased and drone technology has improved. It is, however, difficult to detect distant drones when there is weak contrast, long range, and low visibility. In this work, we propose several sequence classification architectures to reduce the detected false-positive ratio of drone tracks. Moreover, we propose a new drone vs. bird sequence classification dataset to train and evaluate the proposed architectures. 3D CNN, LSTM, and Transformer based sequence classification architectures have been trained on the proposed dataset to show the effectiveness of the proposed idea. As experiments show, using sequence information, bird classification and overall F1 scores can be increased by up to 73% and 35%, respectively. Among all sequence classification models, R(2+1)D-based fully convolutional model yields the best transfer learning and fine-tuning results.
△ Less
Submitted 19 December, 2022; v1 submitted 21 July, 2022;
originally announced July 2022.
-
Augmentation of Atmospheric Turbulence Effects on Thermal Adapted Object Detection Models
Authors:
Engin Uzun,
Ahmet Anil Dursun,
Erdem Akagunduz
Abstract:
Atmospheric turbulence has a degrading effect on the image quality of long-range observation systems. As a result of various elements such as temperature, wind velocity, humidity, etc., turbulence is characterized by random fluctuations in the refractive index of the atmosphere. It is a phenomenon that may occur in various imaging spectra such as the visible or the infrared bands. In this paper, w…
▽ More
Atmospheric turbulence has a degrading effect on the image quality of long-range observation systems. As a result of various elements such as temperature, wind velocity, humidity, etc., turbulence is characterized by random fluctuations in the refractive index of the atmosphere. It is a phenomenon that may occur in various imaging spectra such as the visible or the infrared bands. In this paper, we analyze the effects of atmospheric turbulence on object detection performance in thermal imagery. We use a geometric turbulence model to simulate turbulence effects on a medium-scale thermal image set, namely "FLIR ADAS v2". We apply thermal domain adaptation to state-of-the-art object detectors and propose a data augmentation strategy to increase the performance of object detectors which utilizes turbulent images in different severity levels as training data. Our results show that the proposed data augmentation strategy yields an increase in performance for both turbulent and non-turbulent thermal test images.
△ Less
Submitted 19 April, 2022;
originally announced April 2022.
-
A Survey on Infrared Image and Video Sets
Authors:
Kevser Irem Danaci,
Erdem Akagunduz
Abstract:
In this survey, we compile a list of publicly available infrared image and video sets for artificial intelligence and computer vision researchers. We mainly focus on IR image and video sets which are collected and labelled for computer vision applications such as object detection, object segmentation, classification, and motion detection. We categorize 92 different publicly available or private se…
▽ More
In this survey, we compile a list of publicly available infrared image and video sets for artificial intelligence and computer vision researchers. We mainly focus on IR image and video sets which are collected and labelled for computer vision applications such as object detection, object segmentation, classification, and motion detection. We categorize 92 different publicly available or private sets according to their sensor types, image resolution, and scale. We describe each and every set in detail regarding their collection purpose, operation environment, optical system properties, and area of application. We also cover a general overview of fundamental concepts that relate to IR imagery, such as IR radiation, IR detectors, IR optics and application fields. We analyse the statistical significance of the entire corpus from different perspectives. We believe that this survey will be a guideline for computer vision and artificial intelligence researchers that are interested in working with the spectra beyond the visible domain.
△ Less
Submitted 16 January, 2023; v1 submitted 16 March, 2022;
originally announced March 2022.
-
Dynamical System Parameter Identification using Deep Recurrent Cell Networks
Authors:
Erdem Akagündüz,
Oguzhan Cifdaloz
Abstract:
In this paper, we investigate the parameter identification problem in dynamical systems through a deep learning approach. Focusing mainly on second-order, linear time-invariant dynamical systems, the topic of dam** factor identification is studied. By utilizing a six-layer deep neural network with different recurrent cells, namely GRUs, LSTMs or BiLSTMs; and by feeding input-output sequence pair…
▽ More
In this paper, we investigate the parameter identification problem in dynamical systems through a deep learning approach. Focusing mainly on second-order, linear time-invariant dynamical systems, the topic of dam** factor identification is studied. By utilizing a six-layer deep neural network with different recurrent cells, namely GRUs, LSTMs or BiLSTMs; and by feeding input-output sequence pairs captured from a dynamical system simulator, we search for an effective deep recurrent architecture in order to resolve dam** factor identification problem. Our study results show that, although previously not utilized for this task in the literature, bidirectional gated recurrent cells (BiLSTMs) provide better parameter identification results when compared to unidirectional gated recurrent memory cells such as GRUs and LSTM. Thus, indicating that an input-output sequence pair of finite length, collected from a dynamical system and when observed anachronistically, may carry information in both time directions for prediction of a dynamical systems parameter.
△ Less
Submitted 6 July, 2021;
originally announced July 2021.
-
Filter design for small target detection on infrared imagery using normalized-cross-correlation layer
Authors:
H. Seçkin Demir,
Erdem Akagunduz
Abstract:
In this paper, we introduce a machine learning approach to the problem of infrared small target detection filter design. For this purpose, similarly to a convolutional layer of a neural network, the normalized-cross-correlational (NCC) layer, which we utilize for designing a target detection/recognition filter bank, is proposed. By employing the NCC layer in a neural network structure, we introduc…
▽ More
In this paper, we introduce a machine learning approach to the problem of infrared small target detection filter design. For this purpose, similarly to a convolutional layer of a neural network, the normalized-cross-correlational (NCC) layer, which we utilize for designing a target detection/recognition filter bank, is proposed. By employing the NCC layer in a neural network structure, we introduce a framework, in which supervised training is used to calculate the optimal filter shape and the optimum number of filters required for a specific target detection/recognition task on infrared images. We also propose the mean-absolute-deviation NCC (MAD-NCC) layer, an efficient implementation of the proposed NCC layer, designed especially for FPGA systems, in which square root operations are avoided for real-time computation. As a case study we work on dim-target detection on mid-wave infrared imagery and obtain the filters that can discriminate a dim target from various types of background clutter, specific to our operational concept.
△ Less
Submitted 15 June, 2020;
originally announced June 2020.
-
A Hybrid Framework for Matching Printing Design Files to Product Photos
Authors:
Alper Kaplan,
Erdem Akagunduz
Abstract:
We propose a real-time image matching framework, which is hybrid in the sense that it uses both hand-crafted features and deep features obtained from a well-tuned deep convolutional network. The matching problem, which we concentrate on, is specific to a certain application, that is, printing design to product photo matching. Printing designs are any kind of template image files, created using a d…
▽ More
We propose a real-time image matching framework, which is hybrid in the sense that it uses both hand-crafted features and deep features obtained from a well-tuned deep convolutional network. The matching problem, which we concentrate on, is specific to a certain application, that is, printing design to product photo matching. Printing designs are any kind of template image files, created using a design tool, thus are perfect image signals. However, photographs of a printed product suffer many unwanted effects, such as uncontrolled shooting angle, uncontrolled illumination, occlusions, printing deficiencies in color, camera noise, optic blur, et cetera. For this purpose, we create an image set that includes printing design and corresponding product photo pairs with collaboration of an actual printing facility. Using this image set, we benchmark various hand-crafted and deep features for matching performance and propose a framework in which deep learning is utilized with highest contribution, but without disabling real-time operation using an ordinary desktop computer.
△ Less
Submitted 9 June, 2020;
originally announced June 2020.
-
A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images
Authors:
Irem Ulku,
Erdem Akagunduz
Abstract:
Semantic segmentation is the pixel-wise labelling of an image. Since the problem is defined at the pixel level, determining image class labels only is not acceptable, but localising them at the original image pixel resolution is necessary. Boosted by the extraordinary ability of convolutional neural networks (CNN) in creating semantic, high level and hierarchical image features; several deep learn…
▽ More
Semantic segmentation is the pixel-wise labelling of an image. Since the problem is defined at the pixel level, determining image class labels only is not acceptable, but localising them at the original image pixel resolution is necessary. Boosted by the extraordinary ability of convolutional neural networks (CNN) in creating semantic, high level and hierarchical image features; several deep learning-based 2D semantic segmentation approaches have been proposed within the last decade. In this survey, we mainly focus on the recent scientific developments in semantic segmentation, specifically on deep learning-based methods using 2D images. We started with an analysis of the public image sets and leaderboards for 2D semantic segmentation, with an overview of the techniques employed in performance evaluation. In examining the evolution of the field, we chronologically categorised the approaches into three main periods, namely pre-and early deep learning era, the fully convolutional era, and the post-FCN era. We technically analysed the solutions put forward in terms of solving the fundamental problems of the field, such as fine-grained localisation and scale invariance. Before drawing our conclusions, we present a table of methods from all mentioned eras, with a summary of each approach that explains their contribution to the field. We conclude the survey by discussing the current challenges of the field and to what extent they have been solved.
△ Less
Submitted 16 March, 2022; v1 submitted 21 December, 2019;
originally announced December 2019.
-
Defining Image Memorability using the Visual Memory Schema
Authors:
Erdem Akagunduz,
Adrian G. Bors,
Karla K. Evans
Abstract:
Memorability of an image is a characteristic determined by the human observers' ability to remember images they have seen. Yet recent work on image memorability defines it as an intrinsic property that can be obtained independent of the observer. {The current study aims to enhance our understanding and prediction of image memorability, improving upon existing approaches by incorporating the proper…
▽ More
Memorability of an image is a characteristic determined by the human observers' ability to remember images they have seen. Yet recent work on image memorability defines it as an intrinsic property that can be obtained independent of the observer. {The current study aims to enhance our understanding and prediction of image memorability, improving upon existing approaches by incorporating the properties of cumulative human annotations.} We propose a new concept called the Visual Memory Schema (VMS) referring to an organisation of image components human observers share when encoding and recognising images. The concept of VMS is operationalised by asking human observers to define memorable regions of images they were asked to remember during an episodic memory test. We then statistically assess the consistency of VMSs across observers for either correctly or incorrectly recognised images. The associations of the VMSs with eye fixations and saliency are analysed separately as well. Lastly, we adapt various deep learning architectures for the reconstruction and prediction of memorable regions in images and analyse the results when using transfer learning at the outputs of different convolutional network layers.
△ Less
Submitted 5 March, 2019;
originally announced March 2019.