-
State-of-the-Art in Nudity Classification: A Comparative Analysis
Authors:
Fatih Cagatay Akyon,
Alptekin Temizel
Abstract:
This paper presents a comparative analysis of existing nudity classification techniques for classifying images based on the presence of nudity, with a focus on their application in content moderation. The evaluation focuses on CNN-based models, vision transformer, and popular open-source safety checkers from Stable Diffusion and Large-scale Artificial Intelligence Open Network (LAION). The study identifies the limitations of current evaluation datasets and highlights the need for more diverse and challenging datasets. The paper discusses the potential implications of these findings for developing more accurate and effective image classification systems on online platforms. Overall, the study emphasizes the importance of continually improving image classification models to ensure the safety and well-being of platform users. The project page, including the demonstrations and results, is publicly available at https://github.com/fcakyon/content-moderation-deep-learning.
Submitted 26 December, 2023;
originally announced December 2023.
-
Ulcerative Colitis Mayo Endoscopic Scoring Classification with Active Learning and Generative Data Augmentation
Authors:
Ümit Mert Çağlar,
Alperen İnci,
Oğuz Hanoğlu,
Görkem Polat,
Alptekin Temizel
Abstract:
Endoscopic imaging is commonly used to diagnose Ulcerative Colitis (UC) and classify its severity. It has been shown that deep learning based methods are effective in automated analysis of these images and can potentially be used to aid medical doctors. Unleashing the full potential of these methods depends on the availability of a large number of labeled images; however, obtaining and labeling these images is quite challenging. In this paper, we propose an active learning based generative augmentation method. The method involves generating a large number of synthetic samples by training using a small dataset consisting of real endoscopic images. The resulting data pool is narrowed down by using active learning methods to select the most informative samples, which are then used to train a classifier. We demonstrate the effectiveness of our method through experiments on a publicly available endoscopic image dataset. The results show that using synthesized samples in conjunction with active learning leads to improved classification performance compared to using only the original labeled examples: the baseline classification performance of 68.1% increases to 74.5% in terms of Quadratic Weighted Kappa (QWK) Score. Another observation is that attaining equivalent performance using only real data required three times as many images.
Submitted 10 November, 2023;
originally announced November 2023.
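The abstract above reports its results as Quadratic Weighted Kappa (QWK). As a reminder of what that score measures, the following is a minimal sketch of the standard definition in plain Python. The function name and the integer label encoding (0 to num_classes - 1) are assumptions made for illustration; nothing here is taken from the authors' code.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Cohen's kappa with quadratic weights for ordinal labels 0..num_classes-1."""
    n = len(y_true)
    # Observed confusion matrix: rows are true labels, columns are predictions.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row = [sum(observed[i]) for i in range(num_classes)]
    col = [sum(observed[i][j] for i in range(num_classes))
           for j in range(num_classes)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(num_classes):
        for j in range(num_classes):
            w = (i - j) ** 2 / (num_classes - 1) ** 2
            num += w * observed[i][j]
            den += w * row[i] * col[j] / n
    if den == 0:
        raise ValueError("kappa is undefined when all labels fall in one class")
    return 1.0 - num / den
```

Because the weight grows with the squared distance between classes, confusing a score of 0 with 3 is penalized far more than confusing 0 with 1, which is why the metric suits ordinal severity scores.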
-
Adversarial Image Generation by Spatial Transformation in Perceptual Colorspaces
Authors:
Ayberk Aydin,
Alptekin Temizel
Abstract:
Deep neural networks are known to be vulnerable to adversarial perturbations. The magnitude of these perturbations is generally quantified using $L_p$ metrics, such as $L_0$, $L_2$ and $L_\infty$. However, even when the measured perturbations are small, they tend to be noticeable by human observers since $L_p$ distance metrics are not representative of human perception. On the other hand, humans are less sensitive to changes in colorspace. In addition, pixel shifts in a constrained neighborhood are hard to notice. Motivated by these observations, we propose a method that creates adversarial examples by applying spatial transformations that change pixel locations independently in the chrominance channels of perceptual colorspaces such as $YC_{b}C_{r}$ and $CIELAB$, instead of making an additive perturbation or manipulating pixel values directly. In a targeted white-box attack setting, the proposed method is able to obtain competitive fooling rates with very high confidence. The experimental evaluations show that the proposed method has favorable results in terms of approximate perceptual distance between benign and adversarially generated images. The source code is publicly available at https://github.com/ayberkydn/stadv-torch
Submitted 21 October, 2023;
originally announced October 2023.
-
TopoMask: Instance-Mask-Based Formulation for the Road Topology Problem via Transformer-Based Architecture
Authors:
M. Esat Kalfaoglu,
Halil Ibrahim Ozturk,
Ozsel Kilinc,
Alptekin Temizel
Abstract:
The driving scene understanding task involves detecting static elements such as lanes, traffic signs, and traffic lights, and their relationships with each other. To facilitate the development of comprehensive scene understanding solutions using multiple camera views, a new dataset called Road Genome (OpenLane-V2) has been released. This dataset allows for the exploration of complex road connections and situations where lane markings may be absent. Instead of using traditional lane markings, the lanes in this dataset are represented by centerlines, which offer a more suitable representation of lanes and their connections. In this study, we have introduced a new approach called TopoMask for predicting centerlines in road topology. Unlike existing approaches in the literature that rely on keypoints or parametric methods, TopoMask utilizes an instance-mask based formulation with a transformer-based architecture and, in order to enrich the mask instances with flow information, a direction label representation is proposed. TopoMask ranked 4th in the OpenLane-V2 Score (OLS) and 2nd in the F1 score of centerline prediction in the OpenLane Topology Challenge 2023. In comparison to the current state-of-the-art method, TopoNet, the proposed method has achieved similar performance in Fréchet-based lane detection and outperformed TopoNet in Chamfer-based lane detection without utilizing its scene graph neural network.
Submitted 8 June, 2023;
originally announced June 2023.
-
Deep Architectures for Content Moderation and Movie Content Rating
Authors:
Fatih Cagatay Akyon,
Alptekin Temizel
Abstract:
Rating a video based on its content is an important step for classifying video age categories. Movie content rating and TV show rating are the two most common rating systems established by professional committees. However, manually reviewing and evaluating scene/film content by a committee is tedious work and it becomes increasingly difficult with the ever-growing amount of online video content. As such, a desirable solution is to use computer vision based video content analysis techniques to automate the evaluation process. In this paper, related works are summarized for action recognition, multi-modal learning, movie genre classification, and sensitive content detection in the context of content moderation and movie content rating. The project page is available at https://github.com/fcakyon/content-moderation-deep-learning.
Submitted 12 December, 2022; v1 submitted 8 December, 2022;
originally announced December 2022.
-
Sequence Models for Drone vs Bird Classification
Authors:
Fatih Cagatay Akyon,
Erdem Akagunduz,
Sinan Onur Altinuc,
Alptekin Temizel
Abstract:
Drone detection has become an essential task in object detection as drone costs have decreased and drone technology has improved. It is, however, difficult to detect distant drones when there is weak contrast, long range, and low visibility. In this work, we propose several sequence classification architectures to reduce the detected false-positive ratio of drone tracks. Moreover, we propose a new drone vs. bird sequence classification dataset to train and evaluate the proposed architectures. 3D CNN, LSTM, and Transformer based sequence classification architectures have been trained on the proposed dataset to show the effectiveness of the proposed idea. As experiments show, using sequence information, bird classification and overall F1 scores can be increased by up to 73% and 35%, respectively. Among all sequence classification models, the R(2+1)D-based fully convolutional model yields the best transfer learning and fine-tuning results.
Submitted 19 December, 2022; v1 submitted 21 July, 2022;
originally announced July 2022.
-
Evaluation and Analysis of Different Aggregation and Hyperparameter Selection Methods for Federated Brain Tumor Segmentation
Authors:
Ece Isik-Polat,
Gorkem Polat,
Altan Kocyigit,
Alptekin Temizel
Abstract:
Availability of large, diverse, and multi-national datasets is crucial for the development of effective and clinically applicable AI systems in the medical imaging domain. However, forming a global model by bringing these datasets together at a central location comes with various data privacy and ownership problems. To alleviate these problems, several recent studies focus on the federated learning paradigm, a distributed learning approach for decentralized data. Federated learning leverages all the available data without any need for sharing collaborators' data with each other or collecting them on a central server. Studies show that federated learning can provide competitive performance with conventional central training, while having a good generalization capability. In this work, we have investigated several federated learning approaches on the brain tumor segmentation problem. We explore different strategies for faster convergence and better performance which can also work on strong Non-IID cases.
Submitted 12 April, 2022; v1 submitted 16 February, 2022;
originally announced February 2022.
-
Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection
Authors:
Fatih Cagatay Akyon,
Sinan Onur Altinuc,
Alptekin Temizel
Abstract:
Detection of small objects and objects far away in the scene is a major challenge in surveillance applications. Such objects are represented by a small number of pixels in the image and lack sufficient detail, making them difficult to detect using conventional detectors. In this work, an open-source framework called Slicing Aided Hyper Inference (SAHI) is proposed that provides a generic slicing aided inference and fine-tuning pipeline for small object detection. The proposed technique is generic in the sense that it can be applied on top of any available object detector without any fine-tuning. Experimental evaluations, using object detection baselines on the Visdrone and xView aerial object detection datasets, show that the proposed inference method can increase object detection AP by 6.8%, 5.1% and 5.3% for FCOS, VFNet and TOOD detectors, respectively. Moreover, the detection accuracy can be further increased with a slicing aided fine-tuning, resulting in a cumulative increase of 12.7%, 13.4% and 14.5% AP in the same order. The proposed technique has been integrated with Detectron2, MMDetection and YOLOv5 models and it is publicly available at https://github.com/obss/sahi.git.
Submitted 24 October, 2022; v1 submitted 14 February, 2022;
originally announced February 2022.
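The slicing idea in the abstract above can be illustrated with a short sketch: tile the image into overlapping windows, run any detector on each window, and shift the resulting boxes back into full-image coordinates. The Python below is only an illustration under stated assumptions. The function names, the pixel-based `overlap` parameter, and the box format `(x0, y0, x1, y1)` are hypothetical and are not the SAHI library's API; merging duplicate detections across slices is omitted.

```python
def slice_boxes(image_w, image_h, slice_w, slice_h, overlap=0):
    """Return (x0, y0, x1, y1) windows that cover the image with overlap in pixels."""
    step_x = max(1, slice_w - overlap)
    step_y = max(1, slice_h - overlap)
    boxes = []
    y = 0
    while True:
        # Clamp the last row/column of windows to the image border.
        y1 = min(y + slice_h, image_h)
        y0 = max(0, y1 - slice_h)
        x = 0
        while True:
            x1 = min(x + slice_w, image_w)
            x0 = max(0, x1 - slice_w)
            boxes.append((x0, y0, x1, y1))
            if x1 >= image_w:
                break
            x += step_x
        if y1 >= image_h:
            break
        y += step_y
    return boxes


def to_full_image(box, slice_origin):
    """Shift a detection box from slice coordinates to full-image coordinates."""
    ox, oy = slice_origin
    x0, y0, x1, y1 = box
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)
```

Each window is processed at the detector's native input size, so a small object occupies a larger share of the input than it would in the downscaled full image.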
-
Class Distance Weighted Cross-Entropy Loss for Ulcerative Colitis Severity Estimation
Authors:
Gorkem Polat,
Ilkay Ergenc,
Haluk Tarik Kani,
Yesim Ozen Alahdab,
Ozlen Atug,
Alptekin Temizel
Abstract:
In scoring systems used to measure the endoscopic activity of ulcerative colitis, such as Mayo endoscopic score or Ulcerative Colitis Endoscopic Index of Severity, levels increase with severity of the disease activity. Such relative ranking among the scores makes it an ordinal regression problem. On the other hand, most studies use categorical cross-entropy loss function to train deep learning models, which is not optimal for the ordinal regression problem. In this study, we propose a novel loss function, class distance weighted cross-entropy (CDW-CE), that respects the order of the classes and takes the distance of the classes into account in the calculation of the cost. Experimental evaluations show that models trained with CDW-CE outperform the models trained with conventional categorical cross-entropy and other commonly used loss functions which are designed for the ordinal regression problems. In addition, the class activation maps of models trained with CDW-CE loss are more class-discriminative and they are found to be more reasonable by the domain experts.
Submitted 12 June, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.
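The abstract above does not state the loss formula, so the following Python sketch is one plausible reading, stated as an assumption: each wrong class contributes a penalty on the probability assigned to it, weighted by its distance from the true class raised to a power `alpha`. The function name, the `alpha` and `eps` parameters, and the exact form are illustrative and should be checked against the paper before use.

```python
import math


def cdw_ce(probs, true_class, alpha=2.0, eps=1e-12):
    """Distance-weighted penalty on probability mass placed on wrong classes.

    Assumed form: -sum_i log(1 - p_i) * |i - true_class| ** alpha
    """
    loss = 0.0
    for i, p in enumerate(probs):
        weight = abs(i - true_class) ** alpha  # zero for the true class itself
        # eps guards against log(0) when a class receives probability 1.
        loss -= math.log(max(1.0 - p, eps)) * weight
    return loss
```

Under this form, putting probability 0.3 on a class three steps from the truth costs nine times as much as putting it on the adjacent class when `alpha` is 2, which is the ordinal behaviour that plain categorical cross-entropy lacks.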
-
Automated question generation and question answering from Turkish texts
Authors:
Fatih Cagatay Akyon,
Devrim Cavusoglu,
Cemil Cengiz,
Sinan Onur Altinuc,
Alptekin Temizel
Abstract:
While exam-style questions are a fundamental educational tool serving a variety of purposes, manual construction of questions is a complex process that requires training, experience and resources. Automatic question generation (QG) techniques can be utilized to satisfy the need for a continuous supply of new questions by streamlining their generation. However, compared to automatic question answering (QA), QG is a more challenging task. In this work, we fine-tune a multilingual T5 (mT5) transformer in a multi-task setting for QA, QG and answer extraction tasks using Turkish QA datasets. To the best of our knowledge, this is the first academic work that performs automated text-to-text question generation from Turkish texts. Experimental evaluations show that the proposed multi-task setting achieves state-of-the-art Turkish question answering and question generation performance on TQuADv1, TQuADv2 datasets and XQuAD Turkish split. The source code and the pre-trained models are available at https://github.com/obss/turkish-question-generation.
Submitted 6 April, 2022; v1 submitted 11 November, 2021;
originally announced November 2021.
-
Imperceptible Adversarial Examples by Spatial Chroma-Shift
Authors:
Ayberk Aydin,
Deniz Sen,
Berat Tuna Karli,
Oguz Hanoglu,
Alptekin Temizel
Abstract:
Deep Neural Networks have been shown to be vulnerable to various kinds of adversarial perturbations. In addition to widely studied additive noise based perturbations, adversarial examples can also be created by applying a per pixel spatial drift on input images. While spatial transformation based adversarial examples look more natural to human observers due to the absence of additive noise, they still possess visible distortions caused by spatial transformations. Since human vision is more sensitive to distortions in the luminance channel than to those in the chrominance channels, which is one of the main ideas behind the lossy visual multimedia compression standards, we propose a spatial transformation based perturbation method to create adversarial examples by only modifying the color components of an input image. While having competitive fooling rates on CIFAR-10 and NIPS2017 Adversarial Learning Challenge datasets, examples created with the proposed method have better scores with regard to various perceptual quality metrics. Human visual perception studies validate that the examples are more natural looking and often indistinguishable from their original counterparts.
Submitted 2 September, 2021; v1 submitted 5 August, 2021;
originally announced August 2021.
-
Generative Data Augmentation for Vehicle Detection in Aerial Images
Authors:
Hilmi Kumdakcı,
Cihan Öngün,
Alptekin Temizel
Abstract:
Scarcity of training data is one of the prominent problems for deep networks which require large amounts of data. Data augmentation is a widely used method to increase the number of training samples and their variations. In this paper, we focus on improving vehicle detection performance in aerial images and propose a generative augmentation method which does not need any supervision beyond the bounding box annotations of the vehicle objects in the training dataset. The proposed method increases the performance of vehicle detection by allowing detectors to be trained with a higher number of instances, especially when there is a limited number of training instances. The proposed method is generic in the sense that it can be integrated with different generators. The experiments show that the method increases the Average Precision by up to 25.2% and 25.7% when integrated with Pluralistic and DeepFill respectively.
Submitted 9 December, 2020;
originally announced December 2020.
-
Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy
Authors:
Sharib Ali,
Mariia Dmitrieva,
Noha Ghatwary,
Sophia Bano,
Gorkem Polat,
Alptekin Temizel,
Adrian Krenzer,
Amar Hekalo,
Yun Bo Guo,
Bogdan Matuszewski,
Mourad Gridach,
Irina Voiculescu,
Vishnusai Yoganand,
Arnav Chavan,
Aryan Raj,
Nhan T. Nguyen,
Dat Q. Tran,
Le Duy Huynh,
Nicolas Boutry,
Shahadate Rezvy,
Haijian Chen,
Yoon Ho Choi,
Anand Subramanian,
Velmurugan Balasubramanian,
Xiaohong W. Gao
, et al. (12 additional authors not shown)
Abstract:
The Endoscopy Computer Vision Challenge (EndoCV) is a crowd-sourcing initiative to address eminent problems in developing reliable computer aided detection and diagnosis endoscopy systems and suggest a pathway for clinical translation of technologies. Whilst endoscopy is a widely used diagnostic and treatment tool for hollow-organs, there are several core challenges often faced by endoscopists, mainly: 1) presence of multi-class artefacts that hinder their visual interpretation, and 2) difficulty in identifying subtle precancerous precursors and cancer abnormalities. Artefacts often affect the robustness of deep learning methods applied to the gastrointestinal tract organs as they can be confused with tissue of interest. EndoCV2020 challenges are designed to address research questions in these remits. In this paper, we present a summary of methods developed by the top 17 teams and provide an objective comparison of state-of-the-art methods and methods designed by the participants for two sub-challenges: i) artefact detection and segmentation (EAD2020), and ii) disease detection and segmentation (EDD2020). Multi-center, multi-organ, multi-class, and multi-modal clinical endoscopy datasets were compiled for both EAD2020 and EDD2020 sub-challenges. The out-of-sample generalization ability of detection algorithms was also evaluated. Whilst most teams focused on accuracy improvements, only a few methods hold credibility for clinical usability. The best performing teams provided solutions to tackle class imbalance, and variabilities in size, origin, modality and occurrences by exploring data augmentation, data fusion, and optimal class thresholding techniques.
Submitted 17 February, 2021; v1 submitted 12 October, 2020;
originally announced October 2020.
-
LPMNet: Latent Part Modification and Generation for 3D Point Clouds
Authors:
Cihan Öngün,
Alptekin Temizel
Abstract:
In this paper, we focus on latent modification and generation of 3D point cloud object models with respect to their semantic parts. Unlike existing methods, which use separate networks for part generation and assembly, we propose a single end-to-end Autoencoder model that can handle generation and modification of both semantic parts and global shapes. The proposed method supports part exchange between 3D point cloud models and composition by different parts to form new models by directly editing latent representations. This holistic approach does not need part-based training to learn part representations and does not introduce any extra loss besides the standard reconstruction loss. The experiments demonstrate the robustness of the proposed method with different object categories and varying number of points. The method can generate new models by integration of generative models such as GANs and VAEs and can work with unannotated point clouds by integration of a segmentation module.
Submitted 25 February, 2021; v1 submitted 8 August, 2020;
originally announced August 2020.
-
Optimization of XNOR Convolution for Binary Convolutional Neural Networks on GPU
Authors:
Mete Can Kaya,
Alperen İnci,
Alptekin Temizel
Abstract:
Binary convolutional networks have lower computational load and lower memory footprint compared to their full-precision counterparts. So, they are a feasible alternative for the deployment of computer vision applications on limited capacity embedded devices. Once trained on less resource-constrained computational environments, they can be deployed for real-time inference on such devices. In this study, we propose an implementation of binary convolutional network inference on GPU by focusing on optimization of XNOR convolution. Experimental results show that using GPU can provide a speed-up of up to $42.61\times$ with a kernel size of $3\times3$. The implementation is publicly available at https://github.com/metcan/Binary-Convolutional-Neural-Network-Inference-on-GPU
Submitted 28 July, 2020;
originally announced July 2020.
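The arithmetic trick behind XNOR convolution can be shown in a few lines: when weights and activations are restricted to +1 and -1 and packed as bits, a dot product reduces to an XNOR followed by a bit count. The sketch below is written in plain Python for readability and is not the authors' GPU implementation; the function names and the bit encoding (+1 as bit 1, -1 as bit 0) are assumptions for illustration.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int: +1 -> bit 1, -1 -> bit 0."""
    word = 0
    for k, v in enumerate(values):
        if v == 1:
            word |= 1 << k
    return word


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors given in packed form."""
    mask = (1 << n) - 1
    # XNOR marks the positions where both vectors hold the same sign.
    agree = ~(a_bits ^ b_bits) & mask
    matches = bin(agree).count("1")  # population count
    # Each match contributes +1 and each mismatch -1.
    return 2 * matches - n
```

For example, `xnor_dot(pack_bits([1, -1, 1, 1]), pack_bits([1, 1, -1, 1]), 4)` gives the same value as summing the elementwise products of the two vectors.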
-
Performance Analysis of Noise Subspace-based Narrowband Direction-of-Arrival (DOA) Estimation Algorithms on CPU and GPU
Authors:
Hamza Eray,
Alptekin Temizel
Abstract:
High-performance computing of array signal processing problems is a critical task as real-time system performance is required for many applications. Noise subspace-based Direction-of-Arrival (DOA) estimation algorithms are popular in the literature since they provide higher angular resolution and higher robustness. In this study, we investigate various optimization strategies for high-performance DOA estimation on GPU and comparatively analyze alternative implementations (MATLAB, C/C++ and CUDA). Experiments show that up to 3.1x speedup can be achieved on GPU compared to the baseline multi-threaded CPU implementation. The source code is publicly available at the following link: https://github.com/erayhamza/NssDOACuda
Submitted 28 July, 2020;
originally announced July 2020.
-
Bit-level Parallelization of 3DES Encryption on GPU
Authors:
Kaan Furkan Altınok,
Afşin Peker,
Alptekin Temizel
Abstract:
Triple DES (3DES) is a standard fundamental encryption algorithm, used in several electronic payment applications and web browsers. In this paper, we propose a parallel implementation of 3DES on GPU. Since 3DES encrypts data with 64-bit blocks, our approach considers each 64-bit block a kernel block and assigns a separate thread to process each bit. The algorithm's permutation operations, XOR operations, and S-box operations are done in parallel within these kernel blocks. The implementation benefits from the use of constant and shared memory types to optimize memory access. The results show an average 10.70x speed-up against the baseline multi-threaded CPU implementation. The implementation is publicly available at https://github.com/kaanfurkan35/3DES_GPU
Submitted 21 July, 2020;
originally announced July 2020.
-
Accelerating Translational Image Registration for HDR Images on GPU
Authors:
Kadir Cenk Alpay,
Kadir Berkay Aydemir,
Alptekin Temizel
Abstract:
High Dynamic Range (HDR) images are generated using multiple exposures of a scene. When a hand-held camera is used to capture a static scene, these images need to be aligned by globally shifting each image in both dimensions. For a fast and robust alignment, the shift amount is commonly calculated using Median Threshold Bitmaps (MTB) and creating an image pyramid. In this study, we optimize these computations using a parallel processing approach utilizing GPU. Experimental evaluation shows that the proposed implementation achieves a speed-up of up to 6.24 times over the baseline multi-threaded CPU implementation on the alignment of one image pair. The source code is available at https://github.com/kadircenk/WardMTBCuda
Submitted 13 July, 2020;
originally announced July 2020.
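To make the alignment step in the abstract above concrete, here is a small single-level sketch of Median Threshold Bitmap matching in plain Python. It thresholds each image at its own median, which makes the bitmap largely insensitive to exposure, then brute-forces the translation that minimizes the fraction of differing bits. The image pyramid mentioned in the abstract and any GPU parallelism are deliberately omitted, and the function names and the `max_shift` parameter are hypothetical.

```python
from statistics import median


def mtb(image):
    """Median threshold bitmap: 1 where a pixel is brighter than the image median."""
    m = median(v for row in image for v in row)
    return [[1 if v > m else 0 for v in row] for row in image]


def best_shift(ref, img, max_shift=2):
    """Return (dx, dy) such that img[y + dy][x + dx] best matches ref[y][x]."""
    a, b = mtb(ref), mtb(img)
    h, w = len(a), len(a[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            diff = 0
            count = 0
            for y in range(h):
                for x in range(w):
                    sy, sx = y + dy, x + dx
                    if 0 <= sy < h and 0 <= sx < w:
                        diff += a[y][x] ^ b[sy][sx]  # XOR counts differing bits
                        count += 1
            err = diff / count  # normalize by the size of the overlap
            if best is None or err < best[0]:
                best = (err, dx, dy)
    return best[1], best[2]
```

A pyramid version would run the same search at the coarsest level first and refine the estimate at each finer level, which keeps the search window small even for large shifts.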
-
Speaker and Posture Classification using Instantaneous Intraspeech Breathing Features
Authors:
Atıl İlerialkan,
Alptekin Temizel,
Hüseyin Hacıhabiboğlu
Abstract:
Acoustic features extracted from speech are widely used in problems such as biometric speaker identification and first-person activity detection. However, the use of speech for such purposes raises privacy issues as the content is accessible to the processing party. In this work, we propose a method for speaker and posture classification using intraspeech breathing sounds. Instantaneous magnitude features are extracted using the Hilbert-Huang transform (HHT) and fed into a CNN-GRU network for classification of recordings from the open intraspeech breathing sound dataset, BreathBase, that we collected for this study. Using intraspeech breathing sounds, 87% speaker classification, and 98% posture classification accuracy were obtained.
Submitted 25 May, 2020;
originally announced May 2020.
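As a rough illustration of what an "instantaneous magnitude" feature is, the sketch below computes the envelope of a signal from its analytic signal using a plain DFT. This is an assumption-laden simplification, not the authors' pipeline: a full Hilbert-Huang transform first decomposes the signal into intrinsic mode functions by empirical mode decomposition, and that step is omitted here. The function names are hypothetical, and the O(n²) DFT is only suitable for short examples.

```python
import cmath
import math


def analytic_signal(x):
    """Return the analytic signal of a real sequence x via a direct DFT."""
    n = len(x)
    # Forward DFT (O(n^2), fine for a short illustrative signal).
    spectrum = [
        sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]
    for f in range(n):
        if f == 0 or (n % 2 == 0 and f == n // 2):
            continue  # keep the DC and Nyquist bins unchanged
        if f < (n + 1) // 2:
            spectrum[f] *= 2  # double the positive frequencies
        else:
            spectrum[f] = 0  # discard the negative frequencies
    # Inverse DFT back to the time domain.
    return [
        sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n
        for k in range(n)
    ]


def instantaneous_magnitude(x):
    """Envelope of x: the modulus of its analytic signal at each sample."""
    return [abs(z) for z in analytic_signal(x)]
```

For a pure cosine with a whole number of cycles in the window, the envelope is constant and equal to the cosine's amplitude, which makes a convenient sanity check.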
-
Attack Type Agnostic Perceptual Enhancement of Adversarial Images
Authors:
Bilgin Aksoy,
Alptekin Temizel
Abstract:
Adversarial images are samples that are intentionally modified to deceive machine learning systems. They are widely used in applications such as CAPTCHAs to help distinguish legitimate human users from bots. However, the noise introduced during the adversarial image generation process degrades the perceptual quality and introduces artificial colours, also making it difficult for humans to classify images and recognise objects. In this letter, we propose a method to enhance the perceptual quality of these adversarial images. The proposed method is attack type agnostic and could be used in association with the existing attacks in the literature. Our experiments show that the generated adversarial images have lower Euclidean distance values while maintaining the same adversarial attack performance. Distances are reduced by 5.88% to 41.27% with an average reduction of 22% over the different attack and network types.
Submitted 10 May, 2019; v1 submitted 7 March, 2019;
originally announced March 2019.
-
Paired 3D Model Generation with Conditional Generative Adversarial Networks
Authors:
Cihan Öngün,
Alptekin Temizel
Abstract:
Generative Adversarial Networks (GANs) are shown to be successful at generating new and realistic samples including 3D object models. Conditional GAN, a variant of GANs, allows generating samples in given conditions. However, objects generated for each condition are different and it does not allow generation of the same object in different conditions. In this paper, we first adapt conditional GAN, which is originally designed for 2D image generation, to the problem of generating 3D models in different rotations. We then propose a new approach to guide the network to generate the same 3D sample in different and controllable rotation angles (sample pairs). Unlike previous studies, the proposed method does not require modification of the standard conditional GAN architecture and it can be integrated into the training step of any conditional GAN. Experimental results and visual comparison of 3D models show that the proposed method is successful at generating model pairs in different conditions.
Submitted 15 March, 2019; v1 submitted 9 August, 2018;
originally announced August 2018.
-
Multi-modal Egocentric Activity Recognition using Audio-Visual Features
Authors:
Mehmet Ali Arabacı,
Fatih Özkan,
Elif Surer,
Peter Jančovič,
Alptekin Temizel
Abstract:
Egocentric activity recognition in first-person videos is of increasing importance, with a variety of applications such as lifelogging, summarization, assisted living and activity tracking. Existing methods for this task are based on interpretation of various sensor information using pre-determined weights for each feature. In this work, we propose a new framework for the egocentric activity recognition problem based on combining audio-visual features with multi-kernel learning (MKL) and multi-kernel boosting (MKBoost). For that purpose, grid optical-flow, virtual-inertia, log-covariance and cuboid features are first extracted from the video. The audio signal is characterized using a "supervector", obtained based on Gaussian mixture modelling of frame-level features, followed by a maximum a-posteriori adaptation. Then, the extracted multi-modal features are adaptively fused by MKL classifiers in which both the feature and kernel selection/weighting and recognition tasks are performed together. The proposed framework was evaluated on a number of egocentric datasets. The results showed that using multi-modal features with MKL outperforms the existing methods.
Submitted 30 April, 2020; v1 submitted 2 July, 2018;
originally announced July 2018.
-
The Effects of JPEG and JPEG2000 Compression on Attacks using Adversarial Examples
Authors:
Ayse Elvan Aydemir,
Alptekin Temizel,
Tugba Taskaya Temizel
Abstract:
Adversarial examples are known to have a negative effect on the performance of classifiers which have otherwise good performance on undisturbed images. These examples are generated by adding non-random noise to the testing samples in order to make the classifier misclassify the given data. Adversarial attacks use these intentionally generated examples and they pose a security risk to machine learning based systems. To be immune to such attacks, it is desirable to have a pre-processing mechanism which removes these effects causing misclassification while keeping the content of the image. JPEG and JPEG2000 are well-known image compression techniques which suppress the high-frequency content taking the human visual system into account. JPEG has also been shown to be an effective method for reducing adversarial noise. In this paper, we propose applying JPEG2000 compression as an alternative and systematically compare the classification performance of adversarial images compressed using JPEG and JPEG2000 at different target PSNR values and maximum compression levels. Our experiments show that JPEG2000 is more effective in reducing adversarial noise as it allows higher compression rates with less distortion and it does not introduce blocking artifacts.
Submitted 31 March, 2018; v1 submitted 28 March, 2018;
originally announced March 2018.
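The abstract above compares compressed images at target PSNR values. For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition, PSNR = 10 log10(peak² / MSE). The function name `psnr`, the flat-sequence input, and the default 8-bit peak of 255 are assumptions for illustration; the paper's own evaluation code is not shown here.

```python
import math


def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(distorted):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher PSNR means the compressed image is closer to the original, so comparing two codecs at the same target PSNR holds the distortion level roughly fixed while their effect on the adversarial noise is measured.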
-
Boosted Multiple Kernel Learning for First-Person Activity Recognition
Authors:
Fatih Ozkan,
Mehmet Ali Arabaci,
Elif Surer,
Alptekin Temizel
Abstract:
Activity recognition from first-person (ego-centric) videos has recently gained attention due to the increasing ubiquity of wearable cameras. There has been a surge of efforts adapting existing feature descriptors and designing new descriptors for first-person videos. An effective activity recognition system requires selection and use of complementary features and appropriate kernels for each feature. In this study, we propose a data-driven framework for first-person activity recognition which effectively selects and combines features and their respective kernels during training. Our experimental results show that the use of Multiple Kernel Learning (MKL) and Boosted MKL in the first-person activity recognition problem yields improved results in comparison to the state-of-the-art. In addition, these techniques enable the expansion of the framework with new features in an efficient and convenient way.
Submitted 5 June, 2017; v1 submitted 22 February, 2017;
originally announced February 2017.