
E-TSL: A Continuous Educational Turkish Sign Language Dataset with Baseline Methods

Authors: Şükrü Öztürk, Hacer Yalim Keles

Abstract: This study introduces the continuous Educational Turkish Sign Language (E-TSL) dataset, collected from online Turkish language lessons for 5th, 6th, and 8th grades. The dataset comprises 1,410 videos totaling nearly 24 hours and includes performances from 11 signers. Turkish, an agglutinative language, poses unique challenges for sign language translation, particularly with a vocabulary where 64% are singleton words and 85% are rare words, appearing less than five times. We developed two baseline models to address these challenges: the Pose to Text Transformer (P2T-T) and the Graph Neural Network based Transformer (GNN-T) models. The GNN-T model achieved 19.13% BLEU-1 score and 3.28% BLEU-4 score, presenting a significant challenge compared to existing benchmarks. The P2T-T model, while demonstrating slightly lower performance in BLEU scores, achieved a higher ROUGE-L score of 22.09%. Additionally, we benchmarked our model using the well-known PHOENIX-Weather 2014T dataset to validate our approach.

Submitted 5 May, 2024; originally announced May 2024.

Comments: 7 pages, 3 figures, 4 tables, submitted to IEEE conference

arXiv:2405.02977 [pdf, other]

SkelCap: Automated Generation of Descriptive Text from Skeleton Keypoint Sequences

Authors: Ali Emre Keskin, Hacer Yalim Keles

Abstract: Numerous sign language datasets exist, yet they typically cover only a limited selection of the thousands of signs used globally. Moreover, creating diverse sign language datasets is an expensive and challenging task due to the costs associated with gathering a varied group of signers. Motivated by these challenges, we aimed to develop a solution that addresses these limitations. In this context, we focused on textually describing body movements from skeleton keypoint sequences, leading to the creation of a new dataset. We structured this dataset around AUTSL, a comprehensive isolated Turkish sign language dataset. We also developed a baseline model, SkelCap, which can generate textual descriptions of body movements. This model processes the skeleton keypoints data as a vector, applies a fully connected layer for embedding, and utilizes a transformer neural network for sequence-to-sequence modeling. We conducted extensive evaluations of our model, including signer-agnostic and sign-agnostic assessments. The model achieved promising results, with a ROUGE-L score of 0.98 and a BLEU-4 score of 0.94 in the signer-agnostic evaluation. The dataset we have prepared, namely the AUTSL-SkelCap, will be made publicly available soon.

Submitted 5 May, 2024; originally announced May 2024.

Comments: 8 pages, 5 figures, 7 tables, submitted to IEEE conference

arXiv:2404.16814 [pdf, other]

Meta-Transfer Derm-Diagnosis: Exploring Few-Shot Learning and Transfer Learning for Skin Disease Classification in Long-Tail Distribution

Authors: Zeynep Özdemir, Hacer Yalim Keles, Ömer Özgür Tanrıöver

Abstract: Addressing the challenges of rare diseases is difficult, especially with the limited number of reference images and a small patient population. This is more evident in rare skin diseases, where we encounter long-tailed data distributions that make it difficult to develop unbiased and broadly effective models. The diverse ways in which image datasets are gathered and their distinct purposes also add to these challenges. Our study conducts a detailed examination of the benefits and drawbacks of episodic and conventional training methodologies, adopting a few-shot learning approach alongside transfer learning. We evaluated our models using the ISIC2018, Derm7pt, and SD-198 datasets. With minimal labeled examples, our models showed substantial information gains and better performance compared to previously trained models. Our research emphasizes the improved ability to represent features in DenseNet121 and MobileNetV2 models, achieved by using pre-trained models on ImageNet to increase similarities within classes. Moreover, our experiments, ranging from 2-way to 5-way classifications with up to 10 examples, showed a growing success rate for traditional transfer learning methods as the number of examples increased. The addition of data augmentation techniques significantly improved our transfer learning based model performance, leading to higher performances than existing methods, especially in the SD-198 and ISIC2018 datasets. All source code related to this work will be made publicly available soon at the provided URL.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 17 pages, 5 figures, 6 tables, submitted to IEEE Journal of Biomedical and Health Informatics

arXiv:2403.05181 [pdf, other]

Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples

Authors: Eda Yilmaz, Hacer Yalim Keles

Abstract: Knowledge Distillation (KD) facilitates the transfer of discriminative capabilities from an advanced teacher model to a simpler student model, ensuring performance enhancement without compromising accuracy. It is also exploited for model stealing attacks, where adversaries use KD to mimic the functionality of a teacher model. Recent developments in this domain have been influenced by the Stingy Teacher model, which provided empirical analysis showing that sparse outputs can significantly degrade the performance of student models. Addressing the risk of intellectual property leakage, our work introduces an approach to train a teacher model that inherently protects its logits, influenced by the Nasty Teacher concept. Differing from existing methods, we incorporate sparse outputs of adversarial examples with standard training data to strengthen the teacher's defense against student distillation. Our approach carefully reduces the relative entropy between the original and adversarially perturbed outputs, allowing the model to produce adversarial logits with minimal impact on overall performance. The source codes will be made publicly available soon.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 12 pages, 3 figures, 6 tables

arXiv:2312.08194 [pdf, other]

SVInvNet: A Densely Connected Encoder-Decoder Architecture for Seismic Velocity Inversion

Authors: Mojtaba Najafi Khatounabad, Hacer Yalim Keles, Selma Kadioglu

Abstract: This study presents a deep learning-based approach to the seismic velocity inversion problem, focusing on both noisy and noiseless training datasets of varying sizes. Our Seismic Velocity Inversion Network (SVInvNet) introduces a novel architecture that contains a multi-connection encoder-decoder structure enhanced with dense blocks. This design is specifically tuned to effectively process complex information, crucial for addressing the challenges of non-linear seismic velocity inversion. For training and testing, we created diverse seismic velocity models, including multi-layered, faulty, and salt dome categories. We also investigated how different kinds of ambient noise, both coherent and stochastic, and the size of the training dataset affect learning outcomes. SVInvNet is trained on datasets ranging from 750 to 6,000 samples and is tested using a large benchmark dataset of 12,000 samples. Despite its fewer parameters compared to the baseline, SVInvNet achieves superior performance with this dataset. The outcomes of the SVInvNet are additionally compared to those of the Full Waveform Inversion (FWI) method. The comparative analysis clearly reveals the effectiveness of the proposed model.

Submitted 13 December, 2023; originally announced December 2023.

Comments: 14 pages, 11 figures, submitted to IEEE Transactions on Geoscience and Remote Sensing

arXiv:2306.09391 [pdf, other]

Multi-omics Prediction from High-content Cellular Imaging with Deep Learning

Authors: Rahil Mehrizi, Arash Mehrjou, Maryana Alegro, Yi Zhao, Benedetta Carbone, Carl Fishwick, Johanna Vappiani, Jing Bi, Siobhan Sanford, Hakan Keles, Marcus Bantscheff, Cuong Nguyen, Patrick Schwab

Abstract: High-content cellular imaging, transcriptomics, and proteomics data provide rich and complementary views on the molecular layers of biology that influence cellular states and function. However, the biological determinants through which changes in multi-omics measurements influence cellular morphology have not yet been systematically explored, and the degree to which cell imaging could potentially enable the prediction of multi-omics directly from cell imaging data is therefore currently unclear. Here, we address the question of whether it is possible to predict bulk multi-omics measurements directly from cell images using Image2Omics - a deep learning approach that predicts multi-omics in a cell population directly from high-content images of cells stained with multiplexed fluorescent dyes. We perform an experimental evaluation in gene-edited macrophages derived from human induced pluripotent stem cells (hiPSC) under multiple stimulation conditions and demonstrate that Image2Omics achieves significantly better performance in predicting transcriptomics and proteomics measurements directly from cell images than predictions based on the mean observed training set abundance. We observed significant predictability of abundances for 4927 (18.72%; 95% CI: 6.52%, 35.52%) and 3521 (13.38%; 95% CI: 4.10%, 32.21%) transcripts out of 26137 in M1 and M2-stimulated macrophages respectively and for 422 (8.46%; 95% CI: 0.58%, 25.83%) and 697 (13.98%; 95% CI: 2.41%, 32.83%) proteins out of 4986 in M1 and M2-stimulated macrophages respectively. Our results show that some transcript and protein abundances are predictable from cell imaging and that cell imaging may potentially, in some settings and depending on the mechanisms of interest and desired performance threshold, even be a scalable and resource-efficient substitute for multi-omics measurements.

Submitted 21 May, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2110.12396 [pdf, other]

doi 10.1109/ACCESS.2022.3151362

Using Motion History Images with 3D Convolutional Networks in Isolated Sign Language Recognition

Authors: Ozge Mercanoglu Sincan, Hacer Yalim Keles

Abstract: Sign language recognition using computational models is a challenging problem that requires simultaneous spatio-temporal modeling of multiple sources, i.e. faces, hands, body, etc. In this paper, we propose an isolated sign language recognition model based on a model trained using Motion History Images (MHI) that are generated from RGB video frames. RGB-MHI images effectively represent a spatio-temporal summary of each sign video in a single RGB image. We propose two different approaches using this RGB-MHI model. In the first approach, we use the RGB-MHI model as a motion-based spatial attention module integrated into a 3D-CNN architecture. In the second approach, we use RGB-MHI model features directly with the features of a 3D-CNN model using a late fusion technique. We perform extensive experiments on two recently released large-scale isolated sign language datasets, namely AUTSL and BosphorusSign22k. Our experiments show that our models, which use only RGB data, can compete with the state-of-the-art models in the literature that use multi-modal data.

Submitted 18 February, 2022; v1 submitted 24 October, 2021; originally announced October 2021.

arXiv:2109.01071 [pdf, other]

Towards disease-aware image editing of chest X-rays

Authors: Aakash Saboo, Sai Niranjan Ramachandran, Kai Dierkes, Hacer Yalim Keles

Abstract: Disease-aware image editing by means of generative adversarial networks (GANs) constitutes a promising avenue for advancing the use of AI in the healthcare sector. Here, we present a proof of concept of this idea. While GAN-based techniques have been successful in generating and manipulating natural images, their application to the medical domain is still in its infancy. Working with the CheXpert data set, we show that StyleGAN can be trained to generate realistic chest X-rays. Inspired by the Cyclic Reverse Generator (CRG) framework, we train an encoder that allows for faithfully inverting the generator on synthetic X-rays and provides organ-level reconstructions of real ones. Employing a guided manipulation of latent codes, we confer the medical condition of cardiomegaly (increased heart size) onto real X-rays from healthy patients. This work was presented in the Medical Imaging meets NeurIPS Workshop 2020, which was held as part of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020) in Vancouver, Canada.

Submitted 3 September, 2021; v1 submitted 2 September, 2021; originally announced September 2021.

arXiv:2105.05066 [pdf, other]

ChaLearn LAP Large Scale Signer Independent Isolated Sign Language Recognition Challenge: Design, Results and Future Research

Authors: Ozge Mercanoglu Sincan, Julio C. S. Jacques Junior, Sergio Escalera, Hacer Yalim Keles

Abstract: The performance of Sign Language Recognition (SLR) systems has improved considerably in recent years. However, several open challenges still need to be solved to allow SLR to be useful in practice. The research in the field is in its infancy with regard to the robustness of the models to a large diversity of signs and signers, and to fairness of the models to performers from different demographics. This work summarises the ChaLearn LAP Large Scale Signer Independent Isolated SLR Challenge, organised at CVPR 2021 with the goal of overcoming some of the aforementioned challenges. We analyse and discuss the challenge design, top winning solutions and suggestions for future research. The challenge attracted 132 participants in the RGB track and 59 in the RGB+Depth track, receiving more than 1.5K submissions in total. Participants were evaluated using a new large-scale multi-modal Turkish Sign Language (AUTSL) dataset, consisting of 226 sign labels and 36,302 isolated sign video samples performed by 43 different signers. Winning teams achieved more than 96% recognition rate, and their approaches benefited from pose/hand/face estimation, transfer learning, external data, fusion/ensemble of modalities and different strategies to model spatio-temporal information. However, methods still fail to distinguish among very similar signs, in particular those sharing similar hand trajectories.

Submitted 11 May, 2021; originally announced May 2021.

Comments: Preprint of the accepted paper at ChaLearn Looking at People Sign Language Recognition in the Wild Workshop at CVPR 2021

arXiv:2103.15463 [pdf, other]

A Hierarchical Approach to Remote Sensing Scene Classification

Authors: Ozlem Sen, Hacer Yalim Keles

Abstract: Remote sensing scene classification deals with the problem of classifying land use/cover of a region from images. To predict the development and socioeconomic structures of cities, the status of land use in regions is tracked by the national mapping agencies of countries. Many of these agencies use land-use types that are arranged in multiple levels. In this paper, we examined the efficiency of a hierarchically designed Convolutional Neural Network (CNN) based framework that is suitable for such arrangements. We used the NWPU-RESISC45 dataset for our experiments and arranged this data set in a two-level nested hierarchy. Each node in the designed hierarchy is trained using DenseNet-121 architectures. We provide detailed empirical analysis to compare the performances of this hierarchical scheme and its non-hierarchical counterpart, together with the individual model performances. We also evaluated the performance of the hierarchical structure statistically to validate the presented empirical results. The results of our experiments show that although individual classifiers for different sub-categories in the hierarchical scheme perform considerably well, the accumulation of the classification errors in the cascaded structure prevents its classification performance from exceeding that of the non-hierarchical deep model.

Submitted 24 January, 2022; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: This paper is the preprint of the accepted manuscript in PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science

arXiv:2101.07036 [pdf, other]

Iterative Facial Image Inpainting Based on an Encoder-Generator Architecture

Authors: Yahya Dogan, Hacer Yalim Keles

Abstract: Facial image inpainting is a challenging problem as it requires generating new pixels that include semantic information for masked key components in a face, e.g., eyes and nose. Recently, remarkable methods have been proposed in this field. Most of these approaches use encoder-decoder architectures and have different limitations such as allowing unique results for a given image and a particular mask. Alternatively, some optimization-based approaches generate promising results using different masks with generator networks. However, these approaches are computationally more expensive. In this paper, we propose an efficient solution to the facial image inpainting problem using the Cyclic Reverse Generator (CRG) architecture, which provides an encoder-generator model. We use the encoder to embed a given image to the generator space and incrementally inpaint the masked regions until a plausible image is generated; we trained a discriminator model to assess the quality of the generated images during the iterations and determine the convergence. After the generation process, for the post-processing, we utilize a Unet model that we trained specifically for this task to remedy the artifacts close to the mask boundaries. We empirically observed that only a few iterations are sufficient to generate realistic images with the proposed model. Since the models are not trained for particular mask types, our method allows applying sketch-based inpaintings, using a variety of mask types, and producing multiple and diverse results. We compared our method with the state-of-the-art models both quantitatively and qualitatively, and observed that our method can compete with the other models in all mask types; it is particularly better in images where larger masks are utilized. Our code, dataset and models are available at: https://github.com/yahyadogan72/iterative facial image inpainting.

Submitted 13 February, 2022; v1 submitted 18 January, 2021; originally announced January 2021.

Comments: This paper is the preprint of the accepted manuscript in Neural Computing and Applications Journal

arXiv:2008.00932 [pdf, other]

doi 10.1109/ACCESS.2020.3028072

AUTSL: A Large Scale Multi-modal Turkish Sign Language Dataset and Baseline Methods

Authors: Ozge Mercanoglu Sincan, Hacer Yalim Keles

Abstract: Sign language recognition is a challenging problem where signs are identified by simultaneous local and global articulations of multiple sources, i.e. hand shape and orientation, hand movements, body posture, and facial expressions. Solving this problem computationally for a large vocabulary of signs in real life settings is still a challenge, even with the state-of-the-art models. In this study, we present a new large-scale multi-modal Turkish Sign Language dataset (AUTSL) with a benchmark and provide baseline models for performance evaluations. Our dataset consists of 226 signs performed by 43 different signers and 38,336 isolated sign video samples in total. Samples contain a wide variety of backgrounds recorded in indoor and outdoor environments. Moreover, spatial positions and the postures of signers also vary in the recordings. Each sample is recorded with Microsoft Kinect v2 and contains RGB, depth, and skeleton modalities. We prepared benchmark training and test sets for user independent assessments of the models. We trained several deep learning based models and provide empirical evaluations using the benchmark; we used CNNs to extract features, unidirectional and bidirectional LSTM models to characterize temporal information. We also incorporated feature pooling modules and temporal attention to our models to improve the performances. We evaluated our baseline models on AUTSL and Montalbano datasets. Our models achieved competitive results with the state-of-the-art methods on Montalbano dataset, i.e. 96.11% accuracy. In AUTSL random train-test splits, our models performed up to 95.95% accuracy. In the proposed user-independent benchmark dataset our best baseline model achieved 62.02% accuracy. The gaps in the performances of the same baseline models show the challenges inherent in our benchmark dataset. AUTSL benchmark dataset is publicly available at https://cvml.ankara.edu.tr.

Submitted 19 October, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

Comments: Preprint of the accepted paper at IEEE Access Journal. The revised version contains empirical results with Montalbano dataset, in addition to AUTSL. The abstract is revised accordingly

Journal ref: IEEE Access (2020), vol. 8, pp. 181340-181355

arXiv:2006.11183 [pdf, other]

doi 10.1007/s11042-021-10593-w

Evaluation Of Hidden Markov Models Using Deep CNN Features In Isolated Sign Recognition

Authors: Anil Osman Tur, Hacer Yalim Keles

Abstract: Isolated sign recognition from video streams is a challenging problem due to the multi-modal nature of the signs, where both local and global hand features and face gestures need to be attended to simultaneously. This problem has recently been studied widely using deep Convolutional Neural Network (CNN) based features and Long Short-Term Memory (LSTM) based deep sequence models. However, the current literature lacks empirical analysis using Hidden Markov Models (HMMs) with deep features. In this study, we provide a framework that is composed of three modules to solve the isolated sign recognition problem using different sequence models. The dimensions of deep features are usually too large to work with HMM models. To solve this problem, we propose two alternative CNN based architectures as the second module in our framework, to reduce deep feature dimensions effectively. After extensive experiments, we show that using pretrained Resnet50 features and one of our CNN based dimension reduction models, HMMs can classify isolated signs with 90.15% accuracy in Montalbano dataset using RGB and Skeletal data. This performance is comparable with the current LSTM based models. HMMs have fewer parameters and can be trained and run on commodity computers fast, without requiring GPUs. Therefore, our analysis with deep features shows that HMMs could be utilized alongside deep sequence models in the challenging isolated sign recognition problem.

Submitted 10 May, 2021; v1 submitted 19 June, 2020; originally announced June 2020.

Comments: This paper is the preprint of the accepted manuscript at Multimedia Tools and Applications Journal. It contains 16 pages, 5 figures, 8 tables

arXiv:1907.01841 [pdf, other]

doi 10.1016/j.neucom.2020.03.071

Semi-supervised Image Attribute Editing using Generative Adversarial Networks

Authors: Yahya Dogan, Hacer Yalim Keles

Abstract: Image attribute editing is a challenging problem that has been recently studied by many researchers using generative networks. The challenge is in the manipulation of selected attributes of images while preserving the other details. The method to achieve this goal is to find an accurate latent vector representation of an image and a direction corresponding to the attribute. Almost all the works in the literature use labeled datasets in a supervised setting for this purpose. In this study, we introduce an architecture called Cyclic Reverse Generator (CRG), which allows learning the inverse function of the generator accurately via an encoder in an unsupervised setting by utilizing cyclic cost minimization. Attribute editing is then performed using the CRG models for finding desired attribute representations in the latent space. In this work, we use two arbitrary reference images, with and without desired attributes, to compute an attribute direction for editing. We show that the proposed approach performs better in terms of image reconstruction compared to the existing end-to-end generative models both quantitatively and qualitatively. We demonstrate state-of-the-art results on both real images and generated images in CelebA dataset.

Submitted 13 April, 2020; v1 submitted 3 July, 2019; originally announced July 2019.

Comments: This paper is the preprint of the accepted manuscript in Neurocomputing Journal. To visualize the Figures in the manuscript in high quality, please check the version at this URL: https://github.com/yahyadogan72/CRG

arXiv:1808.01477 [pdf, other]

doi 10.1007/s10044-019-00845-9

Learning Multi-scale Features for Foreground Segmentation

Authors: Long Ang Lim, Hacer Yalim Keles

Abstract: Foreground segmentation algorithms aim to segment moving objects from the background in a robust way under various challenging scenarios. Encoder-decoder type deep neural networks that are used in this domain have recently produced impressive segmentation results. In this work, we propose a novel robust encoder-decoder structure neural network that can be trained end-to-end using only a few training examples. The proposed method extends the Feature Pooling Module (FPM) of FgSegNet by introducing feature fusions inside this module, which is capable of extracting multi-scale features within images, resulting in a robust feature pooling against camera motion, which can alleviate the need for multi-scale inputs to the network. Our method outperforms all existing state-of-the-art methods in CDnet2014 dataset by an average overall F-Measure of 0.9847. We also evaluate the effectiveness of our method on SBI2015 and UCSD Background Subtraction datasets. The source code of the proposed method is made available at https://github.com/lim-anggun/FgSegNet_v2 .

Submitted 4 August, 2018; originally announced August 2018.

arXiv:1802.04524 [pdf]

On a generalized theorem of de Bruijn and Erdös in d-dimensional Fuzzy Linear Spaces

Authors: H. Keleş

Abstract: In this study we follow a new framework for the theory that offers us, other than traditional, a new angle to observe and investigate some relations between finite sets, F-lattice L and their elements. The theory is based on the Fuzzy Linear Spaces (FLS) S=(N,D). In this case, to operate on these spaces the necessary preliminaries, concepts and operations in lattices relative to FLS are introduced. Some definitions, such that k-fuzzy point, k-fuzzy line are given. Then we correspond these definitions to the definitions in usually linear spaces. We investigate some combinatorics properties of FLS. In some examples in the case where ILI=3*. We see some differences. In general, taking an ordered lattice Ln={0,a1,a2,...,an,1} we observe how some combinatorics formulas and properties are changed. In FLS the dimension concept is a set. We produce some general formulas by using some trivial examples. Furthermore, we generalize de Bruijn-Erdös Theorem in [2].

Submitted 13 February, 2018; originally announced February 2018.

Comments: 5 pages

Report number: JSAER2018

MSC Class: 05E18B35; 37F20; 11B30

Journal ref: Journal of Scientific and Engineering Research, 2018, 5(1):102-105

arXiv:1801.02225 [pdf, other]

doi 10.1016/j.patrec.2018.08.002

Foreground Segmentation Using a Triplet Convolutional Neural Network for Multiscale Feature Encoding

Authors: Long Ang Lim, Hacer Yalim Keles

Abstract: A common approach for moving objects segmentation in a scene is to perform a background subtraction. Several methods have been proposed in this domain. However, they lack the ability of handling various difficult scenarios such as illumination changes, background or camera motion, camouflage effect, shadow etc. To address these issues, we propose a robust and flexible encoder-decoder type neural network based approach. We adapt a pre-trained convolutional network, i.e. VGG-16 Net, under a triplet framework in the encoder part to embed an image in multiple scales into the feature space and use a transposed convolutional network in the decoder part to learn a mapping from feature space to image space. We train this network end-to-end by using only a few training samples. Our network takes an RGB image in three different scales and produces a foreground segmentation probability mask for the corresponding image. In order to evaluate our model, we entered the Change Detection 2014 Challenge (changedetection.net) and our method outperformed all the existing state-of-the-art methods by an average F-Measure of 0.9770. Our source code will be made publicly available at https://github.com/lim-anggun/FgSegNet.

Submitted 7 January, 2018; originally announced January 2018.

Comments: This paper is under consideration at Pattern Recognition Letters

Showing 1–17 of 17 results for author: Keles, H