EyeFormer: Predicting Personalized Scanpaths with Transformer-Guided Reinforcement Learning

Authors: Yue Jiang, Zixin Guo, Hamed Rezazadegan Tavakoli, Luis A. Leiva, Antti Oulasvirta

Abstract: From a visual perception perspective, modern graphical user interfaces (GUIs) comprise a complex graphics-rich two-dimensional visuospatial arrangement of text, images, and interactive objects such as buttons and menus. While existing models can accurately predict regions and objects that are likely to attract attention "on average", so far there is no scanpath model capable of predicting scanpaths for an individual. To close this gap, we introduce EyeFormer, which leverages a Transformer architecture as a policy network to guide a deep reinforcement learning algorithm that controls gaze locations. Our model has the unique capability of producing personalized predictions when given a few user scanpath samples. It can predict full scanpath information, including fixation positions and duration, across individuals and various stimulus types. Additionally, we demonstrate applications in GUI layout optimization driven by our model. Our software and models will be publicly available.

Submitted 20 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

arXiv:2402.05202 [pdf, other]

UEyes: An Eye-Tracking Dataset across User Interface Types

Authors: Yue Jiang, Luis A. Leiva, Paul R. B. Houssel, Hamed R. Tavakoli, Julia Kylmälä, Antti Oulasvirta

Abstract: Different types of user interfaces differ significantly in the number of elements and how they are displayed. To examine how such differences affect the way users look at UIs, we collected and analyzed a large eye-tracking-based dataset, UEyes (62 participants, 1,980 UI screenshots, near 20K eye movement sequences), covering four major UI types: webpage, desktop UI, mobile UI, and poster. Furthermore, we analyze and discuss the differences in important factors, such as color, location, and gaze direction across UI types, individual viewing strategies and potential future directions. This position paper is a derivative of our recent paper with a particular focus on the UEyes dataset.

Submitted 7 February, 2024; originally announced February 2024.

Comments: Accepted as a CHI2023 workshop paper

arXiv:2401.12729 [pdf, other]

Enhancing Object Detection Performance for Small Objects through Synthetic Data Generation and Proportional Class-Balancing Technique: A Comparative Study in Industrial Scenarios

Authors: Jibinraj Antony, Vinit Hegiste, Ali Nazeri, Hooman Tavakoli, Snehal Walunj, Christiane Plociennik, Martin Ruskowski

Abstract: Object Detection (OD) has proven to be a significant computer vision method in extracting localized class information and has multiple applications in the industry. Although many of the state-of-the-art (SOTA) OD models perform well on medium- and large-sized objects, they seem to underperform on small objects. In most of the industrial use cases, it is difficult to collect and annotate data for small objects, as it is time-consuming and prone to human errors. Additionally, those datasets are likely to be unbalanced and often result in an inefficient model convergence. To tackle this challenge, this study presents a novel approach that injects additional data points to improve the performance of the OD models. Using synthetic data generation, the difficulties in data collection and annotation for small object data points can be minimized and a dataset with a balanced distribution can be created. This paper discusses the effects of a simple proportional class-balancing technique to enable better anchor matching of the OD models. A comparison was carried out on the performances of the SOTA OD models YOLOv5, YOLOv7 and SSD for combinations of real and synthetic datasets within an industrial use case.

Submitted 29 January, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

Comments: Accepted and presented in conference ESAIM23 1st European Symposium on Artificial Intelligence in Manufacturing

arXiv:2401.10761 [pdf, other]

NN-VVC: Versatile Video Coding boosted by self-supervisedly learned image coding for machines

Authors: Jukka I. Ahonen, Nam Le, Honglei Zhang, Antti Hallapuro, Francesco Cricri, Hamed Rezazadegan Tavakoli, Miska M. Hannuksela, Esa Rahtu

Abstract: The recent progress in artificial intelligence has led to an ever-increasing usage of images and videos by machine analysis algorithms, mainly neural networks. Nonetheless, compression, storage and transmission of media have traditionally been designed considering human beings as the viewers of the content. Recent research on image and video coding for machine analysis has progressed mainly in two almost orthogonal directions. The first is represented by end-to-end (E2E) learned codecs which, while offering high performance on image coding, are not yet on par with state-of-the-art conventional video codecs and lack interoperability. The second direction considers using the Versatile Video Coding (VVC) standard or any other conventional video codec (CVC) together with pre- and post-processing operations targeting machine analysis. While the CVC-based methods benefit from interoperability and broad hardware and software support, the machine task performance is often lower than the desired level, particularly at low bitrates. This paper proposes a hybrid codec for machines called NN-VVC, which combines the advantages of an E2E-learned image codec and a CVC to achieve high performance in both image and video coding for machines. Our experiments show that the proposed system achieved up to -43.20% and -26.8% Bjøntegaard Delta rate reduction over VVC for image and video data, respectively, when evaluated on multiple different datasets and machine vision tasks. To the best of our knowledge, this is the first research paper showing a hybrid video codec that outperforms VVC on multiple datasets and multiple machine vision tasks.

Submitted 19 January, 2024; originally announced January 2024.

Comments: ISM 2023 Best paper award winner version

arXiv:2401.10732 [pdf, other]

doi 10.1109/ICIP46576.2022.9897916

Bridging the gap between image coding for machines and humans

Authors: Nam Le, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed Rezazadegan Tavakoli, Emre Aksu, Miska M. Hannuksela, Esa Rahtu

Abstract: Image coding for machines (ICM) aims at reducing the bitrate required to represent an image while minimizing the drop in machine vision analysis accuracy. In many use cases, such as surveillance, it is also important that the visual quality is not drastically deteriorated by the compression process. Recent works on using neural network (NN) based ICM codecs have shown significant coding gains against traditional methods; however, the decompressed images, especially at low bitrates, often contain checkerboard artifacts. We propose an effective decoder finetuning scheme based on adversarial training to significantly enhance the visual quality of ICM codecs, while preserving the machine analysis accuracy, without adding extra bitcost or parameters at the inference phase. The results show complete removal of the checkerboard artifacts at the negligible cost of -1.6% relative change in task performance score. In the cases where some amount of artifacts is tolerable, such as when machine consumption is the primary target, this technique can enhance both pixel-fidelity and feature-fidelity scores without losing task performance.

Submitted 19 January, 2024; originally announced January 2024.

Journal ref: IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 2022, pp. 3411-3415

arXiv:2210.04112 [pdf, other]

Leveraging progressive model and overfitting for efficient learned image compression

Authors: Honglei Zhang, Francesco Cricri, Hamed Rezazadegan Tavakoli, Emre Aksu, Miska M. Hannuksela

Abstract: Deep learning has been overwhelmingly dominant in the field of computer vision and image/video processing over the last decade. However, for image and video compression, it lags behind the traditional techniques based on discrete cosine transform (DCT) and linear filters. Built on top of an autoencoder architecture, learned image compression (LIC) systems have drawn enormous attention in recent years. Nevertheless, the proposed LIC systems are still inferior to the state-of-the-art traditional techniques, for example, the Versatile Video Coding (VVC/H.266) standard, due to either their compression performance or decoding complexity. Although claimed to outperform VVC/H.266 on a limited bit rate range, some proposed LIC systems take over 40 seconds to decode a 2K image on a GPU system. In this paper, we introduce a powerful and flexible LIC framework with a multi-scale progressive (MSP) probability model and a latent representation overfitting (LOF) technique. With different predefined profiles, the proposed framework can achieve various balance points between compression efficiency and computational complexity. Experiments show that the proposed framework achieves 2.5%, 1.0%, and 1.3% Bjontegaard delta bit rate (BD-rate) reduction over the VVC/H.266 standard on three benchmark datasets on a wide bit rate range. More importantly, the decoding complexity is reduced from O(n) to O(1) compared to many other LIC systems, resulting in over 20 times speedup when decoding 2K images.

Submitted 8 October, 2022; originally announced October 2022.

arXiv:2203.10794 [pdf, other]

Human-Centric Artificial Intelligence Architecture for Industry 5.0 Applications

Authors: Jože M. Rožanec, Inna Novalija, Patrik Zajec, Klemen Kenda, Hooman Tavakoli, Sungho Suh, Entso Veliou, Dimitrios Papamartzivanos, Thanassis Giannetsos, Sofia Anna Menesidou, Ruben Alonso, Nino Cauli, Antonello Meloni, Diego Reforgiato Recupero, Dimosthenis Kyriazis, Georgios Sofianidis, Spyros Theodoropoulos, Blaž Fortuna, Dunja Mladenić, John Soldatos

Abstract: Human-centricity is the core value behind the evolution of manufacturing towards Industry 5.0. Nevertheless, there is a lack of architecture that considers safety, trustworthiness, and human-centricity at its core. Therefore, we propose an architecture that integrates Artificial Intelligence (Active Learning, Forecasting, Explainable Artificial Intelligence), simulated reality, decision-making, and users' feedback, focusing on synergies between humans and machines. Furthermore, we align the proposed architecture with the Big Data Value Association Reference Architecture Model. Finally, we validate it on three use cases from real-world case studies.

Submitted 19 October, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

arXiv:2112.08767 [pdf, other]

Adaptation and Attention for Neural Video Coding

Authors: Nannan Zou, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed R. Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu

Abstract: Neural image coding now represents the state-of-the-art image compression approach. However, a lot of work is still to be done in the video domain. In this work, we propose an end-to-end learned video codec that introduces several architectural novelties as well as training novelties, revolving around the concepts of adaptation and attention. Our codec is organized as an intra-frame codec paired with an inter-frame codec. As one architectural novelty, we propose to train the inter-frame codec model to adapt the motion estimation process based on the resolution of the input video. A second architectural novelty is a new neural block that combines concepts from split-attention based neural networks and from DenseNets. Finally, we propose to overfit a set of decoder-side multiplicative parameters at inference time. Through ablation studies and comparisons to prior art, we show the benefits of our proposed techniques in terms of coding gains. We compare our codec to VVC/H.266 and RLVC, which represent the state-of-the-art traditional and end-to-end learned codecs, respectively, and to the top performing end-to-end learned approach in the 2021 CLIC competition, E2E_T_OL. Our codec clearly outperforms E2E_T_OL, and compares favorably to VVC and RLVC in some settings.

Submitted 16 December, 2021; originally announced December 2021.

arXiv:2110.08648 [pdf]

doi 10.1097/EDE.0000000000001489

Minding non-collapsibility of odds ratios when recalibrating risk prediction models

Authors: Mohsen Sadatsafavi, Hamid Tavakoli, Abdollah Safari

Abstract: In clinical prediction modeling, model updating refers to the practice of modifying a prediction model before it is used in a new setting. In the context of logistic regression for a binary outcome, one of the simplest updating methods is a fixed odds-ratio transformation of predicted risks to improve calibration-in-the-large. Previous authors have proposed equations for calculating this odds-ratio based on the discrepancy between the prevalence in the original and the new population, or between the average of predicted and observed risks. We show that this method fails to consider the non-collapsibility of odds-ratio. Consequently, it under-corrects predicted risks, especially when predicted risks are more dispersed (i.e., for models with good discrimination). We suggest an approximate equation for recovering the conditional odds-ratio from the mean and variance of predicted risks. Brief simulations and a case study show that this approach reduces under-correction, sometimes substantially. R code for implementation is provided.

Submitted 10 November, 2021; v1 submitted 16 October, 2021; originally announced October 2021.

Comments: 12 Pages, 1 Figure, 1567 words
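
Illustrative note: the fixed odds-ratio update described in the abstract above can be sketched in a few lines of Python. This is only a toy sketch under stated assumptions, not the authors' R code: the function names `apply_odds_ratio` and `marginal_odds_ratio`, the two predicted risks, and the target prevalence are all hypothetical. It computes the naive (marginal) odds ratio from the mean predicted risk and the target prevalence, applies it to each risk, and lets one check that the updated mean still falls short of the target when risks are dispersed.

```python
from statistics import mean


def apply_odds_ratio(risk, odds_ratio):
    """Shift a single predicted risk by a fixed odds ratio."""
    odds = risk / (1.0 - risk) * odds_ratio
    return odds / (1.0 + odds)


def marginal_odds_ratio(mean_predicted, target_prevalence):
    """Naive odds ratio from the mean predicted risk and the target prevalence."""
    target_odds = target_prevalence / (1.0 - target_prevalence)
    current_odds = mean_predicted / (1.0 - mean_predicted)
    return target_odds / current_odds


# Hypothetical, widely dispersed predicted risks (mean 0.5) and a made-up target.
risks = [0.1, 0.9]
target = 0.6

naive_or = marginal_odds_ratio(mean(risks), target)
updated = [apply_odds_ratio(r, naive_or) for r in risks]
updated_mean = mean(updated)

# The updated mean moves toward the target but does not reach it.
print(f"odds ratio={naive_or:.3f}, updated mean={updated_mean:.3f}, target={target}")
```

With less dispersed risks (for example 0.45 and 0.55) the gap between the updated mean and the target shrinks, which is consistent with the abstract's remark that under-correction is worse for models with good discrimination.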

arXiv:2108.10551 [pdf, ps, other]

Lossless Image Compression Using a Multi-Scale Progressive Statistical Model

Authors: Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Nannan Zou, Emre Aksu, Miska M. Hannuksela

Abstract: Lossless image compression is an important technique for image storage and transmission when information loss is not allowed. With the fast development of deep learning techniques, deep neural networks have been used in this field to achieve a higher compression rate. Methods based on pixel-wise autoregressive statistical models have shown good performance. However, their sequential processing prevents these methods from being used in practice. Recently, multi-scale autoregressive models have been proposed to address this limitation. Multi-scale approaches can use parallel computing systems efficiently and build practical systems. Nevertheless, these approaches sacrifice compression performance in exchange for speed. In this paper, we propose a multi-scale progressive statistical model that takes advantage of the pixel-wise approach and the multi-scale approach. We developed a flexible mechanism where the processing order of the pixels can be adjusted easily. Our proposed method outperforms the state-of-the-art lossless image compression methods on two large benchmark datasets by a significant margin without degrading the inference speed dramatically.

Submitted 24 August, 2021; originally announced August 2021.

Comments: Accepted ACCV 2020

arXiv:2108.09992 [pdf, other]

doi 10.1109/ICME51207.2021.9428224

Learned Image Coding for Machines: A Content-Adaptive Approach

Authors: Nam Le, Honglei Zhang, Francesco Cricri, Ramin Ghaznavi-Youvalari, Hamed Rezazadegan Tavakoli, Esa Rahtu

Abstract: Today, according to the Cisco Annual Internet Report (2018-2023), the fastest-growing category of Internet traffic is machine-to-machine communication. In particular, machine-to-machine communication of images and videos represents a new challenge and opens up new perspectives in the context of data compression. One possible solution approach consists of adapting current human-targeted image and video coding standards to the use case of machine consumption. Another approach consists of developing completely new compression paradigms and architectures for machine-to-machine communications. In this paper, we focus on image compression and present an inference-time content-adaptive finetuning scheme that optimizes the latent representation of an end-to-end learned image codec, aimed at improving the compression efficiency for machine-consumption. The conducted experiments show that our online finetuning brings an average bitrate saving (BD-rate) of -3.66% with respect to our pretrained image codec. In particular, at low bitrate points, our proposed method results in a significant bitrate saving of -9.85%. Overall, our pretrained-and-then-finetuned system achieves -30.54% BD-rate over the state-of-the-art image/video codec Versatile Video Coding (VVC).

Submitted 13 October, 2021; v1 submitted 23 August, 2021; originally announced August 2021.

Comments: Fig 4 correction

Journal ref: 2021 IEEE International Conference on Multimedia and Expo (ICME), 2021, pp. 1-6

arXiv:2106.06403 [pdf, other]

Small Object Detection for Near Real-Time Egocentric Perception in a Manual Assembly Scenario

Authors: Hooman Tavakoli, Snehal Walunj, Parsha Pahlevannejad, Christiane Plociennik, Martin Ruskowski

Abstract: Detecting small objects in video streams of head-worn augmented reality devices in near real-time is a huge challenge: training data is typically scarce, the input video stream can be of limited quality, and small objects are notoriously hard to detect. In industrial scenarios, however, it is often possible to leverage contextual knowledge for the detection of small objects. Furthermore, CAD data of objects are typically available and can be used to generate synthetic training data. We describe a near real-time small object detection pipeline for egocentric perception in a manual assembly scenario: We generate a training data set based on CAD data and realistic backgrounds in Unity. We then train a YOLOv4 model for a two-stage detection process: First, the context is recognized, then the small object of interest is detected. We evaluate our pipeline on the augmented reality device Microsoft Hololens 2.

Submitted 11 June, 2021; originally announced June 2021.

Comments: Accepted for presentation at EPIC@CVPR2021 workshop

arXiv:2103.11904 [pdf]

New Capacity Upper Bounds For Binary Deletion Channel

Authors: Hassan Tavakoli

Abstract: This paper considers a binary channel with deletions. We derive two closed-form upper bounds on the capacity of the binary deletion channel. The first upper bound is based on computing the capacity of an auxiliary channel, and we show how the capacity of the auxiliary channel is an upper bound on that of the binary deletion channel. Our main idea for the second bound is based on computing the mutual information between the sent bits and the received bits in the binary deletion channel. We approximate the exact mutual information and give a closed-form expression. All bounds utilize a first-order Markov process for the channel input. The second proposed upper bound improves the best upper bound [6,11] by up to 0.1.

Submitted 7 February, 2021; originally announced March 2021.

Comments: 6 pages

arXiv:2101.09176 [pdf, other]

doi 10.1145/3379503.3403557

Understanding Visual Saliency in Mobile User Interfaces

Authors: Luis A. Leiva, Yunfei Xue, Avya Bansal, Hamed R. Tavakoli, Tuğçe Köroğlu, Niraj R. Dayama, Antti Oulasvirta

Abstract: For graphical user interface (UI) design, it is important to understand what attracts visual attention. While previous work on saliency has focused on desktop and web-based UIs, mobile app UIs differ from these in several respects. We present findings from a controlled study with 30 participants and 193 mobile UIs. The results speak to a role of expectations in guiding where users look. Strong bias toward the top-left corner of the display, text, and images was evident, while bottom-up features such as color or size affected saliency less. Classic, parameter-free saliency models showed a weak fit with the data, and data-driven models improved significantly when trained specifically on this dataset (e.g., NSS rose from 0.66 to 0.84). We also release the first annotated dataset for investigating visual saliency in mobile UIs.

Submitted 22 January, 2021; originally announced January 2021.

Journal ref: Proceedings of the 22nd Intl. Conf. on Human-Computer Interaction with Mobile Devices and Services (MobileHCI), 2020
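
Illustrative note: the NSS figure quoted in the abstract above refers to Normalized Scanpath Saliency, which is commonly computed as the mean of a z-scored saliency map sampled at the fixated pixels. The following Python sketch shows that common definition only; it is not taken from the paper, and the function name `nss`, the use of the population standard deviation, and the 3x3 toy map are assumptions made for illustration.

```python
from statistics import mean, pstdev


def nss(saliency_map, fixations):
    """Mean z-scored saliency value over fixated (row, col) positions."""
    values = [v for row in saliency_map for v in row]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("saliency map is constant; NSS is undefined")
    return mean([(saliency_map[r][c] - mu) / sigma for r, c in fixations])


# Hypothetical 3x3 saliency map with a single salient center pixel.
toy_map = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]

score_hit = nss(toy_map, [(1, 1)])   # fixation lands on the salient pixel
score_miss = nss(toy_map, [(0, 0)])  # fixation lands on a non-salient pixel
print(f"hit={score_hit:.3f}, miss={score_miss:.3f}")
```

Under this definition a positive score means fixations fall on above-average saliency, and a score near zero means the map does no better than chance.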

arXiv:2008.13227 [pdf, other]

A Compact Deep Architecture for Real-time Saliency Prediction

Authors: Saman Zabihi, Hamed Rezazadegan Tavakoli, Ali Borji

Abstract: Saliency computation models aim to imitate the attention mechanism in the human visual system. The application of deep neural networks for saliency prediction has led to a drastic improvement over the last few years. However, deep models have a high number of parameters, which makes them less suitable for real-time applications. Here we propose a compact yet fast model for real-time saliency prediction. Our proposed model consists of a modified U-Net architecture, a novel fully-connected layer, and central difference convolutional layers. The modified U-Net architecture promotes compactness and efficiency. The novel fully-connected layer facilitates the implicit capturing of the location-dependent information. Using the central difference convolutional layers at different scales enables capturing more robust and biologically motivated features. We compare our model with state-of-the-art saliency models using traditional saliency scores as well as our newly devised scheme. Experimental results over four challenging saliency benchmark datasets demonstrate the effectiveness of our approach in striking a balance between accuracy and speed. Our model can be run in real-time, which makes it appealing for edge devices and video processing.

Submitted 30 August, 2020; originally announced August 2020.

arXiv:2008.01929 [pdf]

Role of coil-crucible geometry in Czochralski bismuth germanate (BGO) crystal growth process: a thermal stress analysis

Authors: Hossein Khodamoradi, Mohammad Hossein Tavakoli

Abstract: A steady-state 2D finite element numerical model was developed for the electromagnetic field, heat distribution, and thermal stress expansion in an oxide Czochralski crystal growth system. The extended model was employed to compare the impact of different geometries of induction coils and crucibles in the growth process. Analysis of the results emphasizes the potential of the modified geometries in alteration of the electromagnetic field and heat distribution. Consequently, the optimization of crystal/melt interface shape, thermal stress accumulation, and distribution in the growing crystal can be achieved. Finally, the proposed approaches for thermal stress calculations were compared. The outcomes showed a qualitative agreement between two methods of thermoelastic stress analysis in the calculation of stress distribution in BGO crystal.

Submitted 5 August, 2020; originally announced August 2020.

arXiv:2007.16054 [pdf, other]

Learning to Learn to Compress

Authors: Nannan Zou, Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Jani Lainema, Miska Hannuksela, Emre Aksu, Esa Rahtu

Abstract: In this paper we present an end-to-end meta-learned system for image compression. Traditional machine learning based approaches to image compression train one or more neural networks for generalization performance. However, at inference time, the encoder or the latent tensor output by the encoder can be optimized for each test image. This optimization can be regarded as a form of adaptation or benevolent overfitting to the input content. In order to reduce the gap between training and inference conditions, we propose a new training paradigm for learned image compression, which is based on meta-learning. In a first phase, the neural networks are trained normally. In a second phase, the Model-Agnostic Meta-learning approach is adapted to the specific case of image compression, where the inner loop performs latent tensor overfitting, and the outer loop updates both encoder and decoder neural networks based on the overfitting performance. Furthermore, after meta-learning, we propose to overfit and cluster the bias terms of the decoder on training image patches, so that at inference time the optimal content-specific bias terms can be selected at encoder-side. Finally, we propose a new probability model for lossless compression, which combines concepts from both multi-scale and super-resolution probability model approaches. We show the benefits of all our proposed ideas via carefully designed experiments.

Submitted 1 May, 2021; v1 submitted 31 July, 2020; originally announced July 2020.

arXiv:2005.07471 [pdf]

Optimization of allotropic and hardness of aluminum oxide coating by PEO

Authors: Babak Ghorbanian, Mohammad Tajally, Seyed Mohammad Mousavi Khoie, Hossien Tavakoli

Abstract: One of the most important methods of producing materials and oxide coatings is plasma electrolytic oxidation (PEO). Coatings made on aluminum by the PEO method have two allotropes, $α$-Al$_2$O$_3$ and $γ$-Al$_2$O$_3$. Coatings containing $α$-Al$_2$O$_3$ have greater hardness and wear resistance; therefore, the main goal of the present study is to optimize the aluminum oxide allotrope to increase the ratio of $α$-Al$_2$O$_3$ in coatings made by the PEO method. The results of the present study show that the optimum electrolyte compound for obtaining the highest $α$-Al$_2$O$_3$ phase content has 2.9 g/lit of KOH, 1.15 g/lit of Na4P2O7, and 0.34 g/lit of NaAlO2, and that the ratio of the highest allotropic peak of $α$-Al$_2$O$_3$ to the highest allotropic peak of $γ$-Al$_2$O$_3$ (in the XRD test) in the optimum condition is 0.622, with a hardness of 1648 Vickers.

Submitted 15 May, 2020; originally announced May 2020.

Comments: e.g.: 9 pages, 3 figures, 1 table

arXiv:2004.14231 [pdf, other]

Image Captioning through Image Transformer

Authors: Sen He, Wentong Liao, Hamed R. Tavakoli, Michael Yang, Bodo Rosenhahn, Nicolas Pugeault

Abstract: Automatic captioning of images is a task that combines the challenges of image analysis and text generation. One important aspect in captioning is the notion of attention: how to decide what to describe and in which order. Inspired by the successes in text analysis and translation, previous work has proposed the transformer architecture for image captioning. However, the structure between the semantic units in images (usually the detected regions from an object detection model) and sentences (each single word) is different. Limited work has been done to adapt the transformer's internal architecture to images. In this work, we introduce the image transformer, which consists of a modified encoding transformer and an implicit decoding transformer, motivated by the relative spatial relationship between image regions. Our design widens the original transformer layer's inner architecture to adapt to the structure of images. With only region features as inputs, our model achieves new state-of-the-art performance on both MSCOCO offline and online testing benchmarks.

Submitted 2 October, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

arXiv:2004.09226 [pdf, other]

End-to-End Learning for Video Frame Compression with Self-Attention

Authors: Nannan Zou, Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu

Abstract: One of the core components of conventional (i.e., non-learned) video codecs consists of predicting a frame from a previously-decoded frame, by leveraging temporal correlations. In this paper, we propose an end-to-end learned system for compressing video frames. Instead of relying on pixel-space motion (as with optical flow), our system learns deep embeddings of frames and encodes their difference in latent space. At decoder-side, an attention mechanism is designed to attend to the latent space of frames to decide how different parts of the previous and current frame are combined to form the final predicted current frame. Spatially-varying channel allocation is achieved by using importance masks acting on the feature-channels. The model is trained to reduce the bitrate by minimizing a loss on importance maps and a loss on the probability output by a context model for arithmetic coding. In our experiments, we show that the proposed system achieves high compression rates and high objective visual quality as measured by MS-SSIM and PSNR. Furthermore, we provide ablation studies where we highlight the contribution of different components.

Submitted 20 April, 2020; originally announced April 2020.

arXiv:1907.02336 [pdf, other]

Deep Saliency Models : The Quest For The Loss Function

Authors: Alexandre Bruckert, Hamed R. Tavakoli, Zhi Liu, Marc Christie, Olivier Le Meur

Abstract: Recent advances in deep learning have pushed the performance of visual saliency models further than ever before. Numerous models in the literature present new ways to design neural networks, to arrange gaze pattern data, or to extract as many high- and low-level image features as possible in order to create the best saliency representation. However, one key part of a typical deep learning model is often neglected: the choice of the loss function. In this work, we explore some of the most popular loss functions that are used in deep saliency models. We demonstrate that on a fixed network architecture, modifying the loss function can significantly improve (or degrade) the results, hence emphasizing the importance of the choice of the loss function when designing a model. We also introduce new loss functions that, to our knowledge, have never been used for saliency prediction. Finally, we show that a linear combination of several well-chosen loss functions leads to significant improvements in performance on different datasets as well as on a different network architecture, hence demonstrating the robustness of a combined metric.

Submitted 4 July, 2019; originally announced July 2019.

Comments: 10 pages, 4 figures

arXiv:1905.10693 [pdf, other]

DAVE: A Deep Audio-Visual Embedding for Dynamic Saliency Prediction

Authors: Hamed R. Tavakoli, Ali Borji, Esa Rahtu, Juho Kannala

Abstract: This paper studies audio-visual deep saliency prediction. It introduces a conceptually simple and effective Deep Audio-Visual Embedding for dynamic saliency prediction dubbed "DAVE" in conjunction with our efforts towards building an Audio-Visual Eye-tracking corpus named "AVE". Despite the strong relation between auditory and visual cues for guiding gaze during perception, video saliency models only consider visual cues and neglect the auditory information that is ubiquitous in dynamic scenes. Here, we investigate the applicability of audio cues in conjunction with visual ones in predicting saliency maps using deep neural networks. To this end, the proposed model is intentionally designed to be simple. Two baseline models are developed on the same architecture which consists of an encoder-decoder. The encoder projects the input into a feature space followed by a decoder that infers saliency. We conduct an extensive analysis on different modalities and various aspects of multi-model dynamic saliency prediction. Our results suggest that (1) audio is a strong contributing cue for saliency prediction, (2) salient visible sound-source is the natural cause of the superiority of our Audio-Visual model, (3) richer feature representations for the input space lead to more powerful predictions even in the absence of more sophisticated saliency decoders, and (4) the Audio-Visual model improves over 53.54% of the frames predicted by the best Visual model (our baseline). Our endeavour demonstrates that audio is an important cue that boosts dynamic video saliency prediction and helps models to approach human performance. The code is available at https://github.com/hrtavakoli/DAVE

Submitted 7 January, 2020; v1 submitted 25 May, 2019; originally announced May 2019.

arXiv:1904.12152 [pdf, other]

PeyeDF: an Eye-Tracking Application for Reading and Self-Indexing Research

Authors: Marco Filetti, Hamed R. Tavakoli, Niklas Ravaja, Giulio Jacucci

Abstract: PeyeDF is a Portable Document Format (PDF) reader with eye tracking support, available as free and open source software. It is especially useful to researchers investigating reading and learning phenomena, as it integrates PDF reading-related behavioural data with gaze-related data. It is suitable for short and long-term research and supports multiple eye tracking systems. We utilised it to conduct an experiment which demonstrated that features obtained from both gaze and reading data collected in the past can predict reading comprehension which takes place in the future. PeyeDF also provides an integrated means for data collection and indexing using the DiMe personal data storage system. It is designed to collect data in the background without interfering with the reading experience, behaving like a modern lightweight PDF reader. Moreover, it supports annotations, tagging and collaborative work. A modular design allows the application to be easily modified in order to support additional eye tracking protocols and run controlled experiments. We discuss the implementation of the software and report on the results of the experiment which we conducted with it.

Submitted 27 April, 2019; originally announced April 2019.

arXiv:1904.06882 [pdf, other]

Geometric Image Correspondence Verification by Dense Pixel Matching

Authors: Zakaria Laskar, Iaroslav Melekhov, Hamed R. Tavakoli, Juha Ylioinas, Juho Kannala

Abstract: This paper addresses the problem of determining dense pixel correspondences between two images and its application to geometric correspondence verification in image retrieval. The main contribution is a geometric correspondence verification approach for re-ranking a shortlist of retrieved database images based on their dense pair-wise matching with the query image at a pixel level. We determine a set of cyclically consistent dense pixel matches between the pair of images and evaluate local similarity of matched pixels using neural network based image descriptors. Final re-ranking is based on a novel similarity function, which fuses the local similarity metric with a global similarity metric and a geometric consistency measure computed for the matched pixels. For dense matching our approach utilizes a modified version of a recently proposed dense geometric correspondence network (DGC-Net), which we also improve by optimizing the architecture. The proposed model and similarity metric compare favourably to the state-of-the-art image retrieval methods. In addition, we apply our method to the problem of long-term visual localization demonstrating promising results and generalization across datasets.

Submitted 17 August, 2020; v1 submitted 15 April, 2019; originally announced April 2019.

Comments: The appendix has been updated by adding some clarifications

arXiv:1904.06090 [pdf, other]

doi 10.1109/WACV.2019.00035

Digging Deeper into Egocentric Gaze Prediction

Authors: Hamed R. Tavakoli, Esa Rahtu, Juho Kannala, Ali Borji

Abstract: This paper digs deeper into factors that influence egocentric gaze. Instead of training deep models for this purpose in a blind manner, we propose to inspect factors that contribute to gaze guidance during daily tasks. Bottom-up saliency and optical flow are assessed versus strong spatial prior baselines. Task-specific cues such as vanishing point, manipulation point, and hand regions are analyzed as representatives of top-down information. We also look into the contribution of these factors by investigating a simple recurrent neural model for ego-centric gaze prediction. First, deep features are extracted for all input video frames. Then, a gated recurrent unit is employed to integrate information over time and to predict the next fixation. We also propose an integrated model that combines the recurrent model with several top-down and bottom-up cues. Extensive experiments over multiple datasets reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up saliency models perform poorly in predicting gaze and underperform spatial biases, (3) deep features perform better compared to traditional features, (4) as opposed to hand regions, the manipulation point is a strong influential cue for gaze prediction, (5) combining the proposed recurrent model with bottom-up cues, vanishing points and, in particular, manipulation point results in the best gaze prediction accuracy over egocentric videos, (6) the knowledge transfer works best for cases where the tasks or sequences are similar, and (7) task and activity recognition can benefit from gaze prediction. Our findings suggest that (1) there should be more emphasis on hand-object interaction and (2) the egocentric vision community should consider larger datasets including diverse stimuli and more subjects.

Submitted 12 April, 2019; originally announced April 2019.

Comments: presented at WACV 2019

arXiv:1903.02501 [pdf, other]

Understanding and Visualizing Deep Visual Saliency Models

Authors: Sen He, Hamed R. Tavakoli, Ali Borji, Yang Mi, Nicolas Pugeault

Abstract: Recently, data-driven deep saliency models have achieved high performance and have outperformed classical saliency models, as demonstrated by results on datasets such as the MIT300 and SALICON. Yet, there remains a large gap between the performance of these models and the inter-human baseline. Some outstanding questions include what these models have learned, how and where they fail, and how they can be improved. This article attempts to answer these questions by analyzing the representations learned by individual neurons located at the intermediate layers of deep saliency models. To this end, we follow the steps of existing deep saliency models, that is, borrowing a pre-trained model of object recognition to encode the visual features and learning a decoder to infer the saliency. We consider two cases, when the encoder is used as a fixed feature extractor and when it is fine-tuned, and compare the inner representations of the network. To study how the learned representations depend on the task, we fine-tune the same network using the same image set but for two different tasks: saliency prediction versus scene classification. Our analyses reveal that: 1) some visual regions (e.g. head, text, symbol, vehicle) are already encoded within various layers of the network pre-trained for object recognition, 2) using modern datasets, we find that fine-tuning pre-trained models for saliency prediction makes them favor some categories (e.g. head) over some others (e.g. text), 3) although deep models of saliency outperform classical models on natural images, the converse is true for synthetic stimuli (e.g. pop-out search arrays), evidence of a significant difference between human and data-driven saliency models, and 4) we confirm that, after fine-tuning, the change in inner representations is mostly due to the task and not the domain shift in the data.

Submitted 3 April, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

Comments: To appear in CVPR2019, camera ready version

arXiv:1903.02499 [pdf, other]

Human Attention in Image Captioning: Dataset and Analysis

Authors: Sen He, Hamed R. Tavakoli, Ali Borji, Nicolas Pugeault

Abstract: In this work, we present a novel dataset consisting of eye movements and verbal descriptions recorded synchronously over images. Using this data, we study the differences in human attention during free-viewing and image captioning tasks. We look into the relationship between human attention and language constructs during perception and sentence articulation. We also analyse attention deployment mechanisms in the top-down soft attention approach that is argued to mimic human attention in captioning tasks, and investigate whether visual saliency can help image captioning. Our study reveals that (1) human attention behaviour differs in free-viewing and image description tasks. Humans tend to fixate on a greater variety of regions under the latter task, (2) there is a strong relationship between described objects and attended objects (97% of the described objects are being attended), (3) a convolutional neural network as feature encoder accounts for human-attended regions during image captioning to a great extent (around 78%), (4) soft-attention mechanism differs from human attention, both spatially and temporally, and there is low correlation between caption scores and attention consistency scores. These indicate a large gap between humans and machines in regards to top-down attention, and (5) by integrating the soft attention model with image saliency, we can significantly improve the model's performance on Flickr30k and MSCOCO benchmarks. The dataset can be found at: https://github.com/SenHe/Human-Attention-in-Image-Captioning.

Submitted 7 August, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

Comments: To appear at ICCV 2019

Journal ref: IEEE International Conference on Computer Vision (ICCV 2019)

arXiv:1901.08341 [pdf, other]

Semantic Matching by Weakly Supervised 2D Point Set Registration

Authors: Zakaria Laskar, Hamed R. Tavakoli, Juho Kannala

Abstract: In this paper we address the problem of establishing correspondences between different instances of the same object. The problem is posed as finding the geometric transformation that aligns a given image pair. We use a convolutional neural network (CNN) to directly regress the parameters of the transformation model. The alignment problem is defined in the setting where an unordered set of semantic key-points per image is available, but without the correspondence information. To this end we propose a novel loss function based on cyclic consistency that solves this 2D point set registration problem by inferring the optimal geometric transformation model parameters. We train and test our approach on a standard benchmark dataset, Proposal-Flow (PF-PASCAL). The proposed approach achieves state-of-the-art results demonstrating the effectiveness of the method. In addition, we show our approach further benefits from additional training samples in PF-PASCAL generated by using category level information.

Submitted 24 January, 2019; originally announced January 2019.

Comments: Accepted to WACV 2019

arXiv:1810.05680 [pdf, other]

Bottom-up Attention, Models of

Authors: Ali Borji, Hamed R. Tavakoli, Zoya Bylinskii

Abstract: In this review, we examine the recent progress in saliency prediction and propose several avenues for future research. In spite of tremendous efforts and huge progress, there is still room for improvement in terms of finer-grained analysis of deep saliency models, evaluation measures, datasets, annotation methods, cognitive studies, and new applications. This chapter will appear in Encyclopedia of Computational Neuroscience.

Submitted 24 April, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1810.03716

arXiv:1705.10546 [pdf, other]

Saliency Revisited: Analysis of Mouse Movements versus Fixations

Authors: Hamed R. Tavakoli, Fawad Ahmed, Ali Borji, Jorma Laaksonen

Abstract: This paper revisits visual saliency prediction by evaluating the recent advancements in this field such as crowd-sourced mouse tracking-based databases and contextual annotations. We pursue a critical and quantitative approach towards some of the new challenges including the quality of mouse tracking versus eye tracking for model training and evaluation. We extend quantitative evaluation of models in order to incorporate contextual information by proposing an evaluation methodology that allows accounting for contextual factors such as text, faces, and object attributes. The proposed contextual evaluation scheme facilitates detailed analysis of models and helps identify their pros and cons. Through several experiments, we find that (1) mouse tracking data has lower inter-participant visual congruency and higher dispersion, compared to the eye tracking data, (2) mouse tracking data does not totally agree with eye tracking, either in general or in terms of different contextual regions in particular, (3) mouse tracking data leads to acceptable results in training current existing models, and (4) mouse tracking data is less reliable for model selection and evaluation. The contextual evaluation also reveals that, among the studied models, there is no single model that performs best on all the tested annotations.

Submitted 30 May, 2017; originally announced May 2017.

arXiv:1704.07434 [pdf, other]

Paying Attention to Descriptions Generated by Image Captioning Models

Authors: Hamed R. Tavakoli, Rakshith Shetty, Ali Borji, Jorma Laaksonen

Abstract: To bridge the gap between humans and machines in image understanding and describing, we need further insight into how people describe a perceived scene. In this paper, we study the agreement between bottom-up saliency-based visual attention and object referrals in scene description constructs. We investigate the properties of human-written descriptions and machine-generated ones. We then propose a saliency-boosted image captioning model in order to investigate benefits from low-level cues in language models. We learn that (1) humans mention more salient objects earlier than less salient ones in their descriptions, (2) the better a captioning model performs, the better attention agreement it has with human descriptions, (3) the proposed saliency-boosted model, compared to its baseline form, does not improve significantly on the MS COCO database, indicating explicit bottom-up boosting does not help when the task is well learnt and tuned on a dataset, and (4) better generalization is, however, observed for the saliency-boosted model on unseen data.

Submitted 4 August, 2017; v1 submitted 24 April, 2017; originally announced April 2017.

Comments: To appear in ICCV 2017

arXiv:1704.07402 [pdf, other]

Towards Instance Segmentation with Object Priority: Prominent Object Detection and Recognition

Authors: Hamed R. Tavakoli, Jorma Laaksonen

Abstract: This manuscript introduces the problem of prominent object detection and recognition, inspired by the fact that humans seem to prioritize the perception of scene elements. The problem deals with finding the most important region of interest, segmenting the relevant item/object in that area, and assigning it an object class label. In other words, we are solving the three problems of saliency modeling, saliency detection, and object recognition under one umbrella. The motivation behind such a problem formulation is (1) the benefits to the knowledge representation-based vision pipelines, and (2) the potential improvements in emulating bio-inspired vision systems by solving these three problems together. We foresee extending this problem formulation to fully semantically segmented scenes with instance object priority for high-level inferences in various applications including assistive vision. Along with a new problem definition, we also propose a method to achieve such a task. The proposed model predicts the most important area in the image, segments the associated objects, and labels them. The proposed problem and method are evaluated against human fixations, annotated segmentation masks, and object class categories. We define a chance level for each evaluation criterion to compare the proposed algorithm with. Despite the good performance of the proposed baseline, the overall evaluations indicate that the problem of prominent object detection and recognition is a challenging task that is still worth investigating further.

Submitted 4 August, 2017; v1 submitted 24 April, 2017; originally announced April 2017.

arXiv:1704.02218 [pdf, other]

Investigating Natural Image Pleasantness Recognition using Deep Features and Eye Tracking for Loosely Controlled Human-computer Interaction

Authors: Hamed R. Tavakoli, Jorma Laaksonen, Esa Rahtu

Abstract: This paper revisits recognition of natural image pleasantness by employing deep convolutional neural networks and affordable eye trackers. There exist several approaches to recognize image pleasantness: (1) computer vision, and (2) psychophysical signals. For natural images, computer vision approaches have not been as successful as for abstract paintings and are lagging behind psychophysical signals like eye movements. Despite better results, the scalability of eye movements is adversely affected by the sensor cost. While the introduction of affordable sensors has helped the scalability issue by making the sensors more accessible, the application of such sensors in a loosely controlled human-computer interaction setup is not yet studied for affective image tagging. On the other hand, deep convolutional neural networks have boosted the performance of vision-based techniques significantly in recent years. To investigate the current status in regard to affective image tagging, we (1) introduce a new eye movement dataset using an affordable eye tracker, (2) study the use of deep neural networks for pleasantness recognition, and (3) investigate the gap between deep features and eye movements. To meet these ends, we record eye movements in a less controlled setup, akin to daily human-computer interaction. We assess features from eye movements, visual features, and their combination. Our results show that (1) recognizing natural image pleasantness from eye movements under a less restricted setup is difficult and previously used techniques are prone to fail, and (2) visual class categories are strong cues for predicting pleasantness, due to their correlation with emotions, necessitating careful study of this phenomenon. This latter finding is alerting as some deep learning approaches may fit to the class category bias.

Submitted 7 April, 2017; originally announced April 2017.

arXiv:1409.4762 [pdf]

Source and Channel Optimal Rate LDPC Code Design for one Sender in BE-MAC with Source Correlation

Authors: Hassan Tavakoli

Abstract: In this paper, we present an extension of the semidefinite programming formulation of optimal rate code design for the single-link Binary Erasure Channel (BEC), previously proposed by the authors, to the Binary Erasure Multiple Access Channel (BE-MAC) with correlation between two sources. The new approach can be easily extended to multiple senders. Simulation results show the efficiency and effectiveness of the new approach in practice.

Submitted 16 September, 2014; originally announced September 2014.

Comments: This paper is a draft of the final paper presented at the 7th International Symposium on Telecommunications (IST'2014)

arXiv:1409.4761 [pdf]

Reducing the Complexity of the Linear Programming Decoding

Authors: Hassan Tavakoli

Abstract: In this paper, we show how the complexity of the Linear Programming (LP) decoder can be reduced. We use degree-3 check equations to model check equations of all degrees. The complexity of LP decoding is directly related to the number of constraints. The number of constraints for the original LP decoder is O(n*(2^n)). Our method decreases the number of constraints to O(n).

Submitted 16 September, 2014; originally announced September 2014.

Comments: This paper is a draft of the final paper presented at the 7th International Symposium on Telecommunications (IST'2014)

arXiv:1409.4760 [pdf]

A Fast Convergence Density Evolution Algorithm for Optimal Rate LDPC Codes in BEC

Authors: Hassan Tavakoli

Abstract: We derive a new fast-converging Density Evolution algorithm for finding optimal rate Low-Density Parity-Check (LDPC) codes used over the binary erasure channel (BEC). The fast convergence property comes from a modified Density Evolution (DE), a numerical method for analyzing the convergence behavior of iterative decoding of an LDPC code. We have used the method of [16] to design an LDPC code with optimal rate. This has been done for a given parity check node degree distribution, erasure probability, and specified DE constraint. The convergence speed of DE and the optimal rate found with this method are compared with those of the previous DE constraint.

Submitted 16 September, 2014; originally announced September 2014.

Comments: This paper is a draft of the final paper presented at the 7th International Symposium on Telecommunications (IST'2014)

arXiv:1211.6279 [pdf]

Optimal Rate Irregular LDPC Codes in Binary Erasure Channel

Authors: H. Tavakoli, M. Ahmadian, M. Reza Peyghami

Abstract: In this paper, we design an optimal rate, capacity-approaching irregular Low-Density Parity-Check code ensemble over the Binary Erasure Channel using a practical Semi-Definite Programming approach. Unlike previous works, our method does not use any relaxation or approximate solution. Our simulation results comprise two parts: first, we present some codes and their degree distribution functions whose rates are close to the capacity. Second, the maximum achievable rate behavior of codes in our method is illustrated through some figures.

Submitted 27 November, 2012; originally announced November 2012.

Comments: published in IET Communications

arXiv:1203.4385 [pdf]

Optimal Rate and Maximum Erasure Probability LDPC Codes in Binary Erasure Channel

Authors: H. Tavakoli, M. Ahmadian Attari, M. R. Peyghami

Abstract: In this paper, we present a novel way of solving the main problem of designing a capacity-approaching irregular low-density parity-check (LDPC) code ensemble over the binary erasure channel (BEC). The proposed method is much simpler, faster, more accurate, and more practical than other methods. Unlike previous works, our method does not use any relaxation or approximate solution. It finds the optimal answer for any given check node degree distribution. The proposed method was implemented and works well in practice with polynomial time complexity. As a result, we present some degree distributions whose rates are close to the capacity with maximum erasure probability and maximum code rate.

Submitted 27 February, 2021; v1 submitted 20 March, 2012; originally announced March 2012.

Comments: 6

arXiv:1108.1572 [pdf]

doi 10.1109/ITW.2011.6089360

Optimal Rate for Irregular LDPC Codes in Binary Erasure Channel

Authors: H. Tavakoli, M. Ahmadian Attari, M. Reza Peyghami

Abstract: In this paper, we introduce a new practical and general method for solving the main problem of designing a capacity-approaching, optimal rate, irregular low-density parity-check (LDPC) code ensemble over the binary erasure channel (BEC). Compared to some recent research, which is based on applying asymptotic analysis tools outside the optimization process, the proposed method is much simpler, faster, more accurate, and more practical. Because it does not use any relaxation or approximate solution, unlike previous works, the answer found with this method is optimal. We can construct the optimal variable node degree distribution for any given binary erasure rate, ε, and any check node degree distribution. The presented method is implemented and works well in practice. The time complexity of this method is of polynomial order. As a result, we obtain some degree distributions whose rates are close to the capacity.

Submitted 7 August, 2011; originally announced August 2011.

Comments: 5 pages, to be presented at the 2011 IEEE Information Theory Workshop (ITW 2011), Paraty, Brazil, October, 2011

MSC Class: 94A15

Showing 1–39 of 39 results for author: Tavakoli, H