Showing 1–26 of 26 results for author: Tavakoli, H R

Searching in archive cs.
  1. arXiv:2404.10163

    cs.CV cs.AI cs.HC cs.LG

    EyeFormer: Predicting Personalized Scanpaths with Transformer-Guided Reinforcement Learning

    Authors: Yue Jiang, Zixin Guo, Hamed Rezazadegan Tavakoli, Luis A. Leiva, Antti Oulasvirta

    Abstract: From a visual perception perspective, modern graphical user interfaces (GUIs) comprise a complex graphics-rich two-dimensional visuospatial arrangement of text, images, and interactive objects such as buttons and menus. While existing models can accurately predict regions and objects that are likely to attract attention "on average", so far there is no scanpath model capable of predicting scanpa…

    Submitted 20 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  2. arXiv:2402.05202

    cs.HC

    UEyes: An Eye-Tracking Dataset across User Interface Types

    Authors: Yue Jiang, Luis A. Leiva, Paul R. B. Houssel, Hamed R. Tavakoli, Julia Kylmälä, Antti Oulasvirta

    Abstract: Different types of user interfaces differ significantly in the number of elements and how they are displayed. To examine how such differences affect the way users look at UIs, we collected and analyzed a large eye-tracking-based dataset, UEyes (62 participants, 1,980 UI screenshots, near 20K eye movement sequences), covering four major UI types: webpage, desktop UI, mobile UI, and poster. Furtherm…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted as a CHI2023 workshop paper

  3. arXiv:2401.10761

    eess.IV cs.CV

    NN-VVC: Versatile Video Coding boosted by self-supervisedly learned image coding for machines

    Authors: Jukka I. Ahonen, Nam Le, Honglei Zhang, Antti Hallapuro, Francesco Cricri, Hamed Rezazadegan Tavakoli, Miska M. Hannuksela, Esa Rahtu

    Abstract: The recent progress in artificial intelligence has led to an ever-increasing usage of images and videos by machine analysis algorithms, mainly neural networks. Nonetheless, compression, storage and transmission of media have traditionally been designed considering human beings as the viewers of the content. Recent research on image and video coding for machine analysis has progressed mainly in two…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: ISM 2023 Best paper award winner version

  4. Bridging the gap between image coding for machines and humans

    Authors: Nam Le, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed Rezazadegan Tavakoli, Emre Aksu, Miska M. Hannuksela, Esa Rahtu

    Abstract: Image coding for machines (ICM) aims at reducing the bitrate required to represent an image while minimizing the drop in machine vision analysis accuracy. In many use cases, such as surveillance, it is also important that the visual quality is not drastically deteriorated by the compression process. Recent works on using neural network (NN) based ICM codecs have shown significant coding gains agai…

    Submitted 19 January, 2024; originally announced January 2024.

    Journal ref: IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 2022, pp. 3411-3415

  5. arXiv:2210.04112

    cs.CV cs.LG cs.MM eess.IV

    Leveraging progressive model and overfitting for efficient learned image compression

    Authors: Honglei Zhang, Francesco Cricri, Hamed Rezazadegan Tavakoli, Emre Aksu, Miska M. Hannuksela

    Abstract: Deep learning has been overwhelmingly dominant in the field of computer vision and image/video processing for the last decade. However, for image and video compression, it lags behind the traditional techniques based on discrete cosine transform (DCT) and linear filters. Built on top of an autoencoder architecture, learned image compression (LIC) systems have drawn enormous attention in recent years. Ne…

    Submitted 8 October, 2022; originally announced October 2022.

  6. arXiv:2112.08767

    eess.IV cs.CV cs.LG

    Adaptation and Attention for Neural Video Coding

    Authors: Nannan Zou, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed R. Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu

    Abstract: Neural image coding now represents the state-of-the-art image compression approach. However, a lot of work is still to be done in the video domain. In this work, we propose an end-to-end learned video codec that introduces several architectural novelties as well as training novelties, revolving around the concepts of adaptation and attention. Our codec is organized as an intra-frame codec paired w…

    Submitted 16 December, 2021; originally announced December 2021.

  7. arXiv:2108.10551

    eess.IV cs.CV cs.LG

    Lossless Image Compression Using a Multi-Scale Progressive Statistical Model

    Authors: Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Nannan Zou, Emre Aksu, Miska M. Hannuksela

    Abstract: Lossless image compression is an important technique for image storage and transmission when information loss is not allowed. With the fast development of deep learning techniques, deep neural networks have been used in this field to achieve a higher compression rate. Methods based on pixel-wise autoregressive statistical models have shown good performance. However, the sequential processing way p…

    Submitted 24 August, 2021; originally announced August 2021.

    Comments: Accepted ACCV 2020

  8. Learned Image Coding for Machines: A Content-Adaptive Approach

    Authors: Nam Le, Honglei Zhang, Francesco Cricri, Ramin Ghaznavi-Youvalari, Hamed Rezazadegan Tavakoli, Esa Rahtu

    Abstract: Today, according to the Cisco Annual Internet Report (2018-2023), the fastest-growing category of Internet traffic is machine-to-machine communication. In particular, machine-to-machine communication of images and videos represents a new challenge and opens up new perspectives in the context of data compression. One possible solution approach consists of adapting current human-targeted image and v…

    Submitted 13 October, 2021; v1 submitted 23 August, 2021; originally announced August 2021.

    Comments: Fig 4 correction

    Journal ref: 2021 IEEE International Conference on Multimedia and Expo (ICME), 2021, pp. 1-6

  9. Understanding Visual Saliency in Mobile User Interfaces

    Authors: Luis A. Leiva, Yunfei Xue, Avya Bansal, Hamed R. Tavakoli, Tuğçe Köroğlu, Niraj R. Dayama, Antti Oulasvirta

    Abstract: For graphical user interface (UI) design, it is important to understand what attracts visual attention. While previous work on saliency has focused on desktop and web-based UIs, mobile app UIs differ from these in several respects. We present findings from a controlled study with 30 participants and 193 mobile UIs. The results speak to a role of expectations in guiding where users look. Strong…

    Submitted 22 January, 2021; originally announced January 2021.

    Journal ref: Proceedings of the 22nd Intl. Conf. on Human-Computer Interaction with Mobile Devices and Services (MobileHCI), 2020

  10. arXiv:2008.13227

    cs.CV

    A Compact Deep Architecture for Real-time Saliency Prediction

    Authors: Saman Zabihi, Hamed Rezazadegan Tavakoli, Ali Borji

    Abstract: Saliency computation models aim to imitate the attention mechanism in the human visual system. The application of deep neural networks for saliency prediction has led to a drastic improvement over the last few years. However, deep models have a high number of parameters which makes them less suitable for real-time applications. Here we propose a compact yet fast model for real-time saliency pred…

    Submitted 30 August, 2020; originally announced August 2020.

  11. arXiv:2007.16054

    eess.IV cs.CV cs.LG cs.MM stat.ML

    Learning to Learn to Compress

    Authors: Nannan Zou, Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Jani Lainema, Miska Hannuksela, Emre Aksu, Esa Rahtu

    Abstract: In this paper we present an end-to-end meta-learned system for image compression. Traditional machine learning based approaches to image compression train one or more neural networks for generalization performance. However, at inference time, the encoder or the latent tensor output by the encoder can be optimized for each test image. This optimization can be regarded as a form of adaptation or bene…

    Submitted 1 May, 2021; v1 submitted 31 July, 2020; originally announced July 2020.

  12. arXiv:2004.14231

    cs.CV

    Image Captioning through Image Transformer

    Authors: Sen He, Wentong Liao, Hamed R. Tavakoli, Michael Yang, Bodo Rosenhahn, Nicolas Pugeault

    Abstract: Automatic captioning of images is a task that combines the challenges of image analysis and text generation. One important aspect in captioning is the notion of attention: How to decide what to describe and in which order. Inspired by the successes in text analysis and translation, previous work has proposed the transformer architecture for image captioning. However, the structure betwee…

    Submitted 2 October, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

  13. arXiv:2004.09226

    eess.IV cs.CV cs.LG

    End-to-End Learning for Video Frame Compression with Self-Attention

    Authors: Nannan Zou, Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu

    Abstract: One of the core components of conventional (i.e., non-learned) video codecs consists of predicting a frame from a previously-decoded frame, by leveraging temporal correlations. In this paper, we propose an end-to-end learned system for compressing video frames. Instead of relying on pixel-space motion (as with optical flow), our system learns deep embeddings of frames and encodes their difference…

    Submitted 20 April, 2020; originally announced April 2020.

  14. arXiv:1907.02336

    cs.CV

    Deep Saliency Models : The Quest For The Loss Function

    Authors: Alexandre Bruckert, Hamed R. Tavakoli, Zhi Liu, Marc Christie, Olivier Le Meur

    Abstract: Recent advances in deep learning have pushed the performance of visual saliency models far beyond what it has ever been. Numerous models in the literature present new ways to design neural networks, to arrange gaze pattern data, or to extract as much high and low-level image features as possible in order to create the best saliency representation. However, one key part of a typical deep learning…

    Submitted 4 July, 2019; originally announced July 2019.

    Comments: 10 pages, 4 figures

  15. arXiv:1905.10693

    cs.CV

    DAVE: A Deep Audio-Visual Embedding for Dynamic Saliency Prediction

    Authors: Hamed R. Tavakoli, Ali Borji, Esa Rahtu, Juho Kannala

    Abstract: This paper studies audio-visual deep saliency prediction. It introduces a conceptually simple and effective Deep Audio-Visual Embedding for dynamic saliency prediction dubbed "DAVE" in conjunction with our efforts towards building an Audio-Visual Eye-tracking corpus named "AVE". Despite the strong relation between auditory and visual cues for guiding gaze during perception, video saliency…

    Submitted 7 January, 2020; v1 submitted 25 May, 2019; originally announced May 2019.

  16. arXiv:1904.12152

    cs.HC

    PeyeDF: an Eye-Tracking Application for Reading and Self-Indexing Research

    Authors: Marco Filetti, Hamed R. Tavakoli, Niklas Ravaja, Giulio Jacucci

    Abstract: PeyeDF is a Portable Document Format (PDF) reader with eye tracking support, available as free and open source software. It is especially useful to researchers investigating reading and learning phenomena, as it integrates PDF reading-related behavioural data with gaze-related data. It is suitable for short and long-term research and supports multiple eye tracking systems. We utilised it to conduc…

    Submitted 27 April, 2019; originally announced April 2019.

  17. arXiv:1904.06882

    cs.CV

    Geometric Image Correspondence Verification by Dense Pixel Matching

    Authors: Zakaria Laskar, Iaroslav Melekhov, Hamed R. Tavakoli, Juha Ylioinas, Juho Kannala

    Abstract: This paper addresses the problem of determining dense pixel correspondences between two images and its application to geometric correspondence verification in image retrieval. The main contribution is a geometric correspondence verification approach for re-ranking a shortlist of retrieved database images based on their dense pair-wise matching with the query image at a pixel level. We determine a…

    Submitted 17 August, 2020; v1 submitted 15 April, 2019; originally announced April 2019.

    Comments: The appendix has been updated by adding some clarifications

  18. Digging Deeper into Egocentric Gaze Prediction

    Authors: Hamed R. Tavakoli, Esa Rahtu, Juho Kannala, Ali Borji

    Abstract: This paper digs deeper into factors that influence egocentric gaze. Instead of training deep models for this purpose in a blind manner, we propose to inspect factors that contribute to gaze guidance during daily tasks. Bottom-up saliency and optical flow are assessed versus strong spatial prior baselines. Task-specific cues such as vanishing point, manipulation point, and hand regions are analyzed…

    Submitted 12 April, 2019; originally announced April 2019.

    Comments: presented at WACV 2019

  19. arXiv:1903.02501

    cs.CV

    Understanding and Visualizing Deep Visual Saliency Models

    Authors: Sen He, Hamed R. Tavakoli, Ali Borji, Yang Mi, Nicolas Pugeault

    Abstract: Recently, data-driven deep saliency models have achieved high performance and have outperformed classical saliency models, as demonstrated by results on datasets such as the MIT300 and SALICON. Yet, there remains a large gap between the performance of these models and the inter-human baseline. Some outstanding questions include what these models have learned, how and where they fail, and how they…

    Submitted 3 April, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

    Comments: To appear in CVPR2019, camera ready version

  20. arXiv:1903.02499

    cs.CV

    Human Attention in Image Captioning: Dataset and Analysis

    Authors: Sen He, Hamed R. Tavakoli, Ali Borji, Nicolas Pugeault

    Abstract: In this work, we present a novel dataset consisting of eye movements and verbal descriptions recorded synchronously over images. Using this data, we study the differences in human attention during free-viewing and image captioning tasks. We look into the relationship between human attention and language constructs during perception and sentence articulation. We also analyse attention deployment me…

    Submitted 7 August, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

    Comments: To appear at ICCV 2019

    Journal ref: IEEE International Conference on Computer Vision (ICCV 2019)

  21. arXiv:1901.08341

    cs.CV

    Semantic Matching by Weakly Supervised 2D Point Set Registration

    Authors: Zakaria Laskar, Hamed R. Tavakoli, Juho Kannala

    Abstract: In this paper we address the problem of establishing correspondences between different instances of the same object. The problem is posed as finding the geometric transformation that aligns a given image pair. We use a convolutional neural network (CNN) to directly regress the parameters of the transformation model. The alignment problem is defined in the setting where an unordered set of semantic…

    Submitted 24 January, 2019; originally announced January 2019.

    Comments: Accepted to WACV 2019

  22. arXiv:1810.05680

    cs.CV

    Bottom-up Attention, Models of

    Authors: Ali Borji, Hamed R. Tavakoli, Zoya Bylinskii

    Abstract: In this review, we examine the recent progress in saliency prediction and propose several avenues for future research. In spite of tremendous efforts and huge progress, there is still room for improvement in terms of finer-grained analysis of deep saliency models, evaluation measures, datasets, annotation methods, cognitive studies, and new applications. This chapter will appear in Encyclopedia of C…

    Submitted 24 April, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1810.03716

  23. arXiv:1705.10546

    cs.CV

    Saliency Revisited: Analysis of Mouse Movements versus Fixations

    Authors: Hamed R. Tavakoli, Fawad Ahmed, Ali Borji, Jorma Laaksonen

    Abstract: This paper revisits visual saliency prediction by evaluating the recent advancements in this field such as crowd-sourced mouse tracking-based databases and contextual annotations. We pursue a critical and quantitative approach towards some of the new challenges including the quality of mouse tracking versus eye tracking for model training and evaluation. We extend quantitative evaluation of models…

    Submitted 30 May, 2017; originally announced May 2017.

  24. arXiv:1704.07434

    cs.CV cs.AI

    Paying Attention to Descriptions Generated by Image Captioning Models

    Authors: Hamed R. Tavakoli, Rakshith Shetty, Ali Borji, Jorma Laaksonen

    Abstract: To bridge the gap between humans and machines in image understanding and describing, we need further insight into how people describe a perceived scene. In this paper, we study the agreement between bottom-up saliency-based visual attention and object referrals in scene description constructs. We investigate the properties of human-written descriptions and machine-generated ones. We then propose a…

    Submitted 4 August, 2017; v1 submitted 24 April, 2017; originally announced April 2017.

    Comments: To appear in ICCV 2017

  25. arXiv:1704.07402

    cs.CV cs.AI

    Towards Instance Segmentation with Object Priority: Prominent Object Detection and Recognition

    Authors: Hamed R. Tavakoli, Jorma Laaksonen

    Abstract: This manuscript introduces the problem of prominent object detection and recognition, inspired by the fact that humans seem to prioritize the perception of scene elements. The problem deals with finding the most important region of interest, segmenting the relevant item/object in that area, and assigning it an object class label. In other words, we are solving the three problems of saliency modeling, s…

    Submitted 4 August, 2017; v1 submitted 24 April, 2017; originally announced April 2017.

  26. arXiv:1704.02218

    cs.CV

    Investigating Natural Image Pleasantness Recognition using Deep Features and Eye Tracking for Loosely Controlled Human-computer Interaction

    Authors: Hamed R. Tavakoli, Jorma Laaksonen, Esa Rahtu

    Abstract: This paper revisits recognition of natural image pleasantness by employing deep convolutional neural networks and affordable eye trackers. There exist several approaches to recognize image pleasantness: (1) computer vision, and (2) psychophysical signals. For natural images, computer vision approaches have not been as successful as for abstract paintings and are lagging behind the psychophysical si…

    Submitted 7 April, 2017; originally announced April 2017.