Showing 51–100 of 113 results for author: Escalera, S

  1. arXiv:2201.01609  [pdf, other]

    cs.CV cs.CL

    All You Need In Sign Language Production

    Authors: Razieh Rastgoo, Kourosh Kiani, Sergio Escalera, Vassilis Athitsos, Mohammad Sabokrou

    Abstract: Sign Language is the dominant form of communication language used in the deaf and hearing-impaired community. To make an easy and mutual communication between the hearing-impaired and the hearing communities, building a robust system capable of translating the spoken language into sign language and vice versa is fundamental. To this end, sign language recognition and production are two necessary p…

    Submitted 6 January, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2103.15910

  2. arXiv:2111.07846  [pdf, other]

    cs.CV

    Multi-Task Classification of Sewer Pipe Defects and Properties using a Cross-Task Graph Neural Network Decoder

    Authors: Joakim Bruslund Haurum, Meysam Madadi, Sergio Escalera, Thomas B. Moeslund

    Abstract: The sewerage infrastructure is one of the most important and expensive infrastructures in modern society. In order to efficiently manage the sewerage infrastructure, automated sewer inspection has to be utilized. However, while sewer defect classification has been investigated for decades, little attention has been given to classifying sewer pipe properties such as water level, pipe material, and…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: WACV 2022

  3. Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform

    Authors: Zhen Xu, Sergio Escalera, Isabelle Guyon, Adrien Pavão, Magali Richard, Wei-Wei Tu, Quanming Yao, Huan Zhao

    Abstract: Obtaining standardized crowdsourced benchmark of computational methods is a major issue in data science communities. Dedicated frameworks enabling fair benchmarking in a unified environment are yet to be developed. Here we introduce Codabench, an open-source, community-driven platform for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench (https://w…

    Submitted 25 February, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Journal ref: Patterns Cell Press 2022

  4. arXiv:2110.02902  [pdf, ps, other]

    cs.CV

    SAIC_Cambridge-HuPBA-FBK Submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021

    Authors: Swathikiran Sudhakaran, Adrian Bulat, Juan-Manuel Perez-Rua, Alex Falcon, Sergio Escalera, Oswald Lanz, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This report presents the technical details of our submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021. To participate in the challenge we deployed spatio-temporal feature extraction and aggregation models we have developed recently: GSF and XViT. GSF is an efficient spatio-temporal feature extracting module that can be plugged into 2D CNNs for video action recognition. XViT is a…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: Ranked third in the EPIC-Kitchens-100 Action Recognition Challenge @ CVPR 2021

  5. arXiv:2110.01614  [pdf, other]

    cs.GR cs.LG

    Neural Implicit Surfaces for Efficient and Accurate Collisions in Physically Based Simulations

    Authors: Hugo Bertiche, Meysam Madadi, Sergio Escalera

    Abstract: Current trends in the computer graphics community propose leveraging the massive parallel computational power of GPUs to accelerate physically based simulations. Collision detection and solving is a fundamental part of this process. It is also the most significant bottleneck on physically based simulations and it easily becomes intractable as the number of vertices in the scene increases. Brute fo…

    Submitted 3 October, 2021; originally announced October 2021.

  6. arXiv:2109.09487  [pdf]

    cs.CV cs.AI cs.LG

    Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions

    Authors: David Curto, Albert Clapés, Javier Selva, Sorina Smeureanu, Julio C. S. Jacques Junior, David Gallardo-Pujol, Georgina Guilera, David Leiva, Thomas B. Moeslund, Sergio Escalera, Cristina Palmero

    Abstract: Personality computing has become an emerging topic in computer vision, due to the wide range of applications it can be used for. However, most works on the topic have focused on analyzing the individual, even when applied to interaction scenarios, and for short periods of time. To address these limitations, we present the Dyadformer, a novel multi-modal multi-subject Transformer architecture to mo…

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: Accepted to the 2021 ICCV Workshop on Understanding Social Behavior in Dyadic and Small Group Interactions

  7. arXiv:2109.00796  [pdf, other]

    cs.CV

    Multi-Modal Zero-Shot Sign Language Recognition

    Authors: Razieh Rastgoo, Kourosh Kiani, Sergio Escalera, Mohammad Sabokrou

    Abstract: Zero-Shot Learning (ZSL) has rapidly advanced in recent years. Towards overcoming the annotation bottleneck in the Sign Language Recognition (SLR), we explore the idea of Zero-Shot Sign Language Recognition (ZS-SLR) with no annotated visual examples, by leveraging their textual descriptions. In this way, we propose a multi-modal Zero-Shot Sign Language Recognition (ZS-SLR) model harnessing from th…

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: arXiv admin note: text overlap with arXiv:2108.10059

  8. arXiv:2108.10059  [pdf, other]

    cs.CV cs.HC

    ZS-SLR: Zero-Shot Sign Language Recognition from RGB-D Videos

    Authors: Razieh Rastgoo, Kourosh Kiani, Sergio Escalera

    Abstract: Sign Language Recognition (SLR) is a challenging research area in computer vision. To tackle the annotation bottleneck in SLR, we formulate the problem of Zero-Shot Sign Language Recognition (ZS-SLR) and propose a two-stream model from two input modalities: RGB and Depth videos. To benefit from the vision Transformer capabilities, we use two vision Transformer models, for human detection and visua…

    Submitted 23 August, 2021; originally announced August 2021.

  9. arXiv:2108.06968  [pdf, other]

    cs.CV

    3D High-Fidelity Mask Face Presentation Attack Detection Challenge

    Authors: Ajian Liu, Chenxu Zhao, Zitong Yu, Anyang Su, Xing Liu, Zijian Kong, Jun Wan, Sergio Escalera, Hugo Jair Escalante, Zhen Lei, Guodong Guo

    Abstract: The threat of 3D masks to face recognition systems is increasingly serious and has drawn wide concern from researchers. To facilitate the study of the algorithms, a large-scale High-Fidelity Mask dataset, namely CASIA-SURF HiFiMask (briefly HiFiMask) has been collected. Specifically, it consists of a total amount of 54,600 videos which are recorded from 75 subjects with 225 realistic masks under…

    Submitted 16 August, 2021; originally announced August 2021.

  10. ChaLearn Looking at People: Inpainting and Denoising challenges

    Authors: Sergio Escalera, Marti Soler, Stephane Ayache, Umut Guclu, Jun Wan, Meysam Madadi, Xavier Baro, Hugo Jair Escalante, Isabelle Guyon

    Abstract: Dealing with incomplete information is a well studied problem in the context of machine learning and computational intelligence. However, in the context of computer vision, the problem has only been studied in specific scenarios (e.g., certain types of occlusions in specific types of images), although it is common to have incomplete information in visual data. This chapter describes the design of…

    Submitted 24 June, 2021; originally announced June 2021.

    Journal ref: Inpainting and Denoising Challenges. The Springer Series on Challenges in Machine Learning. Springer, Cham. (2019)

  11. Deep unsupervised 3D human body reconstruction from a sparse set of landmarks

    Authors: Meysam Madadi, Hugo Bertiche, Sergio Escalera

    Abstract: In this paper we propose the first deep unsupervised approach in human body reconstruction to estimate body surface from a sparse set of landmarks, so called DeepMurf. We apply a denoising autoencoder to estimate missing landmarks. Then we apply an attention model to estimate body joints from landmarks. Finally, a cascading network is applied to regress parameters of a statistical generative model…

    Submitted 23 June, 2021; originally announced June 2021.

    Journal ref: IJCV (2021)

  12. arXiv:2105.05066  [pdf, other]

    cs.CV

    ChaLearn LAP Large Scale Signer Independent Isolated Sign Language Recognition Challenge: Design, Results and Future Research

    Authors: Ozge Mercanoglu Sincan, Julio C. S. Jacques Junior, Sergio Escalera, Hacer Yalim Keles

    Abstract: The performances of Sign Language Recognition (SLR) systems have improved considerably in recent years. However, several open challenges still need to be solved to allow SLR to be useful in practice. The research in the field is in its infancy in regards to the robustness of the models to a large diversity of signs and signers, and to fairness of the models to performers from different demographic…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: Preprint of the accepted paper at ChaLearn Looking at People Sign Language Recognition in the Wild Workshop at CVPR 2021

  13. The EMPATHIC Project: Mid-term Achievements

    Authors: M. I. Torres, J. M. Olaso, C. Montenegro, R. Santana, A. Vázquez, R. Justo, J. A. Lozano, S. Schlögl, G. Chollet, N. Dugan, M. Irvine, N. Glackin, C. Pickard, A. Esposito, G. Cordasco, A. Troncone, D. Petrovska-Delacretaz, A. Mtibaa, M. A. Hmani, M. S. Korsnes, L. J. Martinussen, S. Escalera, C. Palmero Cantariño, O. Deroo, O. Gordeeva, et al. (4 additional authors not shown)

    Abstract: The goal of active aging is to promote changes in the elderly community so as to maintain an active, independent and socially-engaged lifestyle. Technological advancements currently provide the necessary tools to foster and monitor such processes. This paper reports on mid-term achievements of the European H2020 EMPATHIC project, which aims to research, innovate, explore and validate new interacti…

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: 12 pages

  14. Deep learning with self-supervision and uncertainty regularization to count fish in underwater images

    Authors: Penny Tarling, Mauricio Cantor, Albert Clapés, Sergio Escalera

    Abstract: Effective conservation actions require effective population monitoring. However, accurately counting animals in the wild to inform conservation decision-making is difficult. Monitoring populations through image sampling has made data collection cheaper, wide-reaching and less intrusive but created a need to process and analyse this data efficiently. Counting animals from such data is challenging,…

    Submitted 30 April, 2021; originally announced April 2021.

    Comments: 22 pages, 6 figures, submitted to indexed journal

  15. arXiv:2104.06148  [pdf, other]

    cs.CV

    Contrastive Context-Aware Learning for 3D High-Fidelity Mask Face Presentation Attack Detection

    Authors: Ajian Liu, Chenxu Zhao, Zitong Yu, Jun Wan, Anyang Su, Xing Liu, Zichang Tan, Sergio Escalera, Junliang Xing, Yanyan Liang, Guodong Guo, Zhen Lei, Stan Z. Li, Du Zhang

    Abstract: Face presentation attack detection (PAD) is essential to secure face recognition systems primarily from high-fidelity mask attacks. Most existing 3D mask PAD benchmarks suffer from several drawbacks: 1) a limited number of mask identities, types of sensors, and a total number of videos; 2) low-fidelity quality of facial masks. Basic deep models and remote photoplethysmography (rPPG) methods achiev…

    Submitted 13 April, 2021; originally announced April 2021.

  16. arXiv:2103.15910  [pdf, other]

    cs.CV

    Sign Language Production: A Review

    Authors: Razieh Rastgoo, Kourosh Kiani, Sergio Escalera, Mohammad Sabokrou

    Abstract: Sign Language is the dominant yet non-primary form of communication language used in the deaf and hearing-impaired community. To make an easy and mutual communication between the hearing-impaired and the hearing communities, building a robust system capable of translating the spoken language into sign language and vice versa is fundamental. To this end, sign language recognition and production are…

    Submitted 29 March, 2021; originally announced March 2021.

  17. Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries

    Authors: Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

    Abstract: We present EgoACO, a deep neural architecture for video action recognition that learns to pool action-context-object descriptors from frame level features by leveraging the verb-noun structure of action labels in egocentric video datasets. The core component of EgoACO is class activation pooling (CAP), a differentiable pooling operation that combines ideas from bilinear pooling for fine-grained re…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: Accepted to TPAMI

  18. arXiv:2012.14259  [pdf, other]

    cs.CV cs.AI cs.LG

    Context-Aware Personality Inference in Dyadic Scenarios: Introducing the UDIVA Dataset

    Authors: Cristina Palmero, Javier Selva, Sorina Smeureanu, Julio C. S. Jacques Junior, Albert Clapés, Alexa Moseguí, Zejian Zhang, David Gallardo, Georgina Guilera, David Leiva, Sergio Escalera

    Abstract: This paper introduces UDIVA, a new non-acted dataset of face-to-face dyadic interactions, where interlocutors perform competitive and collaborative tasks with different behavior elicitation and cognitive workload. The dataset consists of 90.5 hours of dyadic interactions among 147 participants distributed in 188 sessions, recorded using multiple audiovisual and physiological sensors. Currently, it…

    Submitted 28 December, 2020; originally announced December 2020.

    Comments: Accepted to the 11th International Workshop on Human Behavior Understanding workshop at Winter Conference on Applications of Computer Vision 2021

  19. arXiv:2012.11310  [pdf, other]

    cs.CV cs.GR

    PBNS: Physically Based Neural Simulator for Unsupervised Garment Pose Space Deformation

    Authors: Hugo Bertiche, Meysam Madadi, Sergio Escalera

    Abstract: We present a methodology to automatically obtain Pose Space Deformation (PSD) basis for rigged garments through deep learning. Classical approaches rely on Physically Based Simulations (PBS) to animate clothes. These are general solutions that, given a sufficiently fine-grained discretization of space and time, can achieve highly realistic results. However, they are computationally expensive and a…

    Submitted 21 May, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

  20. arXiv:2011.14906  [pdf, other]

    cs.CV cs.DB

    Person Perception Biases Exposed: Revisiting the First Impressions Dataset

    Authors: Julio C. S. Jacques Junior, Agata Lapedriza, Cristina Palmero, Xavier Baró, Sergio Escalera

    Abstract: This work revisits the ChaLearn First Impressions database, annotated for personality perception using pairwise comparisons via crowdsourcing. We analyse for the first time the original pairwise annotations, and reveal existing person perception biases associated to perceived attributes like gender, ethnicity, age and face attractiveness. We show how person perception bias can influence data label…

    Submitted 30 November, 2020; originally announced November 2020.

    Comments: accepted on 11th International Workshop on Human Behavior Understanding (HBU), organized as part of WACV 2021

  21. arXiv:2009.07838  [pdf, other]

    cs.CV

    FairFace Challenge at ECCV 2020: Analyzing Bias in Face Recognition

    Authors: Tomáš Sixta, Julio C. S. Jacques Junior, Pau Buch-Cardona, Neil M. Robertson, Eduard Vazquez, Sergio Escalera

    Abstract: This work summarizes the 2020 ChaLearn Looking at People Fair Face Recognition and Analysis Challenge and provides a description of the top-winning solutions and analysis of the results. The aim of the challenge was to evaluate accuracy and bias in gender and skin colour of submitted algorithms on the task of 1:1 face verification in the presence of other confounding attributes. Participants were…

    Submitted 2 December, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: accepted on ECCV'2020 Fair Face Recognition and Analysis Workshop

  22. arXiv:2009.02715  [pdf, other]

    cs.CV

    DeePSD: Automatic Deep Skinning And Pose Space Deformation For 3D Garment Animation

    Authors: Hugo Bertiche, Meysam Madadi, Emilio Tylson, Sergio Escalera

    Abstract: We present a novel solution to the garment animation problem through deep learning. Our contribution allows animating any template outfit with arbitrary topology and geometric complexity. Recent works develop models for garment edition, resizing and animation at the same time by leveraging the support body model (encoding garments as body homotopies). This leads to complex engineering solutions th…

    Submitted 7 April, 2021; v1 submitted 6 September, 2020; originally announced September 2020.

  23. arXiv:2006.13725  [pdf, other]

    cs.CV

    FBK-HUPBA Submission to the EPIC-Kitchens Action Recognition 2020 Challenge

    Authors: Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

    Abstract: In this report we describe the technical details of our submission to the EPIC-Kitchens Action Recognition 2020 Challenge. To participate in the challenge we deployed spatio-temporal feature extraction and aggregation models we have developed recently: Gate-Shift Module (GSM) [1] and EgoACO, an extension of Long Short-Term Attention (LSTA) [2]. We design an ensemble of GSM and EgoACO model familie…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: Ranked 3rd in the EPIC-Kitchens action recognition challenge @ CVPR 2020

  24. arXiv:2005.00450  [pdf, other]

    cs.CV

    Computing the Testing Error without a Testing Set

    Authors: Ciprian Corneanu, Meysam Madadi, Sergio Escalera, Aleix Martinez

    Abstract: Deep Neural Networks (DNNs) have revolutionized computer vision. We now have DNNs that achieve top (performance) results in many problems, including object recognition, facial expression analysis, and semantic segmentation, to name but a few. The design of the DNNs that achieve top results is, however, non-trivial and mostly done by trial-and-error. That is, typically, researchers will derive many…

    Submitted 1 May, 2020; originally announced May 2020.

  25. arXiv:2004.10998  [pdf, other]

    cs.CV

    Cross-ethnicity Face Anti-spoofing Recognition Challenge: A Review

    Authors: Ajian Liu, Xuan Li, Jun Wan, Sergio Escalera, Hugo Jair Escalante, Meysam Madadi, Yi **, Zhuoyuan Wu, Xiaogang Yu, Zichang Tan, Qi Yuan, Ruikun Yang, Benjia Zhou, Guodong Guo, Stan Z. Li

    Abstract: Face anti-spoofing is critical to prevent face recognition systems from a security breach. The biometrics community has achieved impressive progress recently due to the excellent performance of deep neural networks and the availability of large datasets. Although ethnic bias has been verified to severely affect the performance of face recognition systems, it still remains an open research…

    Submitted 23 April, 2020; originally announced April 2020.

    Comments: 18 figures, 6 tables, 12 pages

  26. arXiv:2003.05136  [pdf, other]

    cs.CV

    CASIA-SURF CeFA: A Benchmark for Multi-modal Cross-ethnicity Face Anti-spoofing

    Authors: Ajian Li, Zichang Tan, Xuan Li, Jun Wan, Sergio Escalera, Guodong Guo, Stan Z. Li

    Abstract: Ethnic bias has proven to negatively affect the performance of face recognition systems, and it remains an open research problem in face anti-spoofing. In order to study the ethnic bias for face anti-spoofing, we introduce the largest up to date CASIA-SURF Cross-ethnicity Face Anti-spoofing (CeFA) dataset (briefly named CeFA), covering 3 ethnicities, 3 modalities, 1,607 subjects, and 2D plus…

    Submitted 11 March, 2020; originally announced March 2020.

    Comments: 17 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:1912.02340

  27. arXiv:2003.05056  [pdf, other]

    eess.IV cs.CV

    Multi-level Context Gating of Embedded Collective Knowledge for Medical Image Segmentation

    Authors: Maryam Asadi-Aghbolaghi, Reza Azad, Mahmood Fathy, Sergio Escalera

    Abstract: Medical image segmentation has been very challenging due to the large variation of anatomy across different cases. Recent advances in deep learning frameworks have exhibited faster and more accurate performance in image segmentation. Among the existing networks, U-Net has been successfully applied on medical image segmentation. In this paper, we propose an extension of U-Net for medical image segm…

    Submitted 10 March, 2020; originally announced March 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1909.00166

  28. arXiv:1912.02792  [pdf, other]

    cs.CV cs.LG eess.IV

    CLOTH3D: Clothed 3D Humans

    Authors: Hugo Bertiche, Meysam Madadi, Sergio Escalera

    Abstract: This work presents CLOTH3D, the first big scale synthetic dataset of 3D clothed human sequences. CLOTH3D contains a large variability on garment type, topology, shape, size, tightness and fabric. Clothes are simulated on top of thousands of different pose sequences and body shapes, generating realistic cloth dynamics. We provide the dataset with a generative model for cloth generation. We propose…

    Submitted 6 September, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

  29. arXiv:1912.02340  [pdf, other]

    cs.CV

    Static and Dynamic Fusion for Multi-modal Cross-ethnicity Face Anti-spoofing

    Authors: Ajian Liu, Zichang Tan, Xuan Li, Jun Wan, Sergio Escalera, Guodong Guo, Stan Z. Li

    Abstract: Regardless of the usage of deep learning and handcrafted methods, the dynamic information from videos and the effect of cross-ethnicity are rarely considered in face anti-spoofing. In this work, we propose a static-dynamic fusion mechanism for multi-modal face anti-spoofing. Inspired by motion divergences between real and fake faces, we incorporate the dynamic image calculated by rank pooling with…

    Submitted 15 December, 2019; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: 10 pages, 9 figures, conference

  30. arXiv:1912.00381  [pdf, other]

    cs.CV

    Gate-Shift Networks for Video Action Recognition

    Authors: Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

    Abstract: Deep 3D CNNs for video action recognition are designed to learn powerful representations in the joint spatio-temporal feature space. In practice however, because of the large number of parameters and computations involved, they may under-perform in the lack of sufficiently large datasets for training them at scale. In this paper we introduce spatial gating in spatial-temporal decomposition of 3D k…

    Submitted 21 March, 2020; v1 submitted 1 December, 2019; originally announced December 2019.

    Comments: CVPR20 camera ready version. Code and models available at https://github.com/swathikirans/GSM

  31. arXiv:1909.05568  [pdf, other]

    cs.CV

    On the Effect of Observed Subject Biases in Apparent Personality Analysis from Audio-visual Signals

    Authors: Ricardo Darío Pérez Principi, Cristina Palmero, Julio C. S. Jacques Junior, Sergio Escalera

    Abstract: Personality perception is implicitly biased due to many subjective factors, such as cultural, social, contextual, gender and appearance. Approaches developed for automatic personality perception are not expected to predict the real personality of the target, but the personality external observers attributed to it. Hence, they have to deal with human bias, inherently transferred to the training dat…

    Submitted 28 November, 2019; v1 submitted 12 September, 2019; originally announced September 2019.

    Comments: Accepted in IEEE Transactions on Affective Computing (TAC)

  32. arXiv:1909.00166  [pdf, other]

    eess.IV cs.CV

    Bi-Directional ConvLSTM U-Net with Densley Connected Convolutions

    Authors: Reza Azad, Maryam Asadi-Aghbolaghi, Mahmood Fathy, Sergio Escalera

    Abstract: In recent years, deep learning-based networks have achieved state-of-the-art performance in medical image segmentation. Among the existing networks, U-Net has been successfully applied on medical image segmentation. In this paper, we propose an extension of U-Net, Bi-directional ConvLSTM U-Net with Densely connected convolutions (BCDU-Net), for medical image segmentation, in which we take full adv…

    Submitted 31 August, 2019; originally announced September 2019.

  33. arXiv:1908.10654  [pdf, other]

    cs.CV

    CASIA-SURF: A Large-scale Multi-modal Benchmark for Face Anti-spoofing

    Authors: Shifeng Zhang, Ajian Liu, Jun Wan, Yanyan Liang, Guogong Guo, Sergio Escalera, Hugo Jair Escalante, Stan Z. Li

    Abstract: Face anti-spoofing is essential to prevent face recognition systems from a security breach. Much of the progresses have been made by the availability of face anti-spoofing benchmark datasets in recent years. However, existing face anti-spoofing benchmarks have limited number of subjects ($\le 170$) and modalities ($\le 2$), which hinder the further development of the academi…

    Submitted 4 February, 2020; v1 submitted 28 August, 2019; originally announced August 2019.

    Comments: Accepted by TBIOM; Journal extension of our previous conference paper: arXiv:1812.00408

  34. ChaLearn Looking at People: IsoGD and ConGD Large-scale RGB-D Gesture Recognition

    Authors: Jun Wan, Chi Lin, Longyin Wen, Yunan Li, Qiguang Miao, Sergio Escalera, Gholamreza Anbarjafari, Isabelle Guyon, Guodong Guo, Stan Z. Li

    Abstract: The ChaLearn large-scale gesture recognition challenge has been run twice in two workshops in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams round the world. This challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. This paper d…

    Submitted 28 July, 2019; originally announced July 2019.

    Comments: 14 pages, 8 figures, 6 tables

    Journal ref: IEEE Transactions on Cybernetics 2020

  35. arXiv:1906.08960  [pdf, other]

    cs.CV

    FBK-HUPBA Submission to the EPIC-Kitchens 2019 Action Recognition Challenge

    Authors: Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

    Abstract: In this report we describe the technical details of our submission to the EPIC-Kitchens 2019 action recognition challenge. To participate in the challenge we have developed a number of CNN-LSTA [3] and HF-TSN [2] variants, and submitted predictions from an ensemble compiled out of these two model families. Our submission, visible on the public leaderboard with team name FBK-HUPBA, achieved a top-1…

    Submitted 21 June, 2019; originally announced June 2019.

    Comments: Ranked 3rd in the EPIC-Kitchens 2019 action recognition challenge, held as part of CVPR 2019

  36. arXiv:1905.12462  [pdf, other]

    cs.CV

    Hierarchical Feature Aggregation Networks for Video Action Recognition

    Authors: Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

    Abstract: Most action recognition methods base on a) a late aggregation of frame level CNN features using average pooling, max pooling, or RNN, among others, or b) spatio-temporal aggregation via 3D convolutions. The first assume independence among frame features up to a certain level of abstraction and then perform higher-level aggregation, while the second extracts spatio-temporal features from grouped fr…

    Submitted 29 May, 2019; originally announced May 2019.

  37. arXiv:1905.03003  [pdf, other]

    cs.CV

    Multi-task human analysis in still images: 2D/3D pose, depth map, and multi-part segmentation

    Authors: Daniel Sánchez, Marc Oliu, Meysam Madadi, Xavier Baró, Sergio Escalera

    Abstract: While many individual tasks in the domain of human analysis have recently received an accuracy boost from deep learning approaches, multi-task learning has mostly been ignored due to a lack of data. New synthetic datasets are being released, filling this gap with synthetic generated data. In this work, we analyze four related human analysis tasks in still images in a multi-task scenario by leverag…

    Submitted 8 May, 2019; originally announced May 2019.

    Comments: 8 pages, 4 Figures, 5 Tables, Conference Faces and Gestures 2019

  38. arXiv:1902.07653  [pdf, other]

    cs.CV

    On the effect of age perception biases for real age regression

    Authors: Julio C. S. Jacques Junior, Cagri Ozcinar, Marina Marjanovic, Xavier Baró, Gholamreza Anbarjafari, Sergio Escalera

    Abstract: Automatic age estimation from facial images represents an important task in computer vision. This paper analyses the effect of gender, age, ethnic, makeup and expression attributes of faces as sources of bias to improve deep apparent age prediction. Following recent works where it is shown that apparent age labels benefit real age estimation, rather than direct real to real age regression, our mai…

    Submitted 20 February, 2019; originally announced February 2019.

    Comments: Accepted in the 14th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2019)

  39. arXiv:1812.10766  [pdf, other]

    cs.CV

    SMPLR: Deep SMPL reverse for 3D human pose and shape recovery

    Authors: Meysam Madadi, Hugo Bertiche, Sergio Escalera

    Abstract: Current state-of-the-art in 3D human pose and shape recovery relies on deep neural networks and statistical morphable body models, such as the Skinned Multi-Person Linear model (SMPL). However, regardless of the advantages of having both body pose and shape, SMPL-based solutions have shown difficulties to predict 3D bodies accurately. This is mainly due to the unconstrained nature of SMPL, which m…

    Submitted 8 August, 2019; v1 submitted 27 December, 2018; originally announced December 2018.

  40. arXiv:1812.00408  [pdf, other]

    cs.CV

    A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing

    Authors: Shifeng Zhang, Xiaobo Wang, Ajian Liu, Chenxu Zhao, Jun Wan, Sergio Escalera, Hailin Shi, Zezheng Wang, Stan Z. Li

    Abstract: Face anti-spoofing is essential to prevent face recognition systems from a security breach. Much of the progresses have been made by the availability of face anti-spoofing benchmark datasets in recent years. However, existing face anti-spoofing benchmarks have limited number of subjects ($\le 170$) and modalities ($\le 2$), which hinder the further development of the academi…

    Submitted 1 April, 2019; v1 submitted 2 December, 2018; originally announced December 2018.

    Comments: CVPR2019 Camera Ready

  41. arXiv:1811.10698  [pdf, other]

    cs.CV

    LSTA: Long Short-Term Attention for Egocentric Action Recognition

    Authors: Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

    Abstract: Egocentric activity recognition is one of the most challenging tasks in video analysis. It requires a fine-grained discrimination of small objects and their manipulation. While some methods base on strong supervision and attention mechanisms, they are either annotation consuming or do not take spatio-temporal patterns into account. In this paper we propose LSTA as a mechanism to focus on features…

    Submitted 12 April, 2019; v1 submitted 26 November, 2018; originally announced November 2018.

    Comments: Accepted to CVPR 2019

  42. arXiv:1811.08935  [pdf, other]

    eess.AS cs.CL cs.SD

    A Study of Language and Classifier-independent Feature Analysis for Vocal Emotion Recognition

    Authors: Fatemeh Noroozi, Marina Marjanovic, Angelina Njegus, Sergio Escalera, Gholamreza Anbarjafari

    Abstract: Every speech signal carries implicit information about the emotions, which can be extracted by speech processing methods. In this paper, we propose an algorithm for extracting features that are independent from the spoken language and the classification method to have comparatively good recognition performance on different languages independent from the employed classification methods. The propose…

    Submitted 14 November, 2018; originally announced November 2018.

    Comments: 24 pages, 4 figures

  43. arXiv:1809.08064  [pdf, other]

    cs.CV

    From 2D to 3D Geodesic-based Garment Matching

    Authors: Meysam Madadi, Egils Avots, Sergio Escalera, Jordi Gonzalez, Xavier Baro, Gholamreza Anbarjafari

    Abstract: A new approach for 2D to 3D garment retexturing is proposed based on Gaussian mixture models and thin plate splines (TPS). An automatically segmented garment of an individual is matched to a new source garment and rendered, resulting in augmented images in which the target garment has been retextured by using the texture of the source garment. We divide the problem into garment boundary matching b…

    Submitted 21 September, 2018; originally announced September 2018.

  44. Beyond One-hot Encoding: lower dimensional target embedding

    Authors: Pau Rodríguez, Miguel A. Bautista, Jordi Gonzàlez, Sergio Escalera

    Abstract: Target encoding plays a central role when learning Convolutional Neural Networks. In this realm, One-hot encoding is the most prevalent strategy due to its simplicity. However, this so widespread encoding schema assumes a flat label space, thus ignoring rich relationships existing among labels that can be exploited during training. In large-scale datasets, data does not span the full label space,…

    Submitted 28 June, 2018; originally announced June 2018.

    Comments: Published at Image and Vision Computing

  45. arXiv:1805.03064  [pdf, other]

    cs.CV

    Recurrent CNN for 3D Gaze Estimation using Appearance and Shape Cues

    Authors: Cristina Palmero, Javier Selva, Mohammad Ali Bagheri, Sergio Escalera

    Abstract: Gaze behavior is an important non-verbal cue in social signal processing and human-computer interaction. In this paper, we tackle the problem of person- and head pose-independent 3D gaze estimation from remote cameras, using a multi-modal recurrent convolutional neural network (CNN). We propose to combine face, eyes region, and face landmarks as individual streams in a CNN to estimate gaze in stil…

    Submitted 17 September, 2018; v1 submitted 8 May, 2018; originally announced May 2018.

    Comments: Proc. of British Machine Vision Conference (BMVC), BMVC 2018. Errata: in pg.5 the camera matrices of the transformation matrix W should be interchanged (correct version: W=C_n*M*(C_o)^-1)

  46. arXiv:1804.08046  [pdf, other]

    cs.CV

    First Impressions: A Survey on Vision-Based Apparent Personality Trait Analysis

    Authors: Julio C. S. Jacques Junior, Yağmur Güçlütürk, Marc Pérez, Umut Güçlü, Carlos Andujar, Xavier Baró, Hugo Jair Escalante, Isabelle Guyon, Marcel A. J. van Gerven, Rob van Lier, Sergio Escalera

    Abstract: Personality analysis has been widely studied in psychology, neuropsychology, and signal processing fields, among others. From the past few years, it also became an attractive research area in visual computing. From the computational point of view, by far speech and text have been the most considered cues of information for analyzing personality. However, recently there has been an increasing inter…

    Submitted 17 July, 2019; v1 submitted 21 April, 2018; originally announced April 2018.

    Comments: Accepted on IEEE Transactions on Affective Computing (TAC)

  47. arXiv:1804.04419  [pdf, other]

    cs.CV

    Exploiting feature representations through similarity learning, post-ranking and ranking aggregation for person re-identification

    Authors: Julio C. S. Jacques Junior, Xavier Baró, Sergio Escalera

    Abstract: Person re-identification has received special attention by the human analysis community in the last few years. To address the challenges in this field, many researchers have proposed different strategies, which basically exploit either cross-view invariant features or cross-view robust metrics. In this work, we propose to exploit a post-ranking approach and combine different feature representation…

    Submitted 12 April, 2018; originally announced April 2018.

    Comments: Preprint submitted to Image and Vision Computing

  48. arXiv:1803.05873  [pdf, other]

    cs.CV

    Deep Structure Inference Network for Facial Action Unit Recognition

    Authors: Ciprian A. Corneanu, Meysam Madadi, Sergio Escalera

    Abstract: Facial expressions are combinations of basic components called Action Units (AU). Recognizing AUs is key for developing general facial expression analysis. In recent years, most efforts in automatic AU recognition have been dedicated to learning combinations of local features and to exploiting correlations between Action Units. In this paper, we propose a deep neural architecture that tackles both…

    Submitted 23 March, 2018; v1 submitted 15 March, 2018; originally announced March 2018.

  49. arXiv:1802.00745  [pdf, other]

    cs.CV

    Explaining First Impressions: Modeling, Recognizing, and Explaining Apparent Personality from Videos

    Authors: Hugo Jair Escalante, Heysem Kaya, Albert Ali Salah, Sergio Escalera, Yagmur Gucluturk, Umut Guclu, Xavier Baro, Isabelle Guyon, Julio Jacques Junior, Meysam Madadi, Stephane Ayache, Evelyne Viegas, Furkan Gurpinar, Achmadnoer Sukma Wicaksana, Cynthia C. S. Liem, Marcel A. J. van Gerven, Rob van Lier

    Abstract: Explainability and interpretability are two critical aspects of decision support systems. Within computer vision, they are critical in certain tasks related to human behavior analysis such as in health care applications. Despite their importance, it is only recently that researchers are starting to explore these aspects. This paper provides an introduction to explainability and interpretability in…

    Submitted 28 September, 2019; v1 submitted 2 February, 2018; originally announced February 2018.

    Comments: Preprint submitted to TAC

  50. arXiv:1801.07481  [pdf, other]

    cs.CV

    Survey on Emotional Body Gesture Recognition

    Authors: Fatemeh Noroozi, Ciprian Adrian Corneanu, Dorota Kamińska, Tomasz Sapiński, Sergio Escalera, Gholamreza Anbarjafari

    Abstract: Automatic emotion recognition has become a trending research topic in the past decade. While works based on facial expressions or speech abound, recognizing affect from body gestures remains a less explored topic. We present a new comprehensive survey hoping to boost research in the field. We first introduce emotional body gestures as a component of what is commonly known as "body language" and co…

    Submitted 23 January, 2018; originally announced January 2018.