Search | arXiv e-print repository

Multi-Perspective LSTM for Joint Visual Representation Learning

Authors: Alireza Sepas-Moghaddam, Fernando Pereira, Paulo Lobato Correia, Ali Etemad

Abstract: We present a novel LSTM cell architecture capable of learning both intra- and inter-perspective relationships available in visual sequences captured from multiple perspectives. Our architecture adopts a novel recurrent joint learning strategy that uses additional gates and memories at the cell level. We demonstrate that by using the proposed cell to create a network, more effective and richer visual representations are learned for recognition tasks. We validate the performance of our proposed architecture in the context of two multi-perspective visual recognition tasks namely lip reading and face recognition. Three relevant datasets are considered and the results are compared against fusion strategies, other existing multi-input LSTM architectures, and alternative recognition solutions. The experiments show the superior performance of our solution over the considered benchmarks, both in terms of recognition accuracy and complexity. We make our code publicly available at https://github.com/arsm/MPLSTM.

Submitted 6 May, 2021; originally announced May 2021.

Comments: Accepted to CVPR2021. Project link: https://github.com/arsm/MPLSTM

arXiv:2105.01634 [pdf, other]

doi 10.3390/diagnostics11101824

Remote Pathological Gait Classification System

Authors: Pedro Albuquerque, Joao Machado, Tanmay Tulsidas Verlekar, Luis Ducla Soares, Paulo Lobato Correia

Abstract: Several pathologies can alter the way people walk, i.e. their gait. Gait analysis can therefore be used to detect impairments and help diagnose illnesses and assess patient recovery. Using vision-based systems, diagnoses could be done at home or in a clinic, with the needed computation being done remotely. State-of-the-art vision-based gait analysis systems use deep learning, requiring large datasets for training. However, to our best knowledge, the biggest publicly available pathological gait dataset contains only 10 subjects, simulating 4 gait pathologies. This paper presents a new dataset called GAIT-IT, captured from 21 subjects simulating 4 gait pathologies, with 2 severity levels, besides normal gait, being considerably larger than publicly available gait pathology datasets, allowing to train a deep learning model for gait pathology classification. Moreover, it was recorded in a professional studio, making it possible to obtain nearly perfect silhouettes, free of segmentation errors. Recognizing the importance of remote healthcare, this paper proposes a prototype of a web application allowing to upload a walking person's video, possibly acquired using a smartphone camera, and execute a web service that classifies the person's gait as normal or across different pathologies. The web application has a user friendly interface and could be used by healthcare professionals or other end users. An automatic gait analysis system is also developed and integrated with the web application for pathology classification. Compared to state-of-the-art solutions, it achieves a drastic reduction in the number of model parameters, which means significantly lower memory requirements, as well as lower training and execution times. Classification accuracy is on par with the state-of-the-art.

Submitted 4 May, 2021; originally announced May 2021.

Journal ref: https://www.mdpi.com/2075-4418/11/10/1824

arXiv:2101.03503 [pdf, other]

doi 10.1109/TIP.2021.3054476

CapsField: Light Field-based Face and Expression Recognition in the Wild using Capsule Routing

Authors: Alireza Sepas-Moghaddam, Ali Etemad, Fernando Pereira, Paulo Lobato Correia

Abstract: Light field (LF) cameras provide rich spatio-angular visual representations by sensing the visual scene from multiple perspectives and have recently emerged as a promising technology to boost the performance of human-machine systems such as biometrics and affective computing. Despite the significant success of LF representation for constrained facial image analysis, this technology has never been used for face and expression recognition in the wild. In this context, this paper proposes a new deep face and expression recognition solution, called CapsField, based on a convolutional neural network and an additional capsule network that utilizes dynamic routing to learn hierarchical relations between capsules. CapsField extracts the spatial features from facial images and learns the angular part-whole relations for a selected set of 2D sub-aperture images rendered from each LF image. To analyze the performance of the proposed solution in the wild, the first in the wild LF face dataset, along with a new complementary constrained face dataset captured from the same subjects recorded earlier have been captured and are made available. A subset of the in the wild dataset contains facial images with different expressions, annotated for usage in the context of face expression recognition tests. An extensive performance assessment study using the new datasets has been conducted for the proposed and relevant prior solutions, showing that the CapsField proposed solution achieves superior performance for both face and expression recognition tasks when compared to the state-of-the-art.

Submitted 10 January, 2021; originally announced January 2021.

Comments: Accepted in IEEE Transactions on Image Processing (IEEE T-IP)

arXiv:1905.04421 [pdf, other]

Long Short-Term Memory with Gate and State Level Fusion for Light Field-Based Face Recognition

Authors: Alireza Sepas-Moghaddam, Ali Etemad, Fernando Pereira, Paulo Lobato Correia

Abstract: Long Short-Term Memory (LSTM) is a prominent recurrent neural network for extracting dependencies from sequential data such as time-series and multi-view data, having achieved impressive results for different visual recognition tasks. A conventional LSTM network can learn a model to posteriorly extract information from one input sequence. However, if two or more dependent sequences of data are simultaneously acquired, the conventional LSTM networks may only process those sequences consecutively, not taking benefit of the information carried out by their mutual dependencies. In this context, this paper proposes two novel LSTM cell architectures that are able to jointly learn from multiple sequences simultaneously acquired, targeting to create richer and more effective models for recognition tasks. The efficacy of the novel LSTM cell architectures is assessed by integrating them into deep learning-based methods for face recognition with multi-view, light field images. The new cell architectures jointly learn the scene horizontal and vertical parallaxes available in a light field image, to capture richer spatio-angular information from both directions. A comprehensive evaluation, with the IST-EURECOM LFFD dataset using three challenging evaluation protocols, shows the advantage of using the novel LSTM cell architectures for face recognition over the state-of-the-art light field-based methods. These results highlight the added value of the novel cell architectures when learning from correlated input sequences.

Submitted 1 June, 2020; v1 submitted 10 May, 2019; originally announced May 2019.

Comments: Submitted to IEEE TIFS

arXiv:1901.00713 [pdf]

Face Recognition: A Novel Multi-Level Taxonomy based Survey

Authors: Alireza Sepas-Moghaddam, Fernando Pereira, Paulo Lobato Correia

Abstract: In a world where security issues have been gaining growing importance, face recognition systems have attracted increasing attention in multiple application areas, ranging from forensics and surveillance to commerce and entertainment. To help understanding the landscape and abstraction levels relevant for face recognition systems, face recognition taxonomies allow a deeper dissection and comparison of the existing solutions. This paper proposes a new, more encompassing and richer multi-level face recognition taxonomy, facilitating the organization and categorization of available and emerging face recognition solutions; this taxonomy may also guide researchers in the development of more efficient face recognition solutions. The proposed multi-level taxonomy considers levels related to the face structure, feature support and feature extraction approach. Following the proposed taxonomy, a comprehensive survey of representative face recognition solutions is presented. The paper concludes with a discussion on current algorithmic and application related challenges which may define future research directions for face recognition.

Submitted 3 January, 2019; originally announced January 2019.

Comments: This paper is a preprint of a paper submitted to IET Biometrics. If accepted, the copy of record will be available at the IET Digital Library

arXiv:1805.10078 [pdf]

A Double-Deep Spatio-Angular Learning Framework for Light Field based Face Recognition

Authors: Alireza Sepas-Moghaddam, Mohammad A. Haque, Paulo Lobato Correia, Kamal Nasrollahi, Thomas B. Moeslund, Fernando Pereira

Abstract: Face recognition has attracted increasing attention due to its wide range of applications, but it is still challenging when facing large variations in the biometric data characteristics. Lenslet light field cameras have recently come into prominence to capture rich spatio-angular information, thus offering new possibilities for advanced biometric recognition systems. This paper proposes a double-deep spatio-angular learning framework for light field based face recognition, which is able to learn both texture and angular dynamics in sequence using convolutional representations; this is a novel recognition framework that has never been proposed before for either face recognition or any other visual recognition task. The proposed double-deep learning framework includes a long short-term memory (LSTM) recurrent network whose inputs are VGG-Face descriptions that are computed using a VGG-Very-Deep-16 convolutional neural network (CNN). The VGG-16 network uses different face viewpoints rendered from a full light field image, which are organised as a pseudo-video sequence. A comprehensive set of experiments has been conducted with the IST-EURECOM light field face database, for varied and challenging recognition tasks. Results show that the proposed framework achieves superior face recognition performance when compared to the state-of-the-art.

Submitted 24 April, 2019; v1 submitted 25 May, 2018; originally announced May 2018.

Comments: Submitted to IEEE Transactions on Circuits and Systems for Video Technology

Showing 1–6 of 6 results for author: Correia, P L