Skeleton Aware Multi-modal Sign Language Recognition

Jiang, Songyao; Sun, Bin; Wang, Lichen; Bai, Yue; Li, Kunpeng; Fu, Yun


arXiv:2103.08833v3 (cs)

[Submitted on 16 Mar 2021 (v1), revised 24 Mar 2021 (this version, v3), latest version 2 May 2021 (v5)]


Abstract: Sign language is used by deaf or speech-impaired people to communicate and requires great effort to master. Sign Language Recognition (SLR) aims to bridge the gap between sign language users and others by recognizing words from given videos. It is an important yet challenging task, since sign language is performed with fast and complex movements of hand gestures, body posture, and even facial expressions. Recently, skeleton-based action recognition has attracted increasing attention due to its independence from subject and background variation. Furthermore, it can be a strong complement to RGB/D modalities to boost the overall recognition rate. However, skeleton-based SLR is still underexplored due to the lack of annotations on hand keypoints. Some efforts have been made to use hand detectors with pose estimators to extract hand keypoints and learn to recognize sign language via a Recurrent Neural Network, but none of them outperforms RGB-based methods. To this end, we propose a novel Skeleton Aware Multi-modal SLR framework (SAM-SLR) to further improve the recognition rate. Specifically, we propose a Sign Language Graph Convolution Network (SL-GCN) to model the embedded dynamics and a novel Separable Spatial-Temporal Convolution Network (SSTCN) to exploit skeleton features. Our skeleton-based method achieves a higher recognition rate than all other single modalities. Moreover, our proposed SAM-SLR framework can further enhance the performance by assembling our skeleton-based method with other RGB and depth modalities. As a result, SAM-SLR achieves the highest performance in both the RGB (98.42%) and RGB-D (98.53%) tracks of the 2021 Looking at People Large Scale Signer Independent Isolated SLR Challenge. Our code is available at this https URL
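
The abstract says the framework combines a skeleton-based method with RGB and depth modalities, but it does not state how the combination is done. One common way to combine modalities is weighted late fusion of per-class scores, and the sketch below illustrates that idea only. It is not the authors' implementation: the function names, modality names, scores, and weights are all hypothetical.

```python
from typing import Dict, List


def fuse_scores(scores: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted sum of per-class scores across modalities (late fusion)."""
    num_classes = len(next(iter(scores.values())))
    fused = [0.0] * num_classes
    for modality, class_scores in scores.items():
        if len(class_scores) != num_classes:
            raise ValueError(f"{modality} has a different number of classes")
        weight = weights[modality]
        for i, score in enumerate(class_scores):
            fused[i] += weight * score
    return fused


def predict(scores: Dict[str, List[float]],
            weights: Dict[str, float]) -> int:
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical per-modality scores for one video over three sign classes.
example_scores = {
    "skeleton": [0.1, 0.7, 0.2],
    "rgb": [0.3, 0.4, 0.3],
    "depth": [0.5, 0.2, 0.3],
}
# Hypothetical fusion weights; in practice these would be tuned on validation data.
example_weights = {"skeleton": 1.0, "rgb": 0.5, "depth": 0.5}

predicted_class = predict(example_scores, example_weights)
```

In this toy example the skeleton stream carries the largest weight, so its preferred class dominates the fused prediction even though the depth stream favors a different class.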

Comments:	This submission is a preprint version of our work at the CVPR 2021 Challenge on Looking at People Large Scale Signer Independent Isolated SLR. Our workshop version will be updated here soon.
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2103.08833 [cs.CV]
	(or arXiv:2103.08833v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2103.08833

Submission history

From: Songyao Jiang
[v1] Tue, 16 Mar 2021 03:38:17 UTC (901 KB)
[v2] Mon, 22 Mar 2021 04:26:11 UTC (1,085 KB)
[v3] Wed, 24 Mar 2021 15:43:22 UTC (1,071 KB)
[v4] Fri, 26 Mar 2021 19:37:31 UTC (973 KB)
[v5] Sun, 2 May 2021 20:49:40 UTC (780 KB)
