AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism

Zhong, Chongyang; Hu, Lei; Zhang, Zihao; Xia, Shihong

arXiv:2309.00796 (cs)

[Submitted on 2 Sep 2023]

Abstract: Generating 3D human motion from textual descriptions has been a research focus in recent years. The generated motion must be diverse, natural, and consistent with the textual description. Because human motion has a complex spatio-temporal nature and the cross-modal relationship between text and motion is difficult to learn, text-driven motion generation remains a challenging problem. To address these issues, we propose AttT2M, a two-stage method with a multi-perspective attention mechanism: body-part attention and global-local motion-text attention. The former works from the motion embedding perspective: it introduces a body-part spatio-temporal encoder into a VQ-VAE to learn a more expressive discrete latent space. The latter works from the cross-modal perspective: it learns the sentence-level and word-level cross-modal relationship between motion and text. The text-driven motion is finally generated with a generative transformer. Extensive experiments on HumanML3D and KIT-ML demonstrate that our method outperforms current state-of-the-art works in both qualitative and quantitative evaluation, and achieves fine-grained synthesis and action2motion. Our code is in this https URL
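
The two building blocks the abstract names, a discrete latent space (VQ-VAE) and word-level motion-text attention, can be illustrated with a toy sketch. This is not the authors' implementation: the function names `quantize` and `cross_attention`, the 2-dimensional features, and the three-entry codebook are all hypothetical, and the sketch assumes plain nearest-neighbour codebook lookup and single-head scaled dot-product attention without learned projections.

```python
import math

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry
    (squared L2 distance), as in the lookup step of a generic VQ-VAE."""
    indices = []
    for z in latents:
        dists = [sum((a - b) ** 2 for a, b in zip(z, code)) for code in codebook]
        indices.append(dists.index(min(dists)))
    return indices

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention: every query (a motion
    token feature) attends over all keys (word features)."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs

# Toy data (assumed for illustration only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]

# Stage 1 idea: continuous motion latents become discrete token indices.
tokens = quantize(latents, codebook)

# Stage 2 idea: motion token features attend over word-level text features.
word_feats = [[1.0, 0.0], [0.0, 1.0]]
motion_queries = [codebook[i] for i in tokens]
attended = cross_attention(motion_queries, word_feats, word_feats)
```

In a real system the encoder, codebook, and attention projections would be learned, and the body-part encoder described in the abstract would replace the hand-written latents used here.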

Comments:	IEEE International Conference on Computer Vision 2023, 9 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2309.00796 [cs.CV]
	(or arXiv:2309.00796v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.00796

Submission history

From: Chongyang Zhong
[v1] Sat, 2 Sep 2023 02:18:17 UTC (2,310 KB)
