Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation

Wang, Zhiwei; Wang, Yunji; Zhang, Zhongwang; Zhou, Zhangchen; **, Hui; Hu, Tianyang; Sun, Jiacheng; Li, Zhenguo; Zhang, Yaoyu; Xu, Zhi-Qin John

arXiv:2405.15302 (cs)

[Submitted on 24 May 2024]

Abstract: Large language models have consistently struggled with complex reasoning tasks, such as mathematical problem-solving. Investigating the internal reasoning mechanisms of these models can help us design better model architectures and training strategies, ultimately enhancing their reasoning capabilities. In this study, we examine the matching mechanism employed by Transformers for multi-step reasoning on a constructed dataset. We investigate factors that influence the model's matching mechanism and find that small initialization and post-LayerNorm facilitate the formation of the matching mechanism, thereby enhancing the model's reasoning ability. Moreover, we propose a method to improve the model's reasoning capability by adding orthogonal noise. Finally, we investigate the parallel reasoning mechanism of Transformers and, based on this phenomenon, propose a conjecture on the upper bound of the model's reasoning ability. These insights contribute to a deeper understanding of the reasoning processes in large language models and can guide the design of more effective reasoning architectures and training strategies.
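For readers unfamiliar with the term, the difference between post-LayerNorm and pre-LayerNorm residual blocks can be sketched as follows. This is a minimal illustration of the standard definitions, not the authors' code: `sublayer` is a hypothetical stand-in for an attention or feed-forward layer, and the normalization omits the learned scale and shift.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (near) unit variance; no learned parameters."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer(x):
    """Hypothetical stand-in for attention or feed-forward: a fixed elementwise map."""
    return [2.0 * v + 1.0 for v in x]

def post_ln_block(x):
    # Post-LayerNorm: add the residual first, then normalize the sum.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def pre_ln_block(x):
    # Pre-LayerNorm: normalize only the sublayer input; the residual stream stays unnormalized.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

x = [1.0, 2.0, 4.0]
post = post_ln_block(x)  # output is re-centered and re-scaled at every block
pre = pre_ln_block(x)    # output keeps the scale and offset of the residual stream
```

In the post-LayerNorm variant the block output is normalized at every layer, whereas in the pre-LayerNorm variant the residual stream can grow unchecked across layers.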

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2405.15302 [cs.AI]
	(or arXiv:2405.15302v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2405.15302

Submission history

From: Zhiwei Wang
[v1] Fri, 24 May 2024 07:41:26 UTC (38,944 KB)
