Mnemosyne: Learning to Train Transformers with Transformers

Jain, Deepali; Choromanski, Krzysztof Marcin; Dubey, Avinava; Singh, Sumeet; Sindhwani, Vikas; Zhang, Tingnan; Tan, Jie


arXiv:2302.01128v2 (cs)

[Submitted on 2 Feb 2023 (v1), revised 15 Jun 2023 (this version, v2), latest version 16 Jun 2023 (v3)]



Abstract: Training complex machine learning (ML) architectures requires a compute- and time-consuming process of selecting the right optimizer and tuning its hyper-parameters. A new paradigm of learning optimizers from data has emerged as a better alternative to hand-designed ML optimizers. We propose the Mnemosyne optimizer, which uses Performers: implicit low-rank attention Transformers. It can learn to train entire neural network architectures, including other Transformers, without any task-specific optimizer tuning. We show that Mnemosyne: (a) generalizes better than the popular LSTM optimizer, (b) in particular can successfully train Vision Transformers (ViTs) while meta-trained on standard MLPs, and (c) can initialize optimizers for faster convergence in Robotics applications. We believe that these results open the possibility of using Transformers to build foundational optimization models that can address the challenges of regular Transformer training. We complement our results with an extensive theoretical analysis of the compact associative memory used by Mnemosyne.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2302.01128 [cs.LG]
	(or arXiv:2302.01128v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2302.01128

Submission history

From: Deepali Jain
[v1] Thu, 2 Feb 2023 14:40:28 UTC (13,321 KB)
[v2] Thu, 15 Jun 2023 14:20:59 UTC (31,497 KB)
[v3] Fri, 16 Jun 2023 20:15:43 UTC (31,497 KB)
