Reducing the Barriers to Entry for Foundation Model Training
Authors:
Paolo Faraboschi,
Ellis Giles,
Justin Hotard,
Konstanty Owczarek,
Andrew Wheeler
Abstract:
The world has recently witnessed an unprecedented acceleration in demands for Machine Learning and Artificial Intelligence applications. This spike in demand has imposed tremendous strain on the underlying technology stack in supply chain, GPU-accelerated hardware, software, datacenter power density, and energy consumption. If left on the current technological trajectory, future demands show insur…
▽ More
The world has recently witnessed an unprecedented acceleration in demands for Machine Learning and Artificial Intelligence applications. This spike in demand has imposed tremendous strain on the underlying technology stack in supply chain, GPU-accelerated hardware, software, datacenter power density, and energy consumption. If left on the current technological trajectory, future demands show insurmountable spending trends, further limiting market players, stifling innovation, and widening the technology gap. To address these challenges, we propose a fundamental change in the AI training infrastructure throughout the technology ecosystem. The changes require advancements in supercomputing and novel AI training approaches, from high-end software to low-level hardware, microprocessor, and chip design, while advancing the energy efficiency required by a sustainable infrastructure. This paper presents the analytical framework that quantitatively highlights the challenges and points to the opportunities to reduce the barriers to entry for training large language models.
△ Less
Submitted 12 April, 2024;
originally announced April 2024.
Hardware Transactional Persistent Memory
Authors:
Ellis Giles,
Kshitij Doshi,
Peter Varman
Abstract:
Emerging Persistent Memory technologies (also PM, Non-Volatile DIMMs, Storage Class Memory or SCM) hold tremendous promise for accelerating popular data-management applications like in-memory databases. However, programmers now need to deal with ensuring the atomicity of transactions on Persistent Memory resident data and maintaining consistency between the order in which processors perform stores…
▽ More
Emerging Persistent Memory technologies (also PM, Non-Volatile DIMMs, Storage Class Memory or SCM) hold tremendous promise for accelerating popular data-management applications like in-memory databases. However, programmers now need to deal with ensuring the atomicity of transactions on Persistent Memory resident data and maintaining consistency between the order in which processors perform stores and that in which the updated values become durable.
The problem is specially challenging when high-performance isolation mechanisms like Hardware Transactional Memory (HTM) are used for concurrency control. This work shows how HTM transactions can be ordered correctly and atomically into PM by the use of a novel software protocol combined with a Persistent Memory Controller, without requiring changes to processor cache hardware or HTM protocols. In contrast, previous approaches require significant changes to existing processor microarchitectures. Our approach, evaluated using both micro-benchmarks and the STAMP suite compares well with standard (volatile) HTM transactions. It also yields significant gains in throughput and latency in comparison with persistent transactional locking.
△ Less
Submitted 22 May, 2018;
originally announced June 2018.