-
ChipNeMo: Domain-Adapted LLMs for Chip Design
Authors:
Mingjie Liu,
Teodor-Dumitru Ene,
Robert Kirby,
Chris Cheng,
Nathaniel Pinckney,
Rongjian Liang,
Jonah Alben,
Himyanshu Anand,
Sanmitra Banerjee,
Ismet Bayraktaroglu,
Bonita Bhaskaran,
Bryan Catanzaro,
Arjun Chaudhuri,
Sharon Clay,
Bill Dally,
Laura Dang,
Parikshit Deshpande,
Siddhanth Dhodhi,
Sameer Halepete,
Eric Hill,
Jiashang Hu,
Sumit Jain,
Ankit Jindal,
Brucek Khailany,
George Kokai, et al. (17 additional authors not shown)
Abstract:
ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Rather than directly deploying off-the-shelf commercial or open-source LLMs, we adopt the following domain adaptation techniques: domain-adaptive tokenization, domain-adaptive continued pretraining, model alignment with domain-specific instructions, and domain-adapted retrieval models. We evaluate these methods on three selected LLM applications for chip design: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis. Our evaluations demonstrate that domain-adaptive pretraining of language models can lead to superior performance on domain-related downstream tasks compared to their base LLaMA2 counterparts, without degradation in generic capabilities. In particular, our largest model, ChipNeMo-70B, outperforms the highly capable GPT-4 on two of our use cases, namely the engineering assistant chatbot and EDA script generation, while exhibiting competitive performance on bug summarization and analysis. These results underscore the potential of domain-specific customization for enhancing the effectiveness of large language models in specialized applications.
Submitted 4 April, 2024; v1 submitted 31 October, 2023;
originally announced November 2023.
-
FP8 Formats for Deep Learning
Authors:
Paulius Micikevicius,
Dusan Stosic,
Neil Burgess,
Marius Cornea,
Pradeep Dubey,
Richard Grisenthwaite,
Sangwon Ha,
Alexander Heinecke,
Patrick Judd,
John Kamalu,
Naveen Mellempudi,
Stuart Oberman,
Mohammad Shoeybi,
Michael Siu,
Hao Wu
Abstract:
FP8 is a natural progression for accelerating deep learning training and inference beyond the 16-bit formats common in modern processors. In this paper we propose an 8-bit floating point (FP8) binary interchange format consisting of two encodings: E4M3 (4-bit exponent and 3-bit mantissa) and E5M2 (5-bit exponent and 2-bit mantissa). While E5M2 follows IEEE 754 conventions for representation of special values, E4M3's dynamic range is extended by not representing infinities and having only one mantissa bit-pattern for NaNs. We demonstrate the efficacy of the FP8 format on a variety of image and language tasks, effectively matching the result quality achieved by 16-bit training sessions. Our study covers the main modern neural network architectures (CNNs, RNNs, and Transformer-based models), leaving all the hyperparameters unchanged from the 16-bit baseline training sessions. Our training experiments include large language models of up to 175B parameters. We also examine FP8 post-training quantization of language models trained using 16-bit formats that resisted fixed-point int8 quantization.
Submitted 29 September, 2022; v1 submitted 12 September, 2022;
originally announced September 2022.
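As an illustration of the two encodings named in the FP8 abstract above, the following is a minimal decoding sketch, not the authors' implementation. The function name `decode_fp8` is hypothetical, and the exponent biases (7 for E4M3, 15 for E5M2) and the sign-exponent-mantissa bit layout are not stated in the abstract and are assumed here. The special-value handling follows the abstract's description: E5M2 treats the all-ones exponent as in IEEE 754, while E4M3 has no infinities and reserves a single mantissa pattern for NaN.

```python
import math


def decode_fp8(byte: int, fmt: str = "E4M3") -> float:
    """Decode one FP8 bit pattern (0..255) into a Python float.

    Assumed layout: 1 sign bit, then exponent bits, then mantissa bits.
    Assumed biases: 7 for E4M3 and 15 for E5M2.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    if fmt == "E4M3":
        exp_bits, man_bits, bias = 4, 3, 7
    elif fmt == "E5M2":
        exp_bits, man_bits, bias = 5, 2, 15
    else:
        raise ValueError("fmt must be 'E4M3' or 'E5M2'")

    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_all_ones = (1 << exp_bits) - 1
    man_all_ones = (1 << man_bits) - 1

    # E5M2: IEEE 754 style special values (infinities and NaNs).
    if fmt == "E5M2" and exp == exp_all_ones:
        return sign * math.inf if man == 0 else math.nan

    # E4M3: no infinities; only the all-ones mantissa pattern is NaN,
    # so the remaining all-ones-exponent patterns are ordinary numbers.
    if fmt == "E4M3" and exp == exp_all_ones and man == man_all_ones:
        return math.nan

    fraction = man / (1 << man_bits)
    if exp == 0:
        # Subnormal: no implicit leading one, fixed minimum exponent.
        return sign * fraction * 2.0 ** (1 - bias)
    return sign * (1.0 + fraction) * 2.0 ** (exp - bias)
```

Under these assumptions, the largest finite E4M3 value is `decode_fp8(0x7E, "E4M3")`, which uses an exponent pattern that an IEEE-style format would spend on infinities and NaNs; this is the range extension the abstract refers to.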
-
PrefixRL: Optimization of Parallel Prefix Circuits using Deep Reinforcement Learning
Authors:
Rajarshi Roy,
Jonathan Raiman,
Neel Kant,
Ilyas Elkin,
Robert Kirby,
Michael Siu,
Stuart Oberman,
Saad Godil,
Bryan Catanzaro
Abstract:
In this work, we present a reinforcement learning (RL) based approach to designing parallel prefix circuits, such as adders or priority encoders, that are fundamental to high-performance digital design. Unlike prior methods, our approach designs solutions tabula rasa purely through learning with synthesis in the loop. We design a grid-based state-action representation and an RL environment for constructing legal prefix circuits. Deep convolutional RL agents trained on this environment produce prefix adder circuits that Pareto-dominate existing baselines, with up to 16.0% and 30.2% lower area for the same delay in the 32b and 64b settings, respectively. We observe that agents trained with open-source synthesis tools and an open-source cell library can design adder circuits that achieve lower area and delay than commercial tool adders in an industrial cell library.
Submitted 14 May, 2022;
originally announced May 2022.