HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis

Authors: Andy He, Darren Key, Mason Bulling, Andrew Chang, Skyler Shapiro, Everett Lee

Abstract: Graphics Processing Units (GPUs) have become the leading hardware accelerator for deep learning applications and are used widely in training and inference of transformers; transformers have achieved state-of-the-art performance in many areas of machine learning and are especially used in most modern Large Language Models (LLMs). However, GPUs require large amounts of energy, which poses environmental concerns, demands high operational costs, and causes GPUs to be unsuitable for edge computing. We develop an accelerator for transformers, namely, Llama 2, an open-source state-of-the-art LLM, using high level synthesis (HLS) on Field Programmable Gate Arrays (FPGAs). HLS allows us to rapidly prototype FPGA designs without writing code at the register-transfer level (RTL). We name our method HLSTransform, and the FPGA designs we synthesize with HLS achieve up to a 12.75x reduction and 8.25x reduction in energy used per token on the Xilinx Virtex UltraScale+ VU9P FPGA compared to an Intel Xeon Broadwell E5-2686 v4 CPU and NVIDIA RTX 3090 GPU respectively, while increasing inference speeds by up to 2.46x compared to CPU and maintaining 0.53x the speed of an RTX 3090 GPU despite the GPU's 4 times higher base clock rate. With the lack of existing open-source FPGA accelerators for transformers, we open-source our code and document our steps for synthesis. We hope this work will serve as a step in democratizing the use of FPGAs in transformer inference and inspire research into energy-efficient inference methods as a whole.
The code can be found on https://github.com/HLSTransform/submission.

Submitted 29 April, 2024; originally announced May 2024.

Comments: 7 pages, 2 figures

arXiv:2402.12275 [pdf, other]

WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment

Authors: Hao Tang, Darren Key, Kevin Ellis

Abstract: We give a model-based agent that builds a Python program representing its knowledge of the world based on its interactions with the environment. The world model tries to explain its interactions, while also being optimistic about what reward it can achieve. We define this optimism as a logical constraint between a program and a planner. We study our agent on gridworlds, and on task planning, finding our approach is more sample-efficient compared to deep RL, more compute-efficient compared to ReAct-style agents, and that it can transfer its knowledge across environments by editing its code.

Submitted 26 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2303.15646 [pdf]

7.86 kV GaN-on-GaN PN Power Diode with BaTiO3 for Electrical Field Management

Authors: Yibo Xu, Vijay Gopal Thirupakuzi Vangipuram, Vishank Telasara, Junao Cheng, Yuxuan Zhang, Tadao Hashimoto, Edward Letts, Daryl Key, Hong** Zhao, Wu Lu

Abstract: Devices based on GaN have great potential for high-power switching applications due to the material's high breakdown field and high electron mobility. In this work, we present the device design of a vertical GaN-on-GaN PN power diode using high dielectric constant (high-k) dielectrics for electrical field management and high breakdown voltages, together with guard rings and a field plate. The fabricated diodes with a 57 µm thick drift layer demonstrated a breakdown voltage of 7.86 kV on a bulk GaN substrate. The device has an on-resistance of 2.8 mΩ·cm² and a Baliga figure of merit of 22 GW/cm².

Submitted 27 March, 2023; originally announced March 2023.

Comments: 4 pages, 6 figures

arXiv:2210.00848 [pdf, other]

Toward Trustworthy Neural Program Synthesis

Authors: Darren Key, Wen-Ding Li, Kevin Ellis

Abstract: We develop an approach to estimate the probability that a program sampled from a large language model is correct. Given a natural language description of a programming problem, our method samples both candidate programs as well as candidate predicates specifying how the program should behave. This allows learning a model that forms a well-calibrated probabilistic prediction of program correctness. Our system also infers which predicates are useful to explain the behavior of the generated code, and humans preferred these in a human study over raw language model outputs. Our method is simple, easy to implement, and maintains state of the art generation accuracy results.

Submitted 9 October, 2023; v1 submitted 29 September, 2022; originally announced October 2022.

Comments: 9 pages, 8 figures

Showing 1–4 of 4 results for author: Key, D