-
FP8 versus INT8 for efficient deep learning inference
Authors:
Mart van Baalen,
Andrey Kuzmin,
Suparna S Nair,
Yuwei Ren,
Eric Mahurin,
Chirag Patel,
Sundar Subramanian,
Sanghyuk Lee,
Markus Nagel,
Joseph Soriaga,
Tijmen Blankevoort
Abstract:
Recently, the idea of using FP8 as a number format for neural network training has been floating around the deep learning world. Given that most training is currently conducted with entire networks in FP32, or sometimes FP16 with mixed-precision, the step to having some parts of a network run in FP8 with 8-bit weights is an appealing potential speed-up for the generally costly and time-intensive t…
▽ More
Recently, the idea of using FP8 as a number format for neural network training has been floating around the deep learning world. Given that most training is currently conducted with entire networks in FP32, or sometimes FP16 with mixed-precision, the step to having some parts of a network run in FP8 with 8-bit weights is an appealing potential speed-up for the generally costly and time-intensive training procedures in deep learning. A natural question arises regarding what this development means for efficient inference on edge devices. In the efficient inference device world, workloads are frequently executed in INT8. Sometimes going even as low as INT4 when efficiency calls for it. In this whitepaper, we compare the performance for both the FP8 and INT formats for efficient on-device inference. We theoretically show the difference between the INT and FP formats for neural networks and present a plethora of post-training quantization and quantization-aware-training results to show how this theory translates to practice. We also provide a hardware analysis showing that the FP formats are somewhere between 50-180% less efficient in terms of compute in dedicated hardware than the INT format. Based on our research and a read of the research field, we conclude that although the proposed FP8 format could be good for training, the results for inference do not warrant a dedicated implementation of FP8 in favor of INT8 for efficient inference. We show that our results are mostly consistent with previous findings but that important comparisons between the formats have thus far been lacking. Finally, we discuss what happens when FP8-trained networks are converted to INT8 and conclude with a brief discussion on the most efficient way for on-device deployment and an extensive suite of INT8 results for many models.
△ Less
Submitted 15 June, 2023; v1 submitted 31 March, 2023;
originally announced March 2023.
-
Learning to Precode for Integrated Sensing and Communications Systems
Authors:
R. S. Prasobh Sankar,
Sidharth S. Nair,
Siddhant Doshi,
Sundeep Prabhakar Chepuri
Abstract:
In this paper, we present an unsupervised learning neural model to design transmit precoders for integrated sensing and communication (ISAC) systems to maximize the worst-case target illumination power while ensuring a minimum signal-to-interference-plus-noise ratio (SINR) for all the users. The problem of learning transmit precoders from uplink pilots and echoes can be viewed as a parameterized f…
▽ More
In this paper, we present an unsupervised learning neural model to design transmit precoders for integrated sensing and communication (ISAC) systems to maximize the worst-case target illumination power while ensuring a minimum signal-to-interference-plus-noise ratio (SINR) for all the users. The problem of learning transmit precoders from uplink pilots and echoes can be viewed as a parameterized function estimation problem and we propose to learn this function using a neural network model. To learn the neural network parameters, we develop a novel loss function based on the first-order optimality conditions to incorporate the SINR and power constraints. Through numerical simulations, we demonstrate that the proposed method outperforms traditional optimization-based methods in presence of channel estimation errors while incurring lesser computational complexity and generalizing well across different channel conditions that were not shown during training.
△ Less
Submitted 11 March, 2023;
originally announced March 2023.
-
On query complexity measures and their relations for symmetric functions
Authors:
Rajat Mittal,
Sanjay S Nair,
Sunayana Patro
Abstract:
The main reason for query model's prominence in complexity theory and quantum computing is the presence of concrete lower bounding techniques: polynomial and adversary method. There have been considerable efforts to give lower bounds using these methods, and to compare/relate them with other measures based on the decision tree.
We explore the value of these lower bounds on quantum query complexi…
▽ More
The main reason for query model's prominence in complexity theory and quantum computing is the presence of concrete lower bounding techniques: polynomial and adversary method. There have been considerable efforts to give lower bounds using these methods, and to compare/relate them with other measures based on the decision tree.
We explore the value of these lower bounds on quantum query complexity and their relation with other decision tree based complexity measures for the class of symmetric functions, arguably one of the most natural and basic sets of Boolean functions. We show an explicit construction for the dual of the positive adversary method and also of the square root of private coin certificate game complexity for any total symmetric function. This shows that the two values can't be distinguished for any symmetric function. Additionally, we show that the recently introduced measure of spectral sensitivity gives the same value as both positive adversary and approximate degree for every total symmetric Boolean function.
Further, we look at the quantum query complexity of Gap Majority, a partial symmetric function. It has gained importance recently in regard to understanding the composition of randomized query complexity. We characterize the quantum query complexity of Gap Majority and show a lower bound on noisy randomized query complexity (Ben-David and Blais, FOCS 2020) in terms of quantum query complexity.
Finally, we study how large certificate complexity and block sensitivity can be as compared to sensitivity for symmetric functions (even up to constant factors). We show tight separations, i.e., give upper bounds on possible separations and construct functions achieving the same.
△ Less
Submitted 19 February, 2024; v1 submitted 24 October, 2021;
originally announced October 2021.