Search | arXiv e-print repository

TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings

Authors: Norman P. Jouppi, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, Suvinay Subramanian, Andy Swing, Brian Towles, Cliff Young, Xiang Zhou, Zongwei Zhou, David Patterson

Abstract: In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain specific architecture (DSA) and its third supercomputer for such ML models. Optical circuit switches (OCSes) dynamically reconfigure its interconnect topology to improve scale, availability, utilization, modularity, deployment, security, power, and perfo… ▽ More In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain specific architecture (DSA) and its third supercomputer for such ML models. Optical circuit switches (OCSes) dynamically reconfigure its interconnect topology to improve scale, availability, utilization, modularity, deployment, security, power, and performance; users can pick a twisted 3D torus topology if desired. Much cheaper, lower power, and faster than Infiniband, OCSes and underlying optical components are <5% of system cost and <3% of system power. Each TPU v4 includes SparseCores, dataflow processors that accelerate models that rely on embeddings by 5x-7x yet use only 5% of die area and power. Deployed since 2020, TPU v4 outperforms TPU v3 by 2.1x and improves performance/Watt by 2.7x. The TPU v4 supercomputer is 4x larger at 4096 chips and thus ~10x faster overall, which along with OCS flexibility helps large language models. For similar sized systems, it is ~4.3x-4.5x faster than the Graphcore IPU Bow and is 1.2x-1.7x faster and uses 1.3x-1.9x less power than the Nvidia A100. TPU v4s inside the energy-optimized warehouse scale computers of Google Cloud use ~3x less energy and produce ~20x less CO2e than contemporary DSAs in a typical on-premise data center. △ Less

Submitted 20 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: 15 pages; 16 figures; to be published at ISCA 2023 (the International Symposium on Computer Architecture)

arXiv:1907.07181 [pdf]

Deciphering Dynamical Nonlinearities in Short Time Series Using Recurrent Neural Networks

Authors: Radhakrishnan Nagarajan

Abstract: Surrogate testing techniques have been used widely to investigate the presence of dynamical nonlinearities, an essential ingredient of deterministic chaotic processes. Traditional surrogate testing subscribes to statistical hypothesis testing and investigates potential differences in discriminant statistics between the given empirical sample and its surrogate counterparts. The choice and estimatio… ▽ More Surrogate testing techniques have been used widely to investigate the presence of dynamical nonlinearities, an essential ingredient of deterministic chaotic processes. Traditional surrogate testing subscribes to statistical hypothesis testing and investigates potential differences in discriminant statistics between the given empirical sample and its surrogate counterparts. The choice and estimation of the discriminant statistics can be challenging across short time series. Also, conclusion based on a single empirical sample is an inherent limitation. The present study proposes a recurrent neural network classification framework that uses the raw time series obviating the need for discriminant statistic while accommodating multiple time series realizations for enhanced generalizability of the findings. The results are demonstrated on short time series with lengths (L = 32, 64, 128) from continuous and discrete dynamical systems in chaotic regimes, nonlinear transform of linearly correlated noise and experimental data. Accuracy of the classifier is shown to be markedly higher than >> 50% for the processes in chaotic regimes whereas those of nonlinearly correlated noise were around ~50% similar to that of random guess from a one-sample binomial test. These results are promising and elucidate the usefulness of the proposed framework in identifying potential dynamical nonlinearities from short experimental time series. △ Less

Submitted 15 July, 2019; originally announced July 2019.

Comments: 18 pages, 7 Figures, 1 Table

arXiv:1704.04760 [pdf]

In-Datacenter Performance Analysis of a Tensor Processing Unit

Authors: Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg , et al. (50 additional authors not shown)

Abstract: Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOp… ▽ More Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X - 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU. △ Less

Submitted 16 April, 2017; originally announced April 2017.

Comments: 17 pages, 11 figures, 8 tables. To appear at the 44th International Symposium on Computer Architecture (ISCA), Toronto, Canada, June 24-28, 2017

arXiv:1511.01568 [pdf, ps, other]

doi 10.4204/EPTCS.195.6

Formalization of Quantum Protocols using Coq

Authors: Jaap Boender, Florian Kammüller, Rajagopal Nagarajan

Abstract: Quantum Information Processing, which is an exciting area of research at the intersection of physics and computer science, has great potential for influencing the future development of information processing systems. The building of practical, general purpose Quantum Computers may be some years into the future. However, Quantum Communication and Quantum Cryptography are well developed. Commerci… ▽ More Quantum Information Processing, which is an exciting area of research at the intersection of physics and computer science, has great potential for influencing the future development of information processing systems. The building of practical, general purpose Quantum Computers may be some years into the future. However, Quantum Communication and Quantum Cryptography are well developed. Commercial Quantum Key Distribution systems are easily available and several QKD networks have been built in various parts of the world. The security of the protocols used in these implementations rely on information-theoretic proofs, which may or may not reflect actual system behaviour. Moreover, testing of implementations cannot guarantee the absence of bugs and errors. This paper presents a novel framework for modelling and verifying quantum protocols and their implementations using the proof assistant Coq. We provide a Coq library for quantum bits (qubits), quantum gates, and quantum measurement. As a step towards verifying practical quantum communication and security protocols such as Quantum Key Distribution, we support multiple qubits, communication and entanglement. We illustrate these concepts by modelling the Quantum Teleportation Protocol, which communicates the state of an unknown quantum bit using only a classical channel. △ Less

Submitted 4 November, 2015; originally announced November 2015.

Comments: In Proceedings QPL 2015, arXiv:1511.01181

Journal ref: EPTCS 195, 2015, pp. 71-83

arXiv:1312.5951 [pdf, other]

Automated Verification of Quantum Protocols by Equivalence Checking

Authors: Ebrahim Ardeshir-Larijani, Simon J. Gay, Rajagopal Nagarajan

Abstract: In this paper we introduce a technique and a tool for formal verification of various quantum information processing protocols. The tool uses stabilizer formalism and is capable of representing concurrent quantum protocol, thus is more expressive than quantum circuits. We also report on experimental results of using our Quantum Equivalence Checker (QEC) to analyse a range of quantum information pro… ▽ More In this paper we introduce a technique and a tool for formal verification of various quantum information processing protocols. The tool uses stabilizer formalism and is capable of representing concurrent quantum protocol, thus is more expressive than quantum circuits. We also report on experimental results of using our Quantum Equivalence Checker (QEC) to analyse a range of quantum information processing protocols. △ Less

Submitted 20 December, 2013; originally announced December 2013.

arXiv:1210.0614 [pdf, ps, other]

doi 10.4204/EPTCS.95.7

Analysis of a Quantum Error Correcting Code using Quantum Process Calculus

Authors: Timothy A. S. Davidson, Simon J. Gay, Rajagopal Nagarajan, Ittoop Vergheese Puthoor

Abstract: We describe the use of quantum process calculus to describe and analyze quantum communication protocols, following the successful field of formal methods from classical computer science. The key idea is to define two systems, one modelling a protocol and one expressing a specification, and prove that they are behaviourally equivalent. We summarize the necessary theory in the process calculus CQP,… ▽ More We describe the use of quantum process calculus to describe and analyze quantum communication protocols, following the successful field of formal methods from classical computer science. The key idea is to define two systems, one modelling a protocol and one expressing a specification, and prove that they are behaviourally equivalent. We summarize the necessary theory in the process calculus CQP, including the crucial result that equivalence is a congruence, meaning that it is preserved by embedding in any context. We illustrate the approach by analyzing two versions of a quantum error correction system. △ Less

Submitted 1 October, 2012; originally announced October 2012.

Comments: In Proceedings QPL 2011, arXiv:1210.0298

ACM Class: D.3.1; F.3.1

Journal ref: EPTCS 95, 2012, pp. 67-80

arXiv:1108.0469 [pdf, ps, other]

doi 10.4204/EPTCS.59.9

Formal Analysis of Quantum Systems using Process Calculus

Authors: Timothy A. S. Davidson, Simon J. Gay, Rajagopal Nagarajan

Abstract: Quantum communication and cryptographic protocols are well on the way to becoming an important practical technology. Although a large amount of successful research has been done on proving their correctness, most of this work does not make use of familiar techniques from formal methods, such as formal logics for specification, formal modelling languages, separation of levels of abstraction, and co… ▽ More Quantum communication and cryptographic protocols are well on the way to becoming an important practical technology. Although a large amount of successful research has been done on proving their correctness, most of this work does not make use of familiar techniques from formal methods, such as formal logics for specification, formal modelling languages, separation of levels of abstraction, and compositional analysis. We argue that these techniques will be necessary for the analysis of large-scale systems that combine quantum and classical components, and summarize the results of initial investigation using behavioural equivalence in process calculus. This paper is a summary of Simon Gay's invited talk at ICE'11. △ Less

Submitted 1 August, 2011; originally announced August 2011.

Comments: In Proceedings ICE 2011, arXiv:1108.0144

Journal ref: EPTCS 59, 2011, pp. 104-110

arXiv:quant-ph/0504007 [pdf, ps, other]

Probabilistic Model--Checking of Quantum Protocols

Authors: Simon Gay, Rajagopal Nagarajan, Nikolaos Papanikolaou

Abstract: We establish fundamental and general techniques for formal verification of quantum protocols. Quantum protocols are novel communication schemes involving the use of quantum-mechanical phenomena for representation, storage and transmission of data. As opposed to quantum computers, quantum communication systems can and have been implemented using present-day technology; therefore, the ability to m… ▽ More We establish fundamental and general techniques for formal verification of quantum protocols. Quantum protocols are novel communication schemes involving the use of quantum-mechanical phenomena for representation, storage and transmission of data. As opposed to quantum computers, quantum communication systems can and have been implemented using present-day technology; therefore, the ability to model and analyse such systems rigorously is of primary importance. While current analyses of quantum protocols use a traditional mathematical approach and require considerable understanding of the underlying physics, we argue that automated verification techniques provide an elegant alternative. We demonstrate these techniques through the use of PRISM, a probabilistic model-checking tool. Our approach is conceptually simpler than existing proofs, and allows us to disambiguate protocol definitions and assess their properties. It also facilitates detailed analyses of actual implemented systems. We illustrate our techniques by modelling a selection of quantum protocols (namely superdense coding, quantum teleportation, and quantum error correction) and verifying their basic correctness properties. Our results provide a foundation for further work on modelling and analysing larger systems such as those used for quantum cryptography, in which basic protocols are used as components. △ Less

Submitted 5 October, 2005; v1 submitted 1 April, 2005; originally announced April 2005.

Comments: 15 pages + 2 page appendix

arXiv:cs/0502048 [pdf, ps, other]

An Automated Analysis of the Security of Quantum Key Distribution

Authors: Rajagopal Nagarajan, Nikolaos Papanikolaou, Garry Bowen, Simon Gay

Abstract: This paper discusses the use of computer-aided verification as a practical means for analysing quantum information systems; specifically, the BB84 protocol for quantum key distribution is examined using this method. This protocol has been shown to be unconditionally secure against all attacks in an information-theoretic setting, but the relevant security proof requires a thorough understanding o… ▽ More This paper discusses the use of computer-aided verification as a practical means for analysing quantum information systems; specifically, the BB84 protocol for quantum key distribution is examined using this method. This protocol has been shown to be unconditionally secure against all attacks in an information-theoretic setting, but the relevant security proof requires a thorough understanding of the formalism of quantum mechanics and is not easily adaptable to practical scenarios. Our approach is based on probabilistic model-checking; we have used the PRISM model-checker to show that, as the number of qubits transmitted in BB84 is increased, the equivocation of the eavesdropper with respect to the channel decreases exponentially. We have also shown that the probability of detecting the presence of an eavesdropper increases exponentially with the number of qubits. The results presented here are a testament to the effectiveness of the model-checking approach for systems where analytical solutions may not be possible or plausible. △ Less

Submitted 9 February, 2005; originally announced February 2005.

ACM Class: D.2.4; D.4.6; K.6.5

Showing 1–9 of 9 results for author: Nagarajan, R