Showing 1–26 of 26 results for author: Naumov, M

  1. arXiv:2403.02545  [pdf, other]

    cs.LG cs.AI

    Wukong: Towards a Scaling Law for Large-Scale Recommendation

    Authors: Buyun Zhang, Liang Luo, Yuxin Chen, Jade Nie, Xi Liu, Daifeng Guo, Yanli Zhao, Shen Li, Yuchen Hao, Yantao Yao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Maxim Naumov, Wenlin Chen

    Abstract: Scaling laws play an instrumental role in the sustainable improvement in model quality. Unfortunately, recommendation models to date do not exhibit such laws similar to those observed in the domain of large language models, due to the inefficiencies of their upscaling mechanisms. This limitation poses significant challenges in adapting these models to increasingly more complex real-world datasets.…

    Submitted 4 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 12 pages

  2. arXiv:2403.00877  [pdf, other]

    cs.LG cs.DC cs.IR

    Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation

    Authors: Liang Luo, Buyun Zhang, Michael Tsang, Yinbin Ma, Ching-Hsiang Chu, Yuxin Chen, Shen Li, Yuchen Hao, Yanli Zhao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Dheevatsa Mudigere, Maxim Naumov

    Abstract: We study a mismatch between the deep learning recommendation models' flat architecture, common distributed training paradigm and hierarchical data center topology. To address the associated inefficiencies, we propose Disaggregated Multi-Tower (DMT), a modeling technique that consists of (1) Semantic-preserving Tower Transform (SPTT), a novel training paradigm that decomposes the monolithic global…

    Submitted 2 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  3. arXiv:2310.10537  [pdf, other]

    cs.LG cs.AI

    Microscaling Data Formats for Deep Learning

    Authors: Bita Darvish Rouhani, Ritchie Zhao, Ankit More, Mathew Hall, Alireza Khodamoradi, Summer Deng, Dhruv Choudhary, Marius Cornea, Eric Dellinger, Kristof Denolf, Stosic Dusan, Venmugil Elango, Maximilian Golub, Alexander Heinecke, Phil James-Roxby, Dharmesh Jani, Gaurav Kolhe, Martin Langhammer, Ada Li, Levi Melnick, Maral Mesmakhosroshahi, Andres Rodriguez, Michael Schulte, Rasoul Shafipour, Lei Shao, et al. (8 additional authors not shown)

    Abstract: Narrow bit-width data formats are key to reducing the computational and storage costs of modern deep learning applications. This paper evaluates Microscaling (MX) data formats that combine a per-block scaling factor with narrow floating-point and integer types for individual elements. MX formats balance the competing needs of hardware efficiency, model accuracy, and user friction. Empirical result…

    Submitted 19 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

  4. arXiv:2302.08007  [pdf, other]

    cs.LG cs.AI cs.AR

    With Shared Microexponents, A Little Shifting Goes a Long Way

    Authors: Bita Rouhani, Ritchie Zhao, Venmugil Elango, Rasoul Shafipour, Mathew Hall, Maral Mesmakhosroshahi, Ankit More, Levi Melnick, Maximilian Golub, Girish Varatkar, Lei Shao, Gaurav Kolhe, Dimitry Melts, Jasmine Klar, Renee L'Heureux, Matt Perry, Doug Burger, Eric Chung, Zhaoxia Deng, Sam Naghshineh, Jongsoo Park, Maxim Naumov

    Abstract: This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-p…

    Submitted 12 April, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

  5. arXiv:2203.15837  [pdf]

    cs.IR cs.AI cs.DC cs.LG

    Learning to Collide: Recommendation System Model Compression with Learned Hash Functions

    Authors: Benjamin Ghaemmaghami, Mustafa Ozdal, Rakesh Komuravelli, Dmitriy Korchev, Dheevatsa Mudigere, Krishnakumar Nair, Maxim Naumov

    Abstract: A key characteristic of deep recommendation models is the immense memory requirements of their embedding tables. These embedding tables can often reach hundreds of gigabytes which increases hardware requirements and training cost. A common technique to reduce model size is to hash all of the categorical variable identifiers (ids) into a smaller space. This hashing reduces the number of unique repr…

    Submitted 28 March, 2022; originally announced March 2022.

  6. arXiv:2110.14812  [pdf, other]

    cs.LG cs.AI cs.IR

    Differentiable NAS Framework and Application to Ads CTR Prediction

    Authors: Ravi Krishna, Aravind Kalaiah, Bichen Wu, Maxim Naumov, Dheevatsa Mudigere, Misha Smelyanskiy, Kurt Keutzer

    Abstract: Neural architecture search (NAS) methods aim to automatically find the optimal deep neural network (DNN) architecture as measured by a given objective function, typically some combination of task accuracy and inference efficiency. For many areas, such as computer vision and natural language processing, this is a critical, yet still time consuming process. New NAS methods have recently made progres…

    Submitted 25 October, 2021; originally announced October 2021.

  7. arXiv:2110.11489  [pdf, ps, other]

    cs.AR cs.LG

    Supporting Massive DLRM Inference Through Software Defined Memory

    Authors: Ehsan K. Ardestani, Changkyu Kim, Seung Jae Lee, Luoshang Pan, Valmiki Rampersad, Jens Axboe, Banit Agrawal, Fuxun Yu, Ansha Yu, Trung Le, Hector Yuen, Shishir Juluri, Akshat Nanda, Manoj Wodekar, Dheevatsa Mudigere, Krishnakumar Nair, Maxim Naumov, Chris Peterson, Mikhail Smelyanskiy, Vijay Rao

    Abstract: Deep Learning Recommendation Models (DLRM) are widespread, account for a considerable data center footprint, and grow by more than 1.5x per year. With model size soon to be in the terabytes range, leveraging Storage Class Memory (SCM) for inference enables lower power consumption and cost. This paper evaluates the major challenges in extending the memory hierarchy to SCM for DLRM, and presents differen…

    Submitted 8 November, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: 14 pages, 5 figures

  8. arXiv:2105.12676  [pdf, other]

    cs.LG cs.AR cs.IR cs.PF math.NA

    Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale

    Authors: Zhaoxia Deng, Jongsoo Park, Ping Tak Peter Tang, Haixin Liu, Jie Yang, Hector Yuen, Jianyu Huang, Daya Khudia, Xiaohan Wei, Ellie Wen, Dhruv Choudhary, Raghuraman Krishnamoorthi, Carole-Jean Wu, Satish Nadathur, Changkyu Kim, Maxim Naumov, Sam Naghshineh, Mikhail Smelyanskiy

    Abstract: Tremendous success of machine learning (ML) and the unabated growth in ML model complexity motivated many ML-specific designs in both CPU and accelerator architectures to speed up the model inference. While these architectures are diverse, highly optimized low-precision arithmetic is a component shared by most. Impressive compute throughputs are indeed often exhibited by these architectures on ben…

    Submitted 26 May, 2021; originally announced May 2021.

  9. arXiv:2104.05158  [pdf, other]

    cs.DC cs.AI cs.LG cs.PF

    Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models

    Authors: Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Ozdal, Jade Nie, Jongsoo Park, Liang Luo, Jie Amy Yang, Leon Gao, Dmytro Ivchenko, Aarti Basant, Yuxi Hu, Jiyan Yang, Ehsan K. Ardestani, Xiaodong Wang, Rakesh Komuravelli, Ching-Hsiang Chu, Serhat Yilmaz, Huayu Li, Jiyuan Qian, Zhuobo Feng, et al. (28 additional authors not shown)

    Abstract: Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook and are the single largest AI application in terms of infrastructure demand in its data-centers. In this paper we discuss the SW/HW co-designed solution for high-performance distributed training of large-scale DLRMs. We introduce a high-performance scalable software stack based on PyTorch and pa…

    Submitted 26 February, 2023; v1 submitted 11 April, 2021; originally announced April 2021.

  10. arXiv:2008.11922  [pdf, other]

    cs.IR cs.LG stat.ML

    Time-based Sequence Model for Personalization and Recommendation Systems

    Authors: Tigran Ishkhanov, Maxim Naumov, Xianjie Chen, Yan Zhu, Yuan Zhong, Alisson Gusatti Azzolini, Chonglin Sun, Frank Jiang, Andrey Malevich, Liang Xiong

    Abstract: In this paper we develop a novel recommendation model that explicitly incorporates time information. The model relies on an embedding layer and TSL attention-like mechanism with inner products in different vector spaces, that can be thought of as a modification of multi-headed attention. This mechanism allows the model to efficiently treat sequences of user behavior of different length. We study t…

    Submitted 27 August, 2020; originally announced August 2020.

    Comments: 17 pages, 7 figures

    MSC Class: 68T05 ACM Class: I.2.6; I.5.0; H.3.3; H.3.4

  11. arXiv:2003.09518  [pdf, other]

    cs.DC

    Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems

    Authors: Maxim Naumov, John Kim, Dheevatsa Mudigere, Srinivas Sridharan, Xiaodong Wang, Whitney Zhao, Serhat Yilmaz, Changkyu Kim, Hector Yuen, Mustafa Ozdal, Krishnakumar Nair, Isabel Gao, Bor-Yiing Su, Jiyan Yang, Mikhail Smelyanskiy

    Abstract: Large-scale training is important to ensure high performance and accuracy of machine-learning models. At Facebook we use many different models, including computer vision, video and language models. However, in this paper we focus on the deep learning recommendation models (DLRMs), which are responsible for more than 50% of the training demand in our data centers. Recommendation models present uniq…

    Submitted 18 August, 2020; v1 submitted 20 March, 2020; originally announced March 2020.

    Comments: 10 pages, 14 figures; adjusted Fig. 10, added reference; fixed typos

    MSC Class: 68T05; 68M10 ACM Class: H.3.3; I.2.6; C.2.1

  12. arXiv:1912.12953  [pdf, other]

    cs.DC cs.AR

    RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

    Authors: Liu Ke, Udit Gupta, Carole-Jean Wu, Benjamin Youngjae Cho, Mark Hempstead, Brandon Reagen, Xuan Zhang, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Meng Li, Bert Maher, Dheevatsa Mudigere, Maxim Naumov, Martin Schatz, Mikhail Smelyanskiy, Xiaodong Wang

    Abstract: Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns that pose a fundamental challenge to accelerate. This paper proposes a lightweight, commodity DRAM compliant, near-memory processing solution to accelerate per…

    Submitted 30 December, 2019; originally announced December 2019.

  13. arXiv:1909.11810  [pdf, other]

    cs.LG stat.ML

    Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems

    Authors: Antonio Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang, James Zou

    Abstract: Embedding representations power machine intelligence in many applications, including recommendation systems, but they are space intensive -- potentially occupying hundreds of gigabytes in large-scale settings. To help manage this outsized memory consumption, we explore mixed dimension embeddings, an embedding layer architecture in which a particular embedding vector's dimension scales with its que…

    Submitted 8 February, 2021; v1 submitted 25 September, 2019; originally announced September 2019.

  14. arXiv:1909.02107  [pdf, other]

    cs.LG cs.IR stat.ML

    Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems

    Authors: Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, Jiyan Yang

    Abstract: Modern deep learning-based recommendation systems exploit hundreds to thousands of different categorical features, each with millions of different categories ranging from clicks to posts. To respect the natural diversity within the categorical data, embeddings map each category to a unique dense representation within an embedded space. Since each categorical feature could take on as many as tens o…

    Submitted 28 June, 2020; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: 11 pages, 7 figures, 1 table

  15. arXiv:1906.03109  [pdf, other]

    cs.DC cs.LG

    The Architectural Implications of Facebook's DNN-based Personalized Recommendation

    Authors: Udit Gupta, Carole-Jean Wu, Xiaodong Wang, Maxim Naumov, Brandon Reagen, David Brooks, Bradford Cottel, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Andrey Malevich, Dheevatsa Mudigere, Mikhail Smelyanskiy, Liang Xiong, Xuan Zhang

    Abstract: The widespread application of deep learning has changed the landscape of computation in the data center. In particular, personalized recommendation for content ranking is now largely accomplished leveraging deep neural networks. However, despite the importance of these models and the amount of compute cycles they consume, relatively little research attention has been devoted to systems for recomme…

    Submitted 15 February, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

    Comments: 11 pages

  16. arXiv:1906.00091  [pdf, other]

    cs.IR cs.LG

    Deep Learning Recommendation Model for Personalization and Recommendation Systems

    Authors: Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G. Azzolini, Dmytro Dzhulgakov, Andrey Mallevich, Ilia Cherniavskii, Yinghai Lu, Raghuraman Krishnamoorthi, Ansha Yu, Volodymyr Kondratenko, Stephanie Pereira, Xianjie Chen, Wenlin Chen, Vijay Rao, Bill Jia, Liang Xiong, Misha Smelyanskiy

    Abstract: With the advent of deep learning, neural network-based recommendation models have emerged as an important tool for tackling personalization and recommendation tasks. These networks differ significantly from other deep learning networks due to their need to handle categorical features and are not well studied or understood. In this paper, we develop a state-of-the-art deep learning recommendation m…

    Submitted 31 May, 2019; originally announced June 2019.

    Comments: 10 pages, 6 figures

    MSC Class: 68T05 ACM Class: I.2.6; I.5.0; H.3.3; H.3.4

  17. arXiv:1901.02132  [pdf, other]

    cs.CV cs.LG cs.NE

    Spatial-Winograd Pruning Enabling Sparse Winograd Convolution

    Authors: Jiecao Yu, Jongsoo Park, Maxim Naumov

    Abstract: Deep convolutional neural networks (CNNs) are deployed in various applications but demand immense computational requirements. Pruning techniques and Winograd convolution are two typical methods to reduce the CNN computation. However, they cannot be directly combined because Winograd transformation fills in the sparsity resulting from pruning. Li et al. (2017) propose sparse Winograd convolution in…

    Submitted 7 January, 2019; originally announced January 2019.

  18. arXiv:1901.02103  [pdf, other]

    cs.LG cs.CV cs.IT stat.ML

    On the Dimensionality of Embeddings for Sparse Features and Data

    Authors: Maxim Naumov

    Abstract: In this note we discuss a common misconception, namely that embeddings are always used to reduce the dimensionality of the item space. We show that when we measure dimensionality in terms of information entropy, the embedding of sparse probability distributions, which can be used to represent sparse features or data, may or may not reduce the dimensionality of the item space. However, the embedding…

    Submitted 7 January, 2019; originally announced January 2019.

    Comments: 8 pages, 2 figures

    MSC Class: 68T05 ACM Class: I.2.6; I.5.0

  19. arXiv:1811.09886  [pdf, other]

    cs.LG stat.ML

    Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

    Authors: Jongsoo Park, Maxim Naumov, Protonu Basu, Summer Deng, Aravind Kalaiah, Daya Khudia, James Law, Parth Malani, Andrey Malevich, Satish Nadathur, Juan Pino, Martin Schatz, Alexander Sidorov, Viswanath Sivakumar, Andrew Tulloch, Xiaodong Wang, Yiming Wu, Hector Yuen, Utku Diril, Dmytro Dzhulgakov, Kim Hazelwood, Bill Jia, Yangqing Jia, Lin Qiao, Vijay Rao, et al. (3 additional authors not shown)

    Abstract: The application of deep learning techniques resulted in remarkable improvement of machine learning models. This paper provides detailed characterizations of deep learning models used in many Facebook social network services. We present computational characteristics of our models, describe high performance optimizations targeting existing systems, point out their limitations and make suggestions…

    Submitted 29 November, 2018; v1 submitted 24 November, 2018; originally announced November 2018.

  20. arXiv:1811.09862  [pdf, other]

    cs.LG cs.CV stat.ML

    On Periodic Functions as Regularizers for Quantization of Neural Networks

    Authors: Maxim Naumov, Utku Diril, Jongsoo Park, Benjamin Ray, Jedrzej Jablonski, Andrew Tulloch

    Abstract: Deep learning models have been successfully used in computer vision and many other fields. We propose an unorthodox algorithm for performing quantization of the model parameters. In contrast with popular quantization schemes based on thresholds, we use a novel technique based on periodic functions, such as continuous trigonometric sine or cosine as well as non-continuous hat functions. We apply th…

    Submitted 24 November, 2018; originally announced November 2018.

    Comments: 11 pages, 7 figures

    MSC Class: 68T05 ACM Class: I.2.6; I.5.0

  21. arXiv:1811.05922  [pdf, other]

    cs.LG stat.ML

    Bandana: Using Non-volatile Memory for Storing Deep Learning Models

    Authors: Assaf Eisenman, Maxim Naumov, Darryl Gardner, Misha Smelyanskiy, Sergey Pupyrev, Kim Hazelwood, Asaf Cidon, Sachin Katti

    Abstract: Typical large-scale recommender systems use deep learning models that are stored on a large amount of DRAM. These models often rely on embeddings, which consume most of the required memory. We present Bandana, a storage system that reduces the DRAM footprint of embeddings, by using Non-volatile Memory (NVM) as the primary storage medium, with a small amount of DRAM as cache. The main challenge in…

    Submitted 14 November, 2018; v1 submitted 14 November, 2018; originally announced November 2018.

  22. arXiv:1712.06577  [pdf, ps, other]

    cs.LG cs.AI math.NA

    Parallel Complexity of Forward and Backward Propagation

    Authors: Maxim Naumov

    Abstract: We show that the forward and backward propagation can be formulated as a solution of lower and upper triangular systems of equations. For standard feedforward (FNNs) and recurrent neural networks (RNNs) the triangular systems are always block bi-diagonal, while for a general computation graph (directed acyclic graph) they can have a more complex triangular sparsity pattern. We discuss direct and i…

    Submitted 18 December, 2017; originally announced December 2017.

    Comments: 18 pages

    MSC Class: 68T05 (Primary) 65F99; 15B99 (Secondary) ACM Class: I.2.6; I.5.0

  23. arXiv:1712.02029  [pdf, other]

    cs.LG cs.CV cs.DC stat.ML

    AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks

    Authors: Aditya Devarakonda, Maxim Naumov, Michael Garland

    Abstract: Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer more parallelism and hence better computational efficiency. We have developed a new training approach that, rather than statically choosing a single batch size f…

    Submitted 13 February, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

    Comments: 14 pages

    MSC Class: 68T05; ACM Class: I.2.6; I.5.0

  24. arXiv:1709.06080  [pdf, other]

    cs.LG cs.AI math.NA

    Feedforward and Recurrent Neural Networks Backward Propagation and Hessian in Matrix Form

    Authors: Maxim Naumov

    Abstract: In this paper we focus on the linear algebra theory behind feedforward (FNN) and recurrent (RNN) neural networks. We review backward propagation, including backward propagation through time (BPTT). Also, we obtain a new exact expression for Hessian, which represents second order effects. We show that for $t$ time steps the weight gradient can be expressed as a rank-$t$ matrix, while the weight Hes…

    Submitted 16 September, 2017; originally announced September 2017.

    Comments: 23 pages, 4 figures

    MSC Class: 68T05 (Primary) 65F99; 15B99 (Secondary) ACM Class: I.2.6; I.5.0

  25. Exact Calculation of Entanglement in a 19-site 2D Spin System

    Authors: Qing Xu, Sabre Kais, Maxim Naumov, Ahmed Sameh

    Abstract: Using the Trace Minimization Algorithm, we carried out an exact calculation of entanglement in a 19-site two-dimensional transverse Ising model. This model consists of a set of localized spin-1/2 particles in a two dimensional triangular lattice coupled through exchange interaction J and subject to an external magnetic field of strength h. We demonstrate, for such a class of two-dimensional magn…

    Submitted 13 January, 2010; v1 submitted 25 September, 2009; originally announced September 2009.

    Journal ref: Phys. Rev. A 81, 022324 (2010)

  26. arXiv:0901.1890  [pdf]

    physics.comp-ph

    Multimillion Atom Simulations with NEMO 3-D

    Authors: Shaikh Ahmed, Neerav Kharche, Rajib Rahman, Muhammad Usman, Sunhee Lee, Hoon Ryu, Hansang Bae, Steve Clark, Benjamin Haley, Maxim Naumov, Faisal Saied, Marek Korkusinski, Rick Kennel, Michael McLennan, Timothy B. Boykin, Gerhard Klimeck

    Abstract: The rapid progress in nanofabrication technologies has led to the emergence of new classes of nanodevices and structures. At the atomic scale of novel nanostructured semiconductors the distinction between new device and new material is blurred and device physics and material science meet. The quantum mechanical effects in the electronic states of the device and the granular, atomistic representa…

    Submitted 13 January, 2009; originally announced January 2009.

    Comments: 35 pages; 37 figures