Showing 1–33 of 33 results for author: Mudigere, D

Searching in archive cs.
  1. arXiv:2403.00877  [pdf, other]

    cs.LG cs.DC cs.IR

    Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation

    Authors: Liang Luo, Buyun Zhang, Michael Tsang, Yinbin Ma, Ching-Hsiang Chu, Yuxin Chen, Shen Li, Yuchen Hao, Yanli Zhao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Dheevatsa Mudigere, Maxim Naumov

    Abstract: We study a mismatch between the deep learning recommendation models' flat architecture, common distributed training paradigm and hierarchical data center topology. To address the associated inefficiencies, we propose Disaggregated Multi-Tower (DMT), a modeling technique that consists of (1) Semantic-preserving Tower Transform (SPTT), a novel training paradigm that decomposes the monolithic global…

    Submitted 2 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  2. arXiv:2309.06497  [pdf, other]

    cs.LG cs.DC cs.MS math.OC

    A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

    Authors: Hao-Jun Michael Shi, Tsung-Hsien Lee, Shintaro Iwasaki, Jose Gallego-Posada, Zhijing Li, Kaushik Rangadurai, Dheevatsa Mudigere, Michael Rabbat

    Abstract: Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks. It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network. In this work, we provide a complete description of the algorithm as well as the perform…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: 38 pages, 8 figures, 5 tables

  3. arXiv:2305.01515  [pdf, other]

    cs.IR cs.LG cs.PF

    MTrainS: Improving DLRM training efficiency using heterogeneous memories

    Authors: Hiwot Tadese Kassa, Paul Johnson, Jason Akers, Mrinmoy Ghosh, Andrew Tulloch, Dheevatsa Mudigere, Jongsoo Park, Xing Liu, Ronald Dreslinski, Ehsan K. Ardestani

    Abstract: Recommendation models are very large, requiring terabytes (TB) of memory during training. In pursuit of better quality, the model size and complexity grow over time, which requires additional training data to avoid overfitting. This model growth demands a large number of resources in data centers. Hence, training efficiency is becoming considerably more important to keep the data center power dema…

    Submitted 19 April, 2023; originally announced May 2023.

  4. arXiv:2203.15837  [pdf]

    cs.IR cs.AI cs.DC cs.LG

    Learning to Collide: Recommendation System Model Compression with Learned Hash Functions

    Authors: Benjamin Ghaemmaghami, Mustafa Ozdal, Rakesh Komuravelli, Dmitriy Korchev, Dheevatsa Mudigere, Krishnakumar Nair, Maxim Naumov

    Abstract: A key characteristic of deep recommendation models is the immense memory requirements of their embedding tables. These embedding tables can often reach hundreds of gigabytes which increases hardware requirements and training cost. A common technique to reduce model size is to hash all of the categorical variable identifiers (ids) into a smaller space. This hashing reduces the number of unique repr…

    Submitted 28 March, 2022; originally announced March 2022.

  5. arXiv:2203.11014  [pdf, other]

    cs.IR cs.AI cs.LG

    DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction

    Authors: Buyun Zhang, Liang Luo, Xi Liu, Jay Li, Zeliang Chen, Weilin Zhang, Xiaohan Wei, Yuchen Hao, Michael Tsang, Wenjun Wang, Yang Liu, Huayu Li, Yasmine Badr, Jongsoo Park, Jiyan Yang, Dheevatsa Mudigere, Ellie Wen

    Abstract: Learning feature interactions is important to the model performance of online advertising services. As a result, extensive efforts have been devoted to designing effective architectures to learn feature interactions. However, we observe that the practical performance of those designs can vary from dataset to dataset, even when the order of interactions claimed to be captured is the same. That indi…

    Submitted 11 March, 2022; originally announced March 2022.

  6. arXiv:2202.00433  [pdf, other]

    cs.NI cs.DC

    TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs

    Authors: Weiyang Wang, Moein Khazraee, Zhizhen Zhong, Manya Ghobadi, Zhihao Jia, Dheevatsa Mudigere, Ying Zhang, Anthony Kewitsch

    Abstract: We propose TopoOpt, a novel direct-connect fabric for deep neural network (DNN) training workloads. TopoOpt co-optimizes the distributed training process across three dimensions: computation, communication, and network topology. We demonstrate the mutability of AllReduce traffic, and leverage this property to construct efficient network topologies for DNN training jobs. TopoOpt then uses an altern…

    Submitted 29 September, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

  7. arXiv:2110.14812  [pdf, other]

    cs.LG cs.AI cs.IR

    Differentiable NAS Framework and Application to Ads CTR Prediction

    Authors: Ravi Krishna, Aravind Kalaiah, Bichen Wu, Maxim Naumov, Dheevatsa Mudigere, Misha Smelyanskiy, Kurt Keutzer

    Abstract: Neural architecture search (NAS) methods aim to automatically find the optimal deep neural network (DNN) architecture as measured by a given objective function, typically some combination of task accuracy and inference efficiency. For many areas, such as computer vision and natural language processing, this is a critical, yet still time consuming process. New NAS methods have recently made progres…

    Submitted 25 October, 2021; originally announced October 2021.

  8. arXiv:2110.11489  [pdf, ps, other]

    cs.AR cs.LG

    Supporting Massive DLRM Inference Through Software Defined Memory

    Authors: Ehsan K. Ardestani, Changkyu Kim, Seung Jae Lee, Luoshang Pan, Valmiki Rampersad, Jens Axboe, Banit Agrawal, Fuxun Yu, Ansha Yu, Trung Le, Hector Yuen, Shishir Juluri, Akshat Nanda, Manoj Wodekar, Dheevatsa Mudigere, Krishnakumar Nair, Maxim Naumov, Chris Peterson, Mikhail Smelyanskiy, Vijay Rao

    Abstract: Deep Learning Recommendation Models (DLRM) are widespread, account for a considerable data center footprint, and grow by more than 1.5x per year. With model size soon to be in terabytes range, leveraging Storage Class Memory (SCM) for inference enables lower power consumption and cost. This paper evaluates the major challenges in extending the memory hierarchy to SCM for DLRM, and presents differen…

    Submitted 8 November, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: 14 pages, 5 figures

  9. arXiv:2104.05158  [pdf, other]

    cs.DC cs.AI cs.LG cs.PF

    Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models

    Authors: Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Ozdal, Jade Nie, Jongsoo Park, Liang Luo, Jie Amy Yang, Leon Gao, Dmytro Ivchenko, Aarti Basant, Yuxi Hu, Jiyan Yang, Ehsan K. Ardestani, Xiaodong Wang, Rakesh Komuravelli, Ching-Hsiang Chu, Serhat Yilmaz, Huayu Li, Jiyuan Qian, Zhuobo Feng, et al. (28 additional authors not shown)

    Abstract: Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook and are the single largest AI application in terms of infrastructure demand in its data-centers. In this paper we discuss the SW/HW co-designed solution for high-performance distributed training of large-scale DLRMs. We introduce a high-performance scalable software stack based on PyTorch and pa…

    Submitted 26 February, 2023; v1 submitted 11 April, 2021; originally announced April 2021.

  10. arXiv:2010.08679  [pdf, other]

    cs.IR cs.LG

    Check-N-Run: A Checkpointing System for Training Deep Learning Recommendation Models

    Authors: Assaf Eisenman, Kiran Kumar Matam, Steven Ingram, Dheevatsa Mudigere, Raghuraman Krishnamoorthi, Krishnakumar Nair, Misha Smelyanskiy, Murali Annavaram

    Abstract: Checkpoints play an important role in training long running machine learning (ML) models. Checkpoints take a snapshot of an ML model and store it in a non-volatile memory so that they can be used to recover from failures to ensure rapid training progress. In addition, they are used for online training to improve inference prediction accuracy with continuous learning. Given the large and ever incre…

    Submitted 4 May, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

  11. arXiv:2003.09518  [pdf, other]

    cs.DC

    Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems

    Authors: Maxim Naumov, John Kim, Dheevatsa Mudigere, Srinivas Sridharan, Xiaodong Wang, Whitney Zhao, Serhat Yilmaz, Changkyu Kim, Hector Yuen, Mustafa Ozdal, Krishnakumar Nair, Isabel Gao, Bor-Yiing Su, Jiyan Yang, Mikhail Smelyanskiy

    Abstract: Large-scale training is important to ensure high performance and accuracy of machine-learning models. At Facebook we use many different models, including computer vision, video and language models. However, in this paper we focus on the deep learning recommendation models (DLRMs), which are responsible for more than 50% of the training demand in our data centers. Recommendation models present uniq…

    Submitted 18 August, 2020; v1 submitted 20 March, 2020; originally announced March 2020.

    Comments: 10 pages, 14 figures; adjusted Fig. 10, added reference; fixed typos

    MSC Class: 68T05; 68M10 ACM Class: H.3.3; I.2.6; C.2.1

  12. SEERL: Sample Efficient Ensemble Reinforcement Learning

    Authors: Rohan Saphal, Balaraman Ravindran, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul

    Abstract: Ensemble learning is a very prevalent method employed in machine learning. The relative success of ensemble methods is attributed to their ability to tackle a wide range of instances and complex problems that require different low-level approaches. However, ensemble methods are relatively less popular in reinforcement learning owing to the high sample complexity and computational expense involved…

    Submitted 16 May, 2021; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: Accepted at Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems

  13. arXiv:1912.12953  [pdf, other]

    cs.DC cs.AR

    RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

    Authors: Liu Ke, Udit Gupta, Carole-Jean Wu, Benjamin Youngjae Cho, Mark Hempstead, Brandon Reagen, Xuan Zhang, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Meng Li, Bert Maher, Dheevatsa Mudigere, Maxim Naumov, Martin Schatz, Mikhail Smelyanskiy, Xiaodong Wang

    Abstract: Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns that pose a fundamental challenge to accelerate. This paper proposes a lightweight, commodity DRAM compliant, near-memory processing solution to accelerate per…

    Submitted 30 December, 2019; originally announced December 2019.

  14. arXiv:1909.11810  [pdf, other]

    cs.LG stat.ML

    Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems

    Authors: Antonio Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang, James Zou

    Abstract: Embedding representations power machine intelligence in many applications, including recommendation systems, but they are space intensive -- potentially occupying hundreds of gigabytes in large-scale settings. To help manage this outsized memory consumption, we explore mixed dimension embeddings, an embedding layer architecture in which a particular embedding vector's dimension scales with its que…

    Submitted 8 February, 2021; v1 submitted 25 September, 2019; originally announced September 2019.

  15. arXiv:1909.02107  [pdf, other]

    cs.LG cs.IR stat.ML

    Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems

    Authors: Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, Jiyan Yang

    Abstract: Modern deep learning-based recommendation systems exploit hundreds to thousands of different categorical features, each with millions of different categories ranging from clicks to posts. To respect the natural diversity within the categorical data, embeddings map each category to a unique dense representation within an embedded space. Since each categorical feature could take on as many as tens o…

    Submitted 28 June, 2020; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: 11 pages, 7 figures, 1 table

  16. arXiv:1906.03109  [pdf, other]

    cs.DC cs.LG

    The Architectural Implications of Facebook's DNN-based Personalized Recommendation

    Authors: Udit Gupta, Carole-Jean Wu, Xiaodong Wang, Maxim Naumov, Brandon Reagen, David Brooks, Bradford Cottel, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Andrey Malevich, Dheevatsa Mudigere, Mikhail Smelyanskiy, Liang Xiong, Xuan Zhang

    Abstract: The widespread application of deep learning has changed the landscape of computation in the data center. In particular, personalized recommendation for content ranking is now largely accomplished leveraging deep neural networks. However, despite the importance of these models and the amount of compute cycles they consume, relatively little research attention has been devoted to systems for recomme…

    Submitted 15 February, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

    Comments: 11 pages

  17. arXiv:1906.00091  [pdf, other]

    cs.IR cs.LG

    Deep Learning Recommendation Model for Personalization and Recommendation Systems

    Authors: Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G. Azzolini, Dmytro Dzhulgakov, Andrey Mallevich, Ilia Cherniavskii, Yinghai Lu, Raghuraman Krishnamoorthi, Ansha Yu, Volodymyr Kondratenko, Stephanie Pereira, Xianjie Chen, Wenlin Chen, Vijay Rao, Bill Jia, Liang Xiong, Misha Smelyanskiy

    Abstract: With the advent of deep learning, neural network-based recommendation models have emerged as an important tool for tackling personalization and recommendation tasks. These networks differ significantly from other deep learning networks due to their need to handle categorical features and are not well studied or understood. In this paper, we develop a state-of-the-art deep learning recommendation m…

    Submitted 31 May, 2019; originally announced June 2019.

    Comments: 10 pages, 6 figures

    MSC Class: 68T05 ACM Class: I.2.6; I.5.0; H.3.3; H.3.4

  18. arXiv:1905.12322  [pdf, other]

    cs.LG stat.ML

    A Study of BFLOAT16 for Deep Learning Training

    Authors: Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan, Abhisek Kundu, Misha Smelyanskiy, Bharat Kaul, Pradeep Dubey

    Abstract: This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language modeling, generative networks and industrial recommendation systems. BFLOAT16 is attractive for Deep Learning training for two reasons: the range of values it can repr…

    Submitted 13 June, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

  19. arXiv:1810.11507  [pdf, other]

    cs.LG stat.ML

    Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy

    Authors: Majid Jahani, Xi He, Chenxin Ma, Aryan Mokhtari, Dheevatsa Mudigere, Alejandro Ribeiro, Martin Takáč

    Abstract: In this paper, we propose a Distributed Accumulated Newton Conjugate gradiEnt (DANCE) method in which sample size is gradually increasing to quickly obtain a solution whose empirical loss is under satisfactory statistical accuracy. Our proposed method is multistage in which the solution of a stage serves as a warm start for the next stage which contains more samples (including the samples in the p…

    Submitted 9 March, 2020; v1 submitted 26 October, 2018; originally announced October 2018.

    Comments: Updated numerical results

  20. arXiv:1808.03420  [pdf, other]

    cs.LG cs.AI stat.ML

    Hierarchical Block Sparse Neural Networks

    Authors: Dharma Teja Vooturi, Dheevatsa Mudigere, Sasikanth Avancha

    Abstract: Sparse deep neural networks (DNNs) are efficient in both memory and compute when compared to dense DNNs. But due to irregularity in computation of sparse DNNs, their efficiencies are much lower than that of dense DNNs on regular parallel hardware such as TPU. This inefficiency leads to poor/no performance benefits for sparse DNNs. Performance issue for sparse DNNs can be alleviated by bringing stru…

    Submitted 27 December, 2018; v1 submitted 10 August, 2018; originally announced August 2018.

  21. arXiv:1802.05374  [pdf, other]

    math.OC cs.LG stat.ML

    A Progressive Batching L-BFGS Method for Machine Learning

    Authors: Raghu Bollapragada, Dheevatsa Mudigere, Jorge Nocedal, Hao-Jun Michael Shi, Ping Tak Peter Tang

    Abstract: The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization pr…

    Submitted 30 May, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

    Comments: ICML 2018. 25 pages, 17 figures, 2 tables

  22. arXiv:1802.00930  [pdf, other]

    cs.NE cs.LG math.NA

    Mixed Precision Training of Convolutional Neural Networks using Integer Operations

    Authors: Dipankar Das, Naveen Mellempudi, Dheevatsa Mudigere, Dhiraj Kalamkar, Sasikanth Avancha, Kunal Banerjee, Srinivas Sridharan, Karthik Vaidyanathan, Bharat Kaul, Evangelos Georganas, Alexander Heinecke, Pradeep Dubey, Jesus Corbal, Nikita Shustrov, Roma Dubtsov, Evarist Fomenko, Vadim Pirogov

    Abstract: The state-of-the-art (SOTA) for mixed precision training is dominated by variants of low precision floating point operations, and in particular, FP16 accumulating into FP32 (Micikevicius et al., 2017). On the other hand, while a lot of research has also happened in the domain of low and mixed-precision Integer training, these works either present results for non-SOTA networks (for instance only Ale…

    Submitted 23 February, 2018; v1 submitted 3 February, 2018; originally announced February 2018.

    Comments: Published as a conference paper at ICLR 2018

  23. arXiv:1801.08030  [pdf, other]

    cs.DC cs.LG

    On Scale-out Deep Learning Training for Cloud and HPC

    Authors: Srinivas Sridharan, Karthikeyan Vaidyanathan, Dhiraj Kalamkar, Dipankar Das, Mikhail E. Smorkalov, Mikhail Shiryaev, Dheevatsa Mudigere, Naveen Mellempudi, Sasikanth Avancha, Bharat Kaul, Pradeep Dubey

    Abstract: The exponential growth in use of large deep neural networks has accelerated the need for training these deep neural networks in hours or even minutes. This can only be achieved through scalable and efficient distributed training, since a single node/card cannot satisfy the compute, memory, and I/O requirements of today's state-of-the-art deep neural networks. However, scaling synchronous Stochasti…

    Submitted 24 January, 2018; originally announced January 2018.

    Comments: Accepted in SysML 2018 conference

  24. arXiv:1707.06658  [pdf, other]

    cs.LG cs.AI

    RAIL: Risk-Averse Imitation Learning

    Authors: Anirban Santara, Abhishek Naik, Balaraman Ravindran, Dipankar Das, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul

    Abstract: Imitation learning algorithms learn viable policies by imitating an expert's behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert's behavior is available as a fixed set of trajectories. We evaluate in terms of the expert's cost function and observe that the distribution of trajectory-c…

    Submitted 29 November, 2017; v1 submitted 20 July, 2017; originally announced July 2017.

    Comments: Accepted for presentation in Deep Reinforcement Learning Symposium at NIPS 2017

  25. arXiv:1707.04679  [pdf, other]

    cs.IT cs.AI

    Ternary Residual Networks

    Authors: Abhisek Kundu, Kunal Banerjee, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey

    Abstract: Sub-8-bit representations of DNNs incur some discernible loss of accuracy despite rigorous (re)training at low precision. Such loss of accuracy essentially makes them equivalent to a much shallower counterpart, diminishing the power of being deep networks. To address this problem of accuracy drop we introduce the notion of residual networks where we add more low-precision edges to sensitiv…

    Submitted 31 October, 2017; v1 submitted 14 July, 2017; originally announced July 2017.

  26. arXiv:1705.01462  [pdf, other]

    cs.LG cs.NE

    Ternary Neural Networks with Fine-Grained Quantization

    Authors: Naveen Mellempudi, Abhisek Kundu, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey

    Abstract: We propose a novel fine-grained quantization (FGQ) method to ternarize pre-trained full precision models, while also constraining activations to 8 and 4-bits. Using this method, we demonstrate a minimal loss in classification accuracy on state-of-the-art topologies without additional training. We provide an improved theoretical formulation that forms the basis for a higher quality solution using F…

    Submitted 30 May, 2017; v1 submitted 2 May, 2017; originally announced May 2017.

  27. arXiv:1701.08978  [pdf, other]

    cs.LG cs.NE

    Mixed Low-precision Deep Learning Inference using Dynamic Fixed Point

    Authors: Naveen Mellempudi, Abhisek Kundu, Dipankar Das, Dheevatsa Mudigere, Bharat Kaul

    Abstract: We propose a cluster-based quantization method to convert pre-trained full precision weights into ternary weights with minimal impact on the accuracy. In addition, we also constrain the activations to 8-bits thus enabling sub 8-bit full integer inference pipeline. Our method uses smaller clusters of N filters with a common scaling factor to minimize the quantization loss, while also maximizing the…

    Submitted 31 January, 2017; v1 submitted 31 January, 2017; originally announced January 2017.

  28. arXiv:1609.04836  [pdf, other]

    cs.LG math.OC

    On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

    Authors: Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, Ping Tak Peter Tang

    Abstract: The stochastic gradient descent (SGD) method and its variants are algorithms of choice for many Deep Learning tasks. These methods operate in a small-batch regime wherein a fraction of the training data, say 32-512 data points, is sampled to compute an approximation to the gradient. It has been observed in practice that when using a larger batch there is a degradation in the quality of the mod…

    Submitted 9 February, 2017; v1 submitted 15 September, 2016; originally announced September 2016.

    Comments: Accepted as a conference paper at ICLR 2017

  29. arXiv:1606.00511  [pdf, ps, other]

    cs.LG cs.DC math.OC

    Distributed Hessian-Free Optimization for Deep Neural Network

    Authors: Xi He, Dheevatsa Mudigere, Mikhail Smelyanskiy, Martin Takáč

    Abstract: Training a deep neural network is a high-dimensional and highly non-convex optimization problem. The stochastic gradient descent (SGD) algorithm and its variations are the current state-of-the-art solvers for this task. However, due to the non-convex nature of the problem, it was observed that SGD slows down near saddle points. Recent empirical work claims that by detecting and escaping saddle point effi…

    Submitted 15 January, 2017; v1 submitted 1 June, 2016; originally announced June 2016.

  30. arXiv:1602.06709  [pdf, other]

    cs.DC cs.LG

    Distributed Deep Learning Using Synchronous Stochastic Gradient Descent

    Authors: Dipankar Das, Sasikanth Avancha, Dheevatsa Mudigere, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dhiraj Kalamkar, Bharat Kaul, Pradeep Dubey

    Abstract: We design and implement a distributed multinode synchronous SGD algorithm, without altering hyper parameters, or compressing data, or altering algorithmic behavior. We perform a detailed analysis of scaling, and identify optimal design points for different networks. We demonstrate scaling of CNNs on 100s of nodes, and present what we believe to be record training throughputs. A 512 minibatch VGG-A…

    Submitted 22 February, 2016; originally announced February 2016.

  31. arXiv:1411.3251  [pdf]

    cs.CE cs.NE

    Identification of Helicopter Dynamics based on Flight Data using Nature Inspired Techniques

    Authors: S. N. Omkar, Dheevatsa Mudigere, J Senthilnath, M. Vijaya Kumar

    Abstract: The complexity of helicopter flight dynamics makes modeling and helicopter system identification a very difficult task. Most of the traditional techniques require a model structure to be defined a priori and in case of helicopter dynamics, this is difficult due to its complexity and the interplay between various subsystems. To overcome this difficulty, non-parametric approaches are commonly adopted…

    Submitted 12 November, 2014; originally announced November 2014.

  32. arXiv:1011.3583  [pdf]

    cs.DC cs.GR cs.PF

    Fast GPGPU Data Rearrangement Kernels using CUDA

    Authors: Michael Bader, Hans-Joachim Bungartz, Dheevatsa Mudigere, Srihari Narasimhan, Babu Narayanan

    Abstract: Many high-performance computing algorithms are bandwidth limited, hence the need for optimal data rearrangement kernels as well as their easy integration into the rest of the application. In this work, we have built a CUDA library of fast kernels for a set of data rearrangement operations. In particular, we have built generic kernels for rearranging m dimensional data into n dimensions, including…

    Submitted 15 November, 2010; originally announced November 2010.

  33. arXiv:1011.0235  [pdf, other]

    cs.DC cs.PF

    Fast Histograms using Adaptive CUDA Streams

    Authors: Sisir Koppaka, Dheevatsa Mudigere, Srihari Narasimhan, Babu Narayanan

    Abstract: Histograms are widely used in medical imaging, network intrusion detection, packet analysis and other stream-based high throughput applications. However, while porting such software stacks to the GPU, the computation of the histogram is a typical bottleneck primarily due to the large impact on kernel speed by atomic operations. In this work, we propose a stream-based model implemented in CUDA, usi…

    Submitted 31 October, 2010; originally announced November 2010.

    Comments: 5 pages, 5 figures, 4 tables, to appear in Student Research Symposium, High Performance Computing 2010, Goa, India (www.hipc.org)