Showing 51–79 of 79 results for author: Krishna, T

  1. arXiv:2101.04799

    cs.AR cs.LG

    Self-Adaptive Reconfigurable Arrays (SARA): Using ML to Assist Scaling GEMM Acceleration

    Authors: Ananda Samajdar, Michael Pellauer, Tushar Krishna

    Abstract: With increasing diversity in Deep Neural Network (DNN) models in terms of layer shapes and sizes, the research community has been investigating flexible/reconfigurable accelerator substrates. This line of research has opened up two challenges. The first is to determine the appropriate amount of flexibility within an accelerator array that can trade off the performance benefits versus the area…

    Submitted 23 April, 2022; v1 submitted 12 January, 2021; originally announced January 2021.

  2. arXiv:2012.12563

    cs.AR

    Architecture, Dataflow and Physical Design Implications of 3D-ICs for DNN-Accelerators

    Authors: Jan Moritz Joseph, Ananda Samajdar, Lingjun Zhu, Rainer Leupers, Sung-Kyu Lim, Thilo Pionteck, Tushar Krishna

    Abstract: The everlasting demand for higher computing power for deep neural networks (DNNs) drives the development of parallel computing architectures. 3D integration, in which chips are integrated and connected vertically, can further increase performance because it introduces another level of spatial parallelism. Therefore, we analyze dataflows, performance, area, power and temperature of such 3D-DNN-acce…

    Submitted 18 February, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

  3. arXiv:2011.14755

    cs.AR

    Dataflow-Architecture Co-Design for 2.5D DNN Accelerators using Wireless Network-on-Package

    Authors: Robert Guirado, Hyoukjun Kwon, Sergi Abadal, Eduard Alarcón, Tushar Krishna

    Abstract: Deep neural network (DNN) models continue to grow in size and complexity, demanding higher computational power to enable real-time inference. To efficiently deliver such computational demands, hardware accelerators are being developed and deployed across scales. This naturally requires an efficient scale-out mechanism for increasing compute density as required by the application. 2.5D integration…

    Submitted 30 November, 2020; originally announced November 2020.

    Comments: ASPDAC '21

  4. arXiv:2009.02010

    cs.AR cs.LG eess.SP

    ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning

    Authors: Sheng-Chun Kao, Geonhwa Jeong, Tushar Krishna

    Abstract: DNN accelerators provide efficiency by leveraging reuse of activations/weights/outputs during the DNN computations to reduce data movement from DRAM to the chip. The reuse is captured by the accelerator's dataflow. While there has been significant prior work in exploring and comparing various dataflows, the strategy for assigning on-chip hardware resources (i.e., compute and memory) given a datafl…

    Submitted 4 September, 2020; originally announced September 2020.

  5. arXiv:2008.11881

    cs.NE cs.DC cs.LG

    CLAN: Continuous Learning using Asynchronous Neuroevolution on Commodity Edge Devices

    Authors: Parth Mannan, Ananda Samajdar, Tushar Krishna

    Abstract: Recent advancements in machine learning algorithms, especially the development of Deep Neural Networks (DNNs), have transformed the landscape of Artificial Intelligence (AI). With every passing day, deep learning based methods are applied to solve new problems with exceptional results. The portal to the real world is the edge. The true impact of AI can only be fully realized if we can have AI agent…

    Submitted 26 August, 2020; originally announced August 2020.

    Comments: Accepted and appears in ISPASS 2020

  6. arXiv:2008.08289

    cs.LG cs.DC stat.ML

    Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference

    Authors: Afshin Abdi, Saeed Rashidi, Faramarz Fekri, Tushar Krishna

    Abstract: Using multiple nodes and parallel computing algorithms has become a principal tool to improve training and execution times of deep neural networks as well as effective collective intelligence in sensor networks. In this paper, we consider the parallel implementation of an already-trained deep model on multiple processing nodes (a.k.a. workers) where the deep model is divided into several parallel…

    Submitted 19 August, 2020; originally announced August 2020.

  7. arXiv:2008.06741

    cs.AR cs.ET

    Breaking Barriers: Maximizing Array Utilization for Compute In-Memory Fabrics

    Authors: Brian Crafton, Samuel Spetalnick, Gauthaman Murali, Tushar Krishna, Sung-Kyu Lim, Arijit Raychowdhury

    Abstract: Compute in-memory (CIM) is a promising technique that minimizes data transport, the primary performance bottleneck and energy cost of most data intensive applications. This has found wide-spread adoption in accelerating neural networks for machine learning applications. Utilizing a crossbar architecture with emerging non-volatile memories (eNVM) such as dense resistive random access memory (RRAM)…

    Submitted 15 August, 2020; originally announced August 2020.

    Comments: 6 pages, 9 figures, conference paper

  8. arXiv:2007.03152

    cs.AR

    The gem5 Simulator: Version 20.0+

    Authors: Jason Lowe-Power, Abdul Mutaal Ahmad, Ayaz Akram, Mohammad Alian, Rico Amslinger, Matteo Andreozzi, Adrià Armejach, Nils Asmussen, Brad Beckmann, Srikant Bharadwaj, Gabe Black, Gedare Bloom, Bobby R. Bruce, Daniel Rodrigues Carvalho, Jeronimo Castrillon, Lizhong Chen, Nicolas Derumigny, Stephan Diestelhorst, Wendy Elsasser, Carlos Escuin, Marjan Fariborz, Amin Farmahini-Farahani, Pouya Fotouhi, Ryan Gambord, Jayneel Gandhi, et al. (53 additional authors not shown)

    Abstract: The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern computer hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm, and RISC-V. The gem5 si…

    Submitted 29 September, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: Source, comments, and feedback: https://github.com/darchr/gem5-20-paper

  9. Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms

    Authors: Saeed Rashidi, Matthew Denton, Srinivas Sridharan, Sudarshan Srinivasan, Amoghavarsha Suresh, Jade Ni, Tushar Krishna

    Abstract: Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators (e.g., GPU/TPU) via fast, customized interconnects with 100s of gigabytes (GBs) of bandwidth. However, as we identify in this work, driving this bandwidth is quite challenging. This is because there is a pernicious balance between using the accelerator's compute and memory for both DL computations and commu…

    Submitted 4 May, 2022; v1 submitted 30 June, 2020; originally announced July 2020.

  10. arXiv:2006.07137

    eess.SP cs.AR cs.LG

    STONNE: A Detailed Architectural Simulator for Flexible Neural Network Accelerators

    Authors: Francisco Muñoz-Martínez, José L. Abellán, Manuel E. Acacio, Tushar Krishna

    Abstract: The design of specialized architectures for accelerating the inference procedure of Deep Neural Networks (DNNs) is a booming area of research nowadays. First-generation rigid proposals have been rapidly replaced by more advanced flexible accelerator architectures able to efficiently support a variety of layer types and dimensions. As the complexity of the designs grows, it is more and more appeali…

    Submitted 10 June, 2020; originally announced June 2020.

  11. arXiv:2006.03969

    cs.LG stat.ML

    Conditional Neural Architecture Search

    Authors: Sheng-Chun Kao, Arun Ramamurthy, Reed Williams, Tushar Krishna

    Abstract: Designing resource-efficient Deep Neural Networks (DNNs) is critical to deploying deep learning solutions on edge platforms with diverse performance, power, and memory budgets. Unfortunately, it is often the case that a well-trained ML model does not fit the constraints of the target edge platforms, causing a long iteration of model reduction and retraining. Moreover, an ML model optimized for…

    Submitted 6 June, 2020; originally announced June 2020.

  12. arXiv:2006.03968

    cs.LG stat.ML

    Generative Design of Hardware-aware DNNs

    Authors: Sheng-Chun Kao, Arun Ramamurthy, Tushar Krishna

    Abstract: To efficiently run DNNs on the edge/cloud, many new DNN inference accelerators are being designed and deployed frequently. To enhance the resource efficiency of DNNs, model quantization is a widely used approach. However, different accelerators/HW have different resources, leading to the need for a specialized quantization strategy for each HW. Moreover, using the same quantization for every layer may b…

    Submitted 12 July, 2020; v1 submitted 6 June, 2020; originally announced June 2020.

  13. arXiv:2006.00454

    physics.flu-dyn

    On the fluidic behavior of an over-expanded planar plug nozzle under lateral confinement

    Authors: M. Chaudhary, T. V. Krishna, Sowmya R. Nanda, S. K. Karthick, A. Khan, A. De, S. Mohammed Ibrahim

    Abstract: The present work aims to study the fluidic behavior on lateral confinement by placing side-walls on the planar plug nozzle through experiments. The study involves two cases of nozzle pressure ratio (NPR=3, 6), which correspond to over-expanded nozzle operating conditions. Steady-state pressure measurements, together with schlieren and surface oil flow visualization, reveal the presence of over-exp…

    Submitted 2 August, 2020; v1 submitted 31 May, 2020; originally announced June 2020.

    Comments: 14 pages, 12 figures, Revision submitted to Phys. Fluids

  14. arXiv:2002.07752

    cs.DC cs.LG cs.PF

    Marvel: A Data-centric Compiler for DNN Operators on Spatial Accelerators

    Authors: Prasanth Chatarasi, Hyoukjun Kwon, Natesh Raina, Saurabh Malik, Vaisakh Haridas, Angshuman Parashar, Michael Pellauer, Tushar Krishna, Vivek Sarkar

    Abstract: The efficiency of a spatial DNN accelerator depends heavily on the compiler and its cost model's ability to generate optimized mappings for various operators of DNN models onto the accelerator's compute and memory resources. But existing cost models lack a formal boundary over the operators for precise and tractable analysis, which poses adaptability challenges for new DNN operators. To address th…

    Submitted 11 June, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

  15. arXiv:2002.04116

    cs.LG eess.SP stat.ML

    Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks

    Authors: Lei Yang, Zheyu Yan, Meng Li, Hyoukjun Kwon, Liangzhen Lai, Tushar Krishna, Vikas Chandra, Weiwen Jiang, Yiyu Shi

    Abstract: Neural Architecture Search (NAS) has demonstrated its power on various AI accelerating platforms such as Field Programmable Gate Arrays (FPGAs) and Graphic Processing Units (GPUs). However, how to integrate NAS with Application-Specific Integrated Circuits (ASICs) remains an open problem, despite them being the most powerful AI accelerating platforms. The major bottleneck comes from the large…

    Submitted 10 February, 2020; originally announced February 2020.

    Comments: Accepted by DAC'20

  16. arXiv:1912.01664

    cs.AR

    Understanding the Impact of On-chip Communication on DNN Accelerator Performance

    Authors: Robert Guirado, Hyoukjun Kwon, Eduard Alarcón, Sergi Abadal, Tushar Krishna

    Abstract: Deep Neural Networks have flourished at an unprecedented pace in recent years. They have achieved outstanding accuracy in fields such as computer vision, natural language processing, medicine or economics. Specifically, Convolutional Neural Networks (CNN) are particularly suited to object recognition or identification tasks. This, however, comes at a high computational cost, prompting the use of s…

    Submitted 3 December, 2019; originally announced December 2019.

    Comments: ICECS2019

  17. arXiv:1909.07437

    cs.DC

    Heterogeneous Dataflow Accelerators for Multi-DNN Workloads

    Authors: Hyoukjun Kwon, Liangzhen Lai, Michael Pellauer, Tushar Krishna, Yu-Hsin Chen, Vikas Chandra

    Abstract: Emerging AI-enabled applications such as augmented/virtual reality (AR/VR) leverage multiple deep neural network (DNN) models for sub-tasks such as object detection, hand tracking, and so on. Because of the diversity of the sub-tasks, the layers within and across the DNN models are highly heterogeneous in operation and shape. Such layer heterogeneity is a challenge for a fixed dataflow accelerator…

    Submitted 16 December, 2020; v1 submitted 13 September, 2019; originally announced September 2019.

    Comments: This paper is accepted at HPCA 2021

  18. arXiv:1908.04484

    cs.NI cs.AI cs.AR cs.LG eess.SY

    Reinforcement Learning based Interconnection Routing for Adaptive Traffic Optimization

    Authors: Sheng-Chun Kao, Chao-Han Huck Yang, Pin-Yu Chen, Xiaoli Ma, Tushar Krishna

    Abstract: Applying Machine Learning (ML) techniques to design and optimize computer architectures is a promising research direction. Optimizing the runtime performance of a Network-on-Chip (NoC) necessitates a continuous learning framework. In this work, we demonstrate the promise of applying reinforcement learning (RL) to optimize NoC runtime performance. We present three RL-based methods for learning opti…

    Submitted 13 August, 2019; originally announced August 2019.

  19. arXiv:1811.02883

    cs.DC cs.AR

    SCALE-Sim: Systolic CNN Accelerator Simulator

    Authors: Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, Tushar Krishna

    Abstract: Systolic Arrays are one of the most popular compute substrates within Deep Learning accelerators today, as they provide extremely high efficiency for running dense matrix multiplications. However, the research community lacks tools that provide insights into both the design trade-offs and efficient mapping strategies for systolic-array based accelerators. We introduce Systolic CNN Accelerator Simulator (SCALE…

    Submitted 1 February, 2019; v1 submitted 16 October, 2018; originally announced November 2018.

  20. arXiv:1808.01363

    cs.NE

    GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware

    Authors: Ananda Samajdar, Parth Mannan, Kartikay Garg, Tushar Krishna

    Abstract: Modern deep learning systems rely on (a) a hand-tuned neural network topology, (b) massive amounts of labeled training data, and (c) extensive training over large-scale compute resources to build a system that can perform efficient image classification or speech recognition. Unfortunately, we are still far away from implementing adaptive general purpose intelligent systems which would need to lear…

    Submitted 13 September, 2018; v1 submitted 3 August, 2018; originally announced August 2018.

    Comments: This work is accepted and will appear in MICRO-51

  21. arXiv:1805.02566

    cs.DC cs.LG

    Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach Using MAESTRO

    Authors: Hyoukjun Kwon, Prasanth Chatarasi, Michael Pellauer, Angshuman Parashar, Vivek Sarkar, Tushar Krishna

    Abstract: The data partitioning and scheduling strategies used by DNN accelerators to leverage reuse and perform staging are known as dataflow, and they directly impact the performance and energy efficiency of DNN accelerator designs. An accelerator microarchitecture dictates the dataflow(s) that can be employed to execute a layer or network. Selecting an optimal dataflow for a layer shape can have a large…

    Submitted 11 May, 2020; v1 submitted 4 May, 2018; originally announced May 2018.

  22. Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube

    Authors: Ramyad Hadidi, Bahar Asgari, Jeffrey Young, Burhan Ahmad Mudassar, Kartikay Garg, Tushar Krishna, Hyesoon Kim

    Abstract: Memories that exploit three-dimensional (3D)-stacking technology, which integrate memory and logic dies in a single stack, are becoming popular. These memories, such as Hybrid Memory Cube (HMC), utilize a network-on-chip (NoC) design for connecting their internal structural organizations. This novel usage of NoC, in addition to aiding processing-in-memory capabilities, enables numerous benefits su…

    Submitted 13 February, 2018; v1 submitted 17 July, 2017; originally announced July 2017.

    Journal ref: 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

  23. arXiv:1702.02313

    cs.AR

    FASHION: Fault-Aware Self-Healing Intelligent On-chip Network

    Authors: Pengju Ren, Michel A. Kinsy, Mengjiao Zhu, Shreeya Khadka, Mihailo Isakov, Aniruddh Ramrakhyani, Tushar Krishna, Nanning Zheng

    Abstract: To avoid packet loss and deadlock scenarios that arise due to faults or power gating in multicore and many-core systems, the network-on-chip needs to possess resilient communication and load-balancing properties. In this work, we introduce the Fashion router, a self-monitoring and self-reconfiguring design that allows for the on-chip network to dynamically adapt to component failures. First, we in…

    Submitted 8 February, 2017; originally announced February 2017.

    Comments: 14 pages, 12 figures

  24. arXiv:1701.03499

    cs.AR

    VESPA: VIPT Enhancements for Superpage Accesses

    Authors: Mayank Parasar, Abhishek Bhattacharjee, Tushar Krishna

    Abstract: L1 caches are critical to the performance of modern computer systems. Their design involves a delicate balance between fast lookups, high hit rates, low access energy, and simplicity of implementation. Unfortunately, constraints imposed by virtual memory make it difficult to satisfy all these attributes today. Specifically, the modern staple of supporting virtual-indexing and physical-tagging (VIP…

    Submitted 14 February, 2017; v1 submitted 12 January, 2017; originally announced January 2017.

    MSC Class: 68-06

  25. arXiv:1209.0257

    nlin.AO physics.bio-ph q-bio.CB q-bio.NC

    Achieving Control of Lesion Growth in CNS with Minimal Damage

    Authors: Mathankumar Raja, T. R. Krishna Mohan

    Abstract: Lesions in the central nervous system (CNS) and their growth lead to debilitating diseases like Multiple Sclerosis (MS), Alzheimer's, etc. We developed a model earlier which shows how the lesion growth can be arrested through a beneficial auto-immune mechanism. The success of the approach depends on a set of control parameters and their phase space was shown to have a smooth manifold separating the un…

    Submitted 3 September, 2012; originally announced September 2012.

  26. arXiv:1003.4658

    physics.geo-ph cond-mat.other

    Earthquake Correlations and Networks- A Comparative Study

    Authors: T. R. Krishna Mohan, P. G. Revathi

    Abstract: We quantify the correlation between earthquakes and use the same to distinguish between relevant causally connected earthquakes. Our correlation metric is a variation on the one introduced by Baiesi and Paczuski (2004). A network of earthquakes is constructed, which is time ordered and with links between the more correlated ones. Recurrences to earthquakes are identified employing correlation t…

    Submitted 24 March, 2010; originally announced March 2010.

    Comments: 17 pages, 6 figures

  27. Network of Earthquakes and Recurrences Therein

    Authors: T. R. Krishna Mohan, P. G. Revathi

    Abstract: We quantify the correlation between earthquakes and use the same to distinguish between relevant causally connected earthquakes. Our correlation metric is a variation on the one introduced by Baiesi and Paczuski (2004). A network of earthquakes is constructed, which is time ordered and with links between the more correlated ones. Data pertaining to the California region has been used in the study.…

    Submitted 23 March, 2010; originally announced March 2010.

    Comments: 17 pages, 5 figures

  28. Network of Recurrent events - A case study of Japan

    Authors: P. G. Revathi, T. R. Krishnamohan

    Abstract: A recently proposed method of constructing seismic networks from 'record breaking events' from the earthquake catalog of California (Phy. Rev. E, 77 6,066104, 2008) was successful in establishing causal features to seismicity and arriving at estimates for rupture length and its scaling with magnitude. The results of our implementation of this procedure on the earthquake catalog of Japan establish…

    Submitted 24 January, 2010; originally announced January 2010.

    Comments: 13 pages, 6 figures, 1 table

  29. arXiv:0708.3271

    physics.bio-ph physics.comp-ph q-bio.NC

    Simulation of Spread and Control of Lesions in Brain

    Authors: T. R. Krishna Mohan

    Abstract: A simulation model for the spread and control of lesions in the brain is constructed using a planar network (graph) representation for the Central Nervous System (CNS). The model is inspired by the lesion structures observed in the case of Multiple Sclerosis (MS), a chronic disease of the CNS. The initial lesion site is at the center of a unit square and spreads outwards based on the success rat…

    Submitted 24 August, 2007; originally announced August 2007.

    Comments: 5 pages, 3 postscript figures, submitted for publication