Showing 1–10 of 10 results for author: Vijaykumar, T N

Searching in archive cs.
  1. arXiv:2404.04708

    cs.AR

    Efficient Sparse Processing-in-Memory Architecture (ESPIM) for Machine Learning Inference

    Authors: Mingxuan He, Mithuna Thottethodi, T. N. Vijaykumar

    Abstract: Emerging machine learning (ML) models (e.g., transformers) involve memory pin bandwidth-bound matrix-vector (MV) computation in inference. By avoiding pin crossings, processing in memory (PIM) can improve performance and energy for pin-bound workloads, as evidenced by recent commercial efforts in (digital) PIM. Sparse models can improve performance and energy of inference without losing much accur…

    Submitted 6 April, 2024; originally announced April 2024.

  2. arXiv:2404.03113

    cs.AR

    QED: Scalable Verification of Hardware Memory Consistency

    Authors: Gokulan Ravi, Xiaokang Qiu, Mithuna Thottethodi, T. N. Vijaykumar

    Abstract: Memory consistency model (MCM) issues in out-of-order-issue microprocessor-based shared-memory systems are notoriously non-intuitive and a source of hardware design bugs. Prior hardware verification work is limited to in-order-issue processors, to proving the correctness only of some test cases, or to bounded verification that does not scale in practice beyond 7 instructions across all threads. Be…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 13 pages, 8 figures

  3. arXiv:2306.07785

    cs.AR cs.CR

    SafeBet: Secure, Simple, and Fast Speculative Execution

    Authors: Conor Green, Cole Nelson, Mithuna Thottethodi, T. N. Vijaykumar

    Abstract: Spectre attacks exploit microprocessor speculative execution to read and transmit forbidden data outside the attacker's trust domain and sandbox. Recent hardware schemes allow potentially-unsafe speculative accesses but prevent the secret's transmission by delaying most access-dependent instructions even in the predominantly-common, no-attack case, which incurs performance loss and hardware comple…

    Submitted 13 June, 2023; originally announced June 2023.

  4. arXiv:2106.14138

    cs.AR

    OCCAM: Optimal Data Reuse for Convolutional Neural Networks

    Authors: Ashish Gondimalla, Jianqiao Liu, T. N. Vijaykumar, Mithuna Thottethodi

    Abstract: Convolutional neural networks (CNNs) are emerging as powerful tools for image processing in important commercial applications. We focus on the important problem of improving the latency of image recognition. CNNs' large data at each layer's input, filters, and output poses a memory bandwidth problem. While previous work captures only some of the enormous data reuse, full reuse implies that the ini…

    Submitted 26 June, 2021; originally announced June 2021.

  5. arXiv:2104.08734

    cs.AR

    Barrier-Free Large-Scale Sparse Tensor Accelerator (BARISTA) For Convolutional Neural Networks

    Authors: Ashish Gondimalla, Sree Charan Gundabolu, T. N. Vijaykumar, Mithuna Thottethodi

    Abstract: Convolutional neural networks (CNNs) are emerging as powerful tools for visual recognition. Recent architecture proposals for sparse CNNs exploit zeros in the feature maps and filters for performance and energy without losing accuracy. Sparse architectures that exploit two-sided sparsity in both feature maps and filters have been studied only at small scales (e.g., 1K multiply-accumulate (MAC) unit…

    Submitted 8 May, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

  6. arXiv:2011.02022

    cs.AR

    Booster: An Accelerator for Gradient Boosting Decision Trees

    Authors: Mingxuan He, T. N. Vijaykumar, Mithuna Thottethodi

    Abstract: We propose Booster, a novel accelerator for gradient boosting trees based on the unique characteristics of gradient boosting models. We observe that the dominant steps of gradient boosting training (accounting for 90-98% of training time) involve simple, fine-grained, independent operations on small-footprint data structures (e.g., accumulate and compare values in the structures). Unfortunately, e…

    Submitted 5 November, 2020; v1 submitted 3 November, 2020; originally announced November 2020.

  7. arXiv:1805.11158

    cs.NI

    Dart: Divide and Specialize for Fast Response to Congestion in RDMA-based Datacenter Networks

    Authors: Jiachen Xue, Muhammad Usama Chaudhry, Balajee Vamanan, T. N. Vijaykumar, Mithuna Thottethodi

    Abstract: Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network latencies significantly compared to TCP (e.g., 10x), end-to-end congestion control in the presence of incasts is a challenge. Targeting the full generality of the congestion problem, previous schemes rely on slow, iterative convergence to the appropriate sending rates (e.g., TIMELY takes 50 RTTs). Several papers have s…

    Submitted 30 December, 2019; v1 submitted 28 May, 2018; originally announced May 2018.

    Comments: 15 pages, 14 figures

    MSC Class: C.2.2; ACM Class: C.2.2

  8. arXiv:1609.07192

    cs.NI cs.DC

    Hydra: Leveraging Functional Slicing for Efficient Distributed SDN Controllers

    Authors: Yiyang Chang, Ashkan Rezaei, Balajee Vamanan, Jahangir Hasan, Sanjay Rao, T. N. Vijaykumar

    Abstract: The conventional approach to scaling Software Defined Networking (SDN) controllers today is to partition switches based on network topology, with each partition being controlled by a single physical controller, running all SDN applications. However, topological partitioning is limited by the fact that (i) performance of latency-sensitive (e.g., monitoring) SDN applications associated with a given…

    Submitted 22 September, 2016; originally announced September 2016.

    Comments: 8 pages

  9. arXiv:1504.04297

    cs.AR

    MigrantStore: Leveraging Virtual Memory in DRAM-PCM Memory Architecture

    Authors: Hamza Bin Sohail, Balajee Vamanan, T. N. Vijaykumar

    Abstract: With the imminent slowing down of DRAM scaling, Phase Change Memory (PCM) is emerging as a lead alternative for main memory technology. While PCM achieves low energy due to various technology-specific advantages, PCM is significantly slower than DRAM (especially for writes) and can endure far fewer writes before wearing out. Previous work has proposed to use a large, DRAM-based hardware cache to a…

    Submitted 16 April, 2015; originally announced April 2015.

  10. arXiv:1503.05338

    cs.DC

    TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications

    Authors: Balajee Vamanan, Hamza Bin Sohail, Jahangir Hasan, T. N. Vijaykumar

    Abstract: Datacenters running on-line, data-intensive applications (OLDIs) consume significant amounts of energy. However, reducing their energy is challenging due to their tight response time requirements. A key aspect of OLDIs is that each user query goes to all or many of the nodes in the cluster, so that the overall time budget is dictated by the tail of the replies' latency distribution; replies see la…

    Submitted 18 March, 2015; originally announced March 2015.

    Comments: 13 pages