Showing 1–2 of 2 results for author: Subbaraj, H

  1. arXiv:2007.05832  [pdf, other]

    cs.DC

    Optimizing Prediction Serving on Low-Latency Serverless Dataflow

    Authors: Vikram Sreekanti, Harikaran Subbaraj, Chenggang Wu, Joseph E. Gonzalez, Joseph M. Hellerstein

    Abstract: Prediction serving systems are designed to provide large volumes of low-latency inferences from machine learning models. These systems mix data processing and computationally intensive model inference and benefit from multiple heterogeneous processors and distributed computing resources. In this paper, we argue that a familiar dataflow API is well-suited to this latency-sensitive task, and amenable to…

    Submitted 11 July, 2020; originally announced July 2020.

  2. arXiv:1901.00041  [pdf, other]

    cs.DC

    Dynamic Space-Time Scheduling for GPU Inference

    Authors: Paras Jain, Xiangxi Mo, Ajay Jain, Harikaran Subbaraj, Rehan Sohail Durrani, Alexey Tumanov, Joseph Gonzalez, Ion Stoica

    Abstract: Serving deep neural networks in latency-critical interactive settings often requires GPU acceleration. However, the small batch sizes typical in online inference result in poor GPU utilization, a potential performance gap which GPU resource sharing can address. In this paper, we explore several techniques to leverage both temporal and spatial multiplexing to improve GPU utilization for deep learn…

    Submitted 31 December, 2018; originally announced January 2019.