Showing 1–8 of 8 results for author: Vamanan, B

  1. arXiv:2302.05865  [pdf, other]

    cs.LG cs.DC

    Flag Aggregator: Scalable Distributed Training under Failures and Augmented Losses using Convex Optimization

    Authors: Hamidreza Almasi, Harsh Mishra, Balajee Vamanan, Sathya N. Ravi

    Abstract: Modern ML applications increasingly rely on complex deep learning models and large datasets. There has been an exponential growth in the amount of computation needed to train the largest models. Therefore, to scale computation and data, these models are inevitably trained in a distributed manner in clusters of nodes, and their updates are aggregated before being applied to the model. However, a di…

    Submitted 24 September, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

  2. arXiv:2108.08199  [pdf, other]

    cs.DC cs.IR

    Modeling Performance and Energy trade-offs in Online Data-Intensive Applications

    Authors: Ajay Badita, Rooji Jinan, Balajee Vamanan, Parimal Parag

    Abstract: We consider energy minimization for data-intensive applications run on a large number of servers, for given performance guarantees. We consider a system where each incoming application is sent to a set of servers, and is considered to be completed if a subset of them finish serving it. We consider a simple case when each server core has two speed levels, where the higher speed can be achieved by hi…

    Submitted 18 August, 2021; originally announced August 2021.

  3. arXiv:1809.09751  [pdf, other]

    cs.NI

    Pulser: Fast Congestion Response using Explicit Incast Notifications for Datacenter Networks

    Authors: Hamidreza Almasi, Hamed Rezaei, Muhammad Usama Chaudhry, Balajee Vamanan

    Abstract: Datacenter applications frequently cause incast congestion, which degrades both flow completion times of short flows and throughput of long flows. Without isolating incast, existing congestion control schemes (e.g., DCTCP) rely on the existing ECN signal to react to general congestion, and they lose performance due to their slow, cautious, and inaccurate reaction to incast. We propose to isolate incas…

    Submitted 2 October, 2018; v1 submitted 25 September, 2018; originally announced September 2018.

    Comments: 7 pages

  4. arXiv:1807.02184  [pdf, other]

    cs.NI

    Slytherin: Dynamic, Network-assisted Prioritization of Tail Packets in Datacenter Networks

    Authors: Hamed Rezaei, Mojtaba Malekpourshahraki, Balajee Vamanan

    Abstract: Datacenter applications demand both low latency and high throughput; while interactive applications (e.g., Web Search) demand low tail latency for their short messages due to their partition-aggregate software architecture, many data-intensive applications (e.g., Map-Reduce) require high throughput for long flows as they move vast amounts of data across the network. Recent proposals improve latenc…

    Submitted 5 July, 2018; originally announced July 2018.

    Comments: 9 pages, 10 figures, ICCCN'18

  5. arXiv:1805.11158  [pdf, other]

    cs.NI

    Dart: Divide and Specialize for Fast Response to Congestion in RDMA-based Datacenter Networks

    Authors: Jiachen Xue, Muhammad Usama Chaudhry, Balajee Vamanan, T. N. Vijaykumar, Mithuna Thottethodi

    Abstract: Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network latencies significantly compared to TCP (e.g., 10x), end-to-end congestion control in the presence of incasts is a challenge. Targeting the full generality of the congestion problem, previous schemes rely on slow, iterative convergence to the appropriate sending rates (e.g., TIMELY takes 50 RTTs). Several papers have s…

    Submitted 30 December, 2019; v1 submitted 28 May, 2018; originally announced May 2018.

    Comments: 15 pages, 14 figures

    MSC Class: C.2.2 ACM Class: C.2.2

  6. arXiv:1609.07192  [pdf, other]

    cs.NI cs.DC

    Hydra: Leveraging Functional Slicing for Efficient Distributed SDN Controllers

    Authors: Yiyang Chang, Ashkan Rezaei, Balajee Vamanan, Jahangir Hasan, Sanjay Rao, T. N. Vijaykumar

    Abstract: The conventional approach to scaling Software Defined Networking (SDN) controllers today is to partition switches based on network topology, with each partition being controlled by a single physical controller, running all SDN applications. However, topological partitioning is limited by the fact that (i) performance of latency-sensitive (e.g., monitoring) SDN applications associated with a given…

    Submitted 22 September, 2016; originally announced September 2016.

    Comments: 8 pages

  7. arXiv:1504.04297  [pdf]

    cs.AR

    MigrantStore: Leveraging Virtual Memory in DRAM-PCM Memory Architecture

    Authors: Hamza Bin Sohail, Balajee Vamanan, T. N. Vijaykumar

    Abstract: With the imminent slowing down of DRAM scaling, Phase Change Memory (PCM) is emerging as a lead alternative for main memory technology. While PCM achieves low energy due to various technology-specific advantages, PCM is significantly slower than DRAM (especially for writes) and can endure far fewer writes before wearing out. Previous work has proposed to use a large, DRAM-based hardware cache to a…

    Submitted 16 April, 2015; originally announced April 2015.

  8. arXiv:1503.05338  [pdf]

    cs.DC

    TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications

    Authors: Balajee Vamanan, Hamza Bin Sohail, Jahangir Hasan, T. N. Vijaykumar

    Abstract: Datacenters running on-line, data-intensive applications (OLDIs) consume significant amounts of energy. However, reducing their energy is challenging due to their tight response time requirements. A key aspect of OLDIs is that each user query goes to all or many of the nodes in the cluster, so that the overall time budget is dictated by the tail of the replies' latency distribution; replies see la…

    Submitted 18 March, 2015; originally announced March 2015.

    Comments: 13 pages