SFC: Near-Source Congestion Signaling and Flow Control
Authors:
Yanfang Le,
Jeongkeun Lee,
Jeremias Blendin,
Jiayi Chen,
Georgios Nikolaidis,
Rong Pan,
Robert Soule,
Aditya Akella,
Pedro Yebenes Segura,
Arjun Singhvi,
Yuliang Li,
Qingkai Meng,
Changhoon Kim,
Serhat Arslan
Abstract:
State-of-the-art congestion control algorithms for data centers alone do not cope well with transient congestion and high traffic bursts. To help with these, we revisit the concept of direct \emph{backward} feedback from switches and propose Back-to-Sender (BTS) signaling to many concurrent incast senders. Combining it with our novel approach to in-network caching, we achieve near-source sub-RTT congestion signaling. Source Flow Control (SFC) combines these two simple signaling mechanisms to instantly pause traffic sources, hence avoiding the head-of-line blocking problem of conventional hop-by-hop flow control. Our prototype system and scale simulations demonstrate that near-source signaling can significantly reduce the message completion time of various workloads in the presence of incast, complementing existing congestion control algorithms. Our results show that SFC can reduce the $99^{th}$-percentile flow completion times by $1.2-6\times$ and the peak switch buffer usage by $2-3\times$ compared to the recent incast solutions.
Submitted 30 April, 2023;
originally announced May 2023.
Backpressure Flow Control
Authors:
Prateesh Goyal,
Preey Shah,
Kevin Zhao,
Georgios Nikolaidis,
Mohammad Alizadeh,
Thomas E. Anderson
Abstract:
Effective congestion control for data center networks is becoming increasingly challenging with a growing amount of latency-sensitive traffic, much fatter links, and extremely bursty traffic. Widely deployed algorithms, such as DCTCP and DCQCN, are still far from optimal in many plausible scenarios, particularly for tail latency. Many operators compensate by running their networks at low average utilization, dramatically increasing costs.
In this paper, we argue that we have reached the practical limits of end-to-end congestion control. Instead, we propose, implement, and evaluate a new congestion control architecture called Backpressure Flow Control (BFC). BFC provides per-hop per-flow flow control, but with bounded state, constant-time switch operations, and careful use of buffers. We demonstrate BFC's feasibility by implementing it on Tofino2, a state-of-the-art P4-based programmable hardware switch. In simulation, we show that BFC achieves near-optimal throughput and tail latency behavior even under challenging conditions such as high network load and incast cross traffic. Compared to existing end-to-end schemes, BFC achieves $2.3-60\times$ lower tail latency for short flows and $1.6-5\times$ better average completion time for long flows.
Submitted 29 March, 2021; v1 submitted 21 September, 2019;
originally announced September 2019.