Gemini: Practical Reconfigurable Datacenter Networks with Topology and Traffic Engineering
Authors:
Mingyang Zhang,
Jianan Zhang,
Rui Wang,
Ramesh Govindan,
Jeffrey C. Mogul,
Amin Vahdat
Abstract:
To reduce cost, datacenter network operators are exploring blocking network designs. An example of such a design is a "spine-free" form of a Fat-Tree, in which pods directly connect to each other, rather than via spine blocks. To maintain application-perceived performance in the face of dynamic workloads, these new designs must be able to reconfigure routing and the inter-pod topology. Gemini is a system designed to achieve these goals on commodity hardware while reconfiguring the network infrequently, rendering these blocking designs practical enough for deployment in the near future. The key to Gemini is the joint optimization of topology and routing, using as input a robust estimation of future traffic derived from multiple historical traffic matrices. Gemini "hedges" against unpredicted bursts by spreading these bursts across multiple paths, to minimize packet loss in exchange for a small increase in path lengths. It incorporates a robust decision algorithm to determine when to reconfigure, and whether to use hedging. Data from tens of production fabrics allows us to categorize these as either low- or high-volatility; these categories seem stable. For the former, Gemini finds topologies and routing with near-optimal performance and cost. For the latter, Gemini's use of multi-traffic-matrix optimization and hedging avoids the need for frequent topology reconfiguration, with only marginal increases in path length. As a result, Gemini can support existing workloads on these production fabrics using a spine-free topology that is half the cost of the existing topology on these fabrics.
Submitted 15 October, 2021;
originally announced October 2021.
Flexible Network Bandwidth and Latency Provisioning in the Datacenter
Authors:
Vimalkumar Jeyakumar,
Abdul Kabbani,
Jeffrey C. Mogul,
Amin Vahdat
Abstract:
Predictably sharing the network is critical to achieving high utilization in the datacenter. Past work has focussed on providing bandwidth to endpoints, but often we want to allocate resources among multi-node services. In this paper, we present Parley, which provides service-centric minimum bandwidth guarantees, which can be composed hierarchically. Parley also supports service-centric weighted sharing of bandwidth in excess of these guarantees. Further, we show how to configure these policies so services can get low latencies even at high network load. We evaluate Parley on a multi-tiered oversubscribed network connecting 90 machines, each with a 10Gb/s network interface, and demonstrate that Parley is able to meet its goals.
Submitted 5 May, 2014; v1 submitted 3 May, 2014;
originally announced May 2014.