doi 10.1016/j.adhoc.2024.103403

Eventually-Consistent Federated Scheduling for Data Center Workloads

Authors: Meghana Thiyyakat, Subramaniam Kalambur, Rishit Chaudhary, Saurav G Nayak, Adarsh Shetty, Dinkar Sitaram

Abstract: Data center schedulers operate at unprecedented scales today to accommodate the growing demand for computing and storage power. The challenge that schedulers face is meeting the requirements of scheduling speeds despite the scale. To do so, most scheduler architectures use parallelism. However, these architectures consist of multiple parallel scheduling entities that can only utilize partial knowledge of the data center's state, as maintaining consistent global knowledge or state would involve considerable communication overhead. The disadvantage of scheduling without global knowledge is sub-optimal placements: tasks may be made to wait in queues even though there are resources available in zones outside the scope of the scheduling entity's state. This leads to unnecessary queuing overheads and lower resource utilization of the data center. In this paper, we extend our previous work on Megha, a federated decentralized data center scheduling architecture that uses eventual consistency. The architecture utilizes both parallelism and an eventually-consistent global state in each of its scheduling entities to make fast decisions in a scalable manner. In our work, we compare Megha with three scheduling architectures (Sparrow, Eagle, and Pigeon) using simulation. We also evaluate Megha's prototype on a 123-node cluster and compare its performance with Pigeon's prototype using cluster traces. The results of our experiments show that Megha consistently reduces delays in job completion time when compared to other architectures.

Submitted 20 August, 2023; originally announced August 2023.

Comments: 26 pages. Submitted to Elsevier's Ad Hoc Networks Journal

arXiv:2203.14076 [pdf, other]

doi 10.1145/3491204.3527462

MiSeRTrace: Kernel-level Request Tracing for Microservice Visibility

Authors: Thrivikraman V, Vishnu R. Dixit, Nikhil Ram S, Vikas K. Gowda, Santhosh Kumar Vasudevan, Subramaniam Kalambur

Abstract: With the evolution of microservice applications, the underlying architectures have become increasingly complex compared to their monolith counterparts. This mainly brings in the challenge of observability. By providing a deeper understanding of the functioning of distributed applications, observability enables improving the performance of the system by obtaining a view of the bottlenecks in the implementation. The observability provided by existing tools that perform dynamic tracing on distributed applications is limited to the user space and requires the application to be instrumented to track request flows. In this paper, we present a new open-source framework, MiSeRTrace, that can trace the end-to-end path of requests entering a microservice application at the kernel space without requiring instrumentation or modification of the application. Observability at the comprehensiveness of the kernel space allows breaking down the various steps in activities such as network transfers and IO tasks, thus enabling root-cause-based performance analysis and accurate identification of hotspots. MiSeRTrace supports tracing user-enabled kernel events provided by frameworks such as bpftrace or ftrace and isolates kernel activity associated with each application request with minimal overheads. We then demonstrate the working of the solution with results on a benchmark microservice application.

Submitted 3 December, 2022; v1 submitted 26 March, 2022; originally announced March 2022.

arXiv:2103.08413 [pdf, other]

doi 10.1016/j.adhoc.2024.103403

Megha: Decentralized Global Fair Scheduling for Federated Clusters

Authors: Meghana Thiyyakat, Subramaniam Kalambur, Dinkar Sitaram

Abstract: Increasing scale and heterogeneity in data centers have led to the development of federated clusters such as KubeFed, Hydra, and Pigeon that federate individual data center clusters. In our work, we introduce Megha, a novel decentralized resource management framework for such federated clusters. Megha employs flexible logical partitioning of clusters to distribute its scheduling load, ensuring that the requirements of the workload are satisfied with very low scheduling overheads. It uses a distributed global scheduler that does not rely on a centralized data store but, instead, works with eventual consistency, unlike other schedulers that use a tiered architecture or rely on centralized databases. Our experiments with Megha show that it can schedule tasks taking into account fairness and placement constraints with low resource allocation times, on the order of tens of milliseconds.

Submitted 10 November, 2022; v1 submitted 15 March, 2021; originally announced March 2021.

Comments: 10 pages, 12 figures, conference paper

Showing 1–3 of 3 results for author: Kalambur, S