-
The OpenDC Microservice Simulator: Design, Implementation, and Experimentation
Authors:
Muhammad Ahsan,
Sacheendra Talluri,
Alexandru Iosup
Abstract:
Microservices is an architectural style that structures an application as a collection of loosely coupled services, making it easy for developers to build and scale their applications. The microservices architecture approach differs from the traditional monolithic style of treating software development as a single entity. Microservice architecture is becoming more and more adapted. However, micros…
▽ More
Microservices is an architectural style that structures an application as a collection of loosely coupled services, making it easy for developers to build and scale their applications. The microservices architecture approach differs from the traditional monolithic style of treating software development as a single entity. Microservice architecture is becoming more and more adapted. However, microservice systems can be complex due to dependencies between the microservices, resulting in unpredictable performance at a large scale. Simulation is a cheap and fast way to investigate the performance of microservices in more detail. This study aims to build a microservices simulator for evaluating and comparing microservices based applications. The microservices reference architecture is designed. The architecture is used as the basis for a simulator. The simulator implementation uses statistical models to generate the workload. The compelling features added to the simulator include concurrent execution of microservices, configurable request depth, three load-balancing policies and four request execution order policies. This paper contains two experiments to show the simulator usage. The first experiment covers request execution order policies at the microservice instance. The second experiment compares load balancing policies across microservice instances.
△ Less
Submitted 26 April, 2023; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Failure Analysis of Big Cloud Service Providers Prior to and During Covid-19 Period
Authors:
Muhammad Ahsan,
Sacheendra Talluri,
Alexandru Iosup
Abstract:
Cloud services are important for societal function such as healthcare, commerce, entertainment and education. Cloud can provide a variety of features such as increased collaboration and inexpensive computing. Failures are unavoidable in cloud services due to the large size and complexity, resulting in decreased reliability and efficiency. For example, due to bugs, many high-severity failures have…
▽ More
Cloud services are important for societal function such as healthcare, commerce, entertainment and education. Cloud can provide a variety of features such as increased collaboration and inexpensive computing. Failures are unavoidable in cloud services due to the large size and complexity, resulting in decreased reliability and efficiency. For example, due to bugs, many high-severity failures have been occurring in cloud infrastructure of popular providers, causing outages of several hours and the unrecoverable loss of user data. There are prior studies about cloud failure analyses are limited and use sources such as news articles. However, a detailed cloud failure focused study is required that provides analyses for cloud failure data gathered directly from the vendors. Furthermore, the Covid-19 cloud failures should be studied as cloud services played a major role throughout the Covid-19 period, as individuals relied on cloud services for activities such as working from home. A program can be made for this task. As a result, we will be able to better understand and mitigate cloud failures to reduce the effect of cloud failures.
△ Less
Submitted 17 January, 2023; v1 submitted 15 October, 2022;
originally announced October 2022.
-
Let's Trace It: Fine-Grained Serverless Benchmarking using Synchronous and Asynchronous Orchestrated Applications
Authors:
Joel Scheuner,
Simon Eismann,
Sacheendra Talluri,
Erwin van Eyk,
Cristina Abad,
Philipp Leitner,
Alexandru Iosup
Abstract:
Making serverless computing widely applicable requires detailed performance understanding. Although contemporary benchmarking approaches exist, they report only coarse results, do not apply distributed tracing, do not consider asynchronous applications, and provide limited capabilities for (root cause) analysis. Addressing this gap, we design and implement ServiBench, a serverless benchmarking sui…
▽ More
Making serverless computing widely applicable requires detailed performance understanding. Although contemporary benchmarking approaches exist, they report only coarse results, do not apply distributed tracing, do not consider asynchronous applications, and provide limited capabilities for (root cause) analysis. Addressing this gap, we design and implement ServiBench, a serverless benchmarking suite. ServiBench (i) leverages synchronous and asynchronous serverless applications representative of production usage, (ii) extrapolates cloud-provider data to generate realistic workloads, (iii) conducts comprehensive, end-to-end experiments to capture application-level performance, (iv) analyzes results using a novel approach based on (distributed) serverless tracing, and (v) supports comprehensively serverless performance analysis. With ServiBench, we conduct comprehensive experiments on AWS, covering five common performance factors: median latency, cold starts, tail latency, scalability, and dynamic workloads. We find that the median end-to-end latency of serverless applications is often dominated not by function computation but by external service calls, orchestration, or trigger-based coordination. We release collected experimental data under FAIR principles and ServiBench as a tested, extensible open-source tool.
△ Less
Submitted 16 May, 2022;
originally announced May 2022.
-
Characterizing User and Provider Reported Cloud Failures
Authors:
Mehmet Berk Cetin,
Sacheendra Talluri,
Alexandru Iosup
Abstract:
Cloud computing is the backbone of the digital society. Digital banking, media, communication, gaming, and many others depend on cloud services. Unfortunately, cloud services may fail, leading to damaged services, unhappy users, and perhaps millions of dollars lost for companies. Understanding a cloud service failure requires a detailed report on why and how the service failed. Previous work studi…
▽ More
Cloud computing is the backbone of the digital society. Digital banking, media, communication, gaming, and many others depend on cloud services. Unfortunately, cloud services may fail, leading to damaged services, unhappy users, and perhaps millions of dollars lost for companies. Understanding a cloud service failure requires a detailed report on why and how the service failed. Previous work studies how cloud services fail using logs published by cloud operators. However, information is lacking on how users perceive and experience cloud failures. Therefore, we collect and characterize the data for user-reported cloud failures from Down Detector for three cloud service providers over three years. We count and analyze time patterns in the user reports, and derive failures from those user reports and characterize their duration and interarrival time. We characterize provider-reported cloud failures and compare the results with the characterization of user-reported failures. The comparison reveals the information of how users perceive failures and how much of the failures are reported by cloud service providers. Overall, this work provides a characterization of user- and provider-reported cloud failures and compares them with each other.
△ Less
Submitted 26 April, 2023; v1 submitted 23 October, 2021;
originally announced October 2021.
-
Question Generation from Paragraphs: A Tale of Two Hierarchical Models
Authors:
Vishwajeet Kumar,
Raktim Chaki,
Sai Teja Talluri,
Ganesh Ramakrishnan,
Yuan-Fang Li,
Gholamreza Haffari
Abstract:
Automatic question generation from paragraphs is an important and challenging problem, particularly due to the long context from paragraphs. In this paper, we propose and study two hierarchical models for the task of question generation from paragraphs. Specifically, we propose (a) a novel hierarchical BiLSTM model with selective attention and (b) a novel hierarchical Transformer architecture, bot…
▽ More
Automatic question generation from paragraphs is an important and challenging problem, particularly due to the long context from paragraphs. In this paper, we propose and study two hierarchical models for the task of question generation from paragraphs. Specifically, we propose (a) a novel hierarchical BiLSTM model with selective attention and (b) a novel hierarchical Transformer architecture, both of which learn hierarchical representations of paragraphs. We model a paragraph in terms of its constituent sentences, and a sentence in terms of its constituent words. While the introduction of the attention mechanism benefits the hierarchical BiLSTM model, the hierarchical Transformer, with its inherent attention and positional encoding mechanisms also performs better than flat transformer model. We conducted empirical evaluation on the widely used SQuAD and MS MARCO datasets using standard metrics. The results demonstrate the overall effectiveness of the hierarchical models over their flat counterparts. Qualitatively, our hierarchical models are able to generate fluent and relevant questions
△ Less
Submitted 8 November, 2019;
originally announced November 2019.
-
The Workflow Trace Archive: Open-Access Data from Public and Private Computing Infrastructures -- Technical Report
Authors:
Laurens Versluis,
Roland Mathá,
Sacheendra Talluri,
Tim Hegeman,
Radu Prodan,
Ewa Deelman,
Alexandru Iosup
Abstract:
Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. We focus in this work on traces of workflows---common in datacenters, clouds, and HPC infrastructures. We show that the state-of-the-art in using workflow-traces raises important issues: (1) the use of realistic traces is infrequent, and (2) the use of realistic, {\it open-access} trac…
▽ More
Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. We focus in this work on traces of workflows---common in datacenters, clouds, and HPC infrastructures. We show that the state-of-the-art in using workflow-traces raises important issues: (1) the use of realistic traces is infrequent, and (2) the use of realistic, {\it open-access} traces even more so. Alleviating these issues, we introduce the Workflow Trace Archive (WTA), an open-access archive of workflow traces from diverse computing infrastructures and tooling to parse, validate, and analyze traces. The WTA includes ${>}48$ million workflows captured from ${>}10$ computing infrastructures, representing a broad diversity of trace domains and characteristics. To emphasize the importance of trace diversity, we characterize the WTA contents and analyze in simulation the impact of trace diversity on experiment results. Our results indicate significant differences in characteristics, properties, and workflow structures between workload sources, domains, and fields.
△ Less
Submitted 11 July, 2019; v1 submitted 18 June, 2019;
originally announced June 2019.
-
The AtLarge Vision on the Design of Distributed Systems and Ecosystems
Authors:
Alexandru Iosup,
Laurens Versluis,
Animesh Trivedi,
Erwin van Eyk,
Lucian Toader,
Vincent van Beek,
Giulia Frascaria,
Ahmed Musaafir,
Sacheendra Talluri
Abstract:
High-quality designs of distributed systems and services are essential for our digital economy and society. Threatening to slow down the stream of working designs, we identify the mounting pressure of scale and complexity of \mbox{(eco-)systems}, of ill-defined and wicked problems, and of unclear processes, methods, and tools. We envision design itself as a core research topic in distributed syste…
▽ More
High-quality designs of distributed systems and services are essential for our digital economy and society. Threatening to slow down the stream of working designs, we identify the mounting pressure of scale and complexity of \mbox{(eco-)systems}, of ill-defined and wicked problems, and of unclear processes, methods, and tools. We envision design itself as a core research topic in distributed systems, to understand and improve the science and practice of distributed (eco-)system design. Toward this vision, we propose the AtLarge design framework, accompanied by a set of 8 core design principles. We also propose 10 key challenges, which we hope the community can address in the following 5 years. In our experience so far, the proposed framework and principles are practical, and lead to pragmatic and innovative designs for large-scale distributed systems.
△ Less
Submitted 14 February, 2019;
originally announced February 2019.
-
Massivizing Computer Systems: a Vision to Understand, Design, and Engineer Computer Ecosystems through and beyond Modern Distributed Systems
Authors:
Alexandru Iosup,
Alexandru Uta,
Laurens Versluis,
Georgios Andreadis,
Erwin van Eyk,
Tim Hegeman,
Sacheendra Talluri,
Vincent van Beek,
Lucian Toader
Abstract:
Our society is digital: industry, science, governance, and individuals depend, often transparently, on the inter-operation of large numbers of distributed computer systems. Although the society takes them almost for granted, these computer ecosystems are not available for all, may not be affordable for long, and raise numerous other research challenges. Inspired by these challenges and by our expe…
▽ More
Our society is digital: industry, science, governance, and individuals depend, often transparently, on the inter-operation of large numbers of distributed computer systems. Although the society takes them almost for granted, these computer ecosystems are not available for all, may not be affordable for long, and raise numerous other research challenges. Inspired by these challenges and by our experience with distributed computer systems, we envision Massivizing Computer Systems, a domain of computer science focusing on understanding, controlling, and evolving successfully such ecosystems. Beyond establishing and growing a body of knowledge about computer ecosystems and their constituent systems, the community in this domain should also aim to educate many about design and engineering for this domain, and all people about its principles. This is a call to the entire community: there is much to discover and achieve.
△ Less
Submitted 22 February, 2018; v1 submitted 15 February, 2018;
originally announced February 2018.