-
An Analysis of MLOps Architectures: A Systematic Mapping Study
Authors:
Faezeh Amou Najafabadi,
Justus Bogner,
Ilias Gerostathopoulos,
Patricia Lago
Abstract:
Context. Despite the increasing adoption of Machine Learning Operations (MLOps), teams still encounter challenges in effectively applying this paradigm to their specific projects. While there is a large variety of available tools usable for MLOps, there is simultaneously a lack of consolidated architecture knowledge that can inform the architecture design. Objective. Our primary objective is to provide a comprehensive overview of (i) how MLOps architectures are defined across the literature and (ii) which tools are mentioned to support the implementation of each architecture component. Method. We apply the Systematic Mapping Study method and select 43 primary studies via automatic, manual, and snowballing-based search and selection procedures. Subsequently, we use card sorting to synthesize the results. Results. We contribute (i) a categorization of 35 MLOps architecture components, (ii) a description of several MLOps architecture variants, and (iii) a systematic map between the identified components and the existing MLOps tools. Conclusion. This study provides an overview of the state of the art in MLOps from an architectural perspective. Researchers and practitioners can use our findings to inform the architecture design of their MLOps systems.
Submitted 28 June, 2024;
originally announced June 2024.
-
Expert-Driven Monitoring of Operational ML Models
Authors:
Joran Leest,
Claudia Raibulet,
Ilias Gerostathopoulos,
Patricia Lago
Abstract:
We propose Expert Monitoring, an approach that leverages domain expertise to enhance the detection and mitigation of concept drift in machine learning (ML) models. Our approach supports practitioners by consolidating domain expertise related to concept drift-inducing events, making this expertise accessible to on-call personnel, and enabling automatic adaptability with expert oversight.
Submitted 22 January, 2024;
originally announced January 2024.
-
SUAVE: An Exemplar for Self-Adaptive Underwater Vehicles
Authors:
Gustavo Rezende Silva,
Juliane Päßler,
Jeroen Zwanepol,
Elvin Alberts,
S. Lizeth Tapia Tarifa,
Ilias Gerostathopoulos,
Einar Broch Johnsen,
Carlos Hernández Corbato
Abstract:
Once deployed in the real world, autonomous underwater vehicles (AUVs) are out of reach for human supervision yet need to make decisions to adapt to unstable and unpredictable environments. To facilitate research on self-adaptive AUVs, this paper presents SUAVE, an exemplar for two-layered system-level adaptation of AUVs, which clearly separates the application and self-adaptation concerns. The exemplar focuses on a mission for underwater pipeline inspection by a single AUV, implemented as a ROS2-based system. This mission must be completed while simultaneously accounting for uncertainties such as thruster failures and unfavorable environmental conditions. The paper discusses how SUAVE can be used with different self-adaptation frameworks, illustrated by an experiment using the Metacontrol framework to compare AUV behavior with and without self-adaptation. The experiment shows that the use of Metacontrol to adapt the AUV during its mission improves its performance when measured by the overall time taken to complete the mission or the length of the inspected pipeline.
Submitted 16 March, 2023;
originally announced March 2023.
-
Self-Adaptation in Industry: A Survey
Authors:
Danny Weyns,
Ilias Gerostathopoulos,
Nadeem Abbas,
Jesper Andersson,
Stefan Biffl,
Premek Brada,
Tomas Bures,
Amleto Di Salle,
Matthias Galster,
Patricia Lago,
Grace Lewis,
Marin Litoiu,
Angelika Musil,
Juergen Musil,
Panos Patros,
Patrizio Pelliccione
Abstract:
Computing systems form the backbone of many areas in our society, from manufacturing to traffic control, healthcare, and financial systems. When software plays a vital role in the design, construction, and operation of these systems, they are referred to as software-intensive systems. Self-adaptation equips a software-intensive system with a feedback loop that either automates tasks that would otherwise need to be performed by human operators or deals with uncertain conditions. Such feedback loops have found their way into a variety of practical applications; typical examples are an elastic cloud that adapts computing resources and automated server management that responds quickly to business needs. To gain insight into the motivations for applying self-adaptation in practice, the problems solved using self-adaptation and how these problems are solved, and the difficulties and risks that industry faces in adopting self-adaptation, we performed a large-scale survey. We received 184 valid responses from practitioners spread over 21 countries. Based on the analysis of the survey data, we provide an empirically grounded overview of the state of the practice in the application of self-adaptation. From that, we derive insights for researchers to check their current research against industrial needs, and for practitioners to compare their current practice in applying self-adaptation. These insights also provide opportunities for the application of self-adaptation in practice and pave the way for future industry-research collaborations.
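As an illustration of the kind of feedback loop described above, the following is a minimal sketch of an elastic-cloud autoscaler, assuming a single replicated service whose request load can be observed. The loop is organized as monitor, analyze, plan, and execute steps (the common MAPE-K reference model, which the survey abstract does not itself prescribe). `ManagedService`, `adapt`, the per-replica capacity of 100 requests/s, and the 0.7 utilization set point are all hypothetical.

```python
import math
from dataclasses import dataclass

CAPACITY_PER_REPLICA = 100.0  # hypothetical: requests/s one replica can serve
TARGET_UTILIZATION = 0.7      # hypothetical set point for average utilization


@dataclass
class ManagedService:
    """Managed system: a replicated service whose load can be observed."""
    replicas: int
    load_rps: float


def adapt(service: ManagedService, min_replicas: int = 1,
          max_replicas: int = 20, tolerance: float = 0.1) -> int:
    """Run one iteration of the feedback loop and return the replica count."""
    # Monitor: observe the current utilization of the managed system.
    utilization = service.load_rps / (service.replicas * CAPACITY_PER_REPLICA)
    # Analyze: adapt only if utilization left the tolerated band around the set point.
    if abs(utilization - TARGET_UTILIZATION) <= tolerance:
        return service.replicas
    # Plan: smallest replica count that keeps utilization at or below the set point.
    needed = math.ceil(service.load_rps / (CAPACITY_PER_REPLICA * TARGET_UTILIZATION))
    # Execute: apply the new configuration, clamped to the allowed range.
    service.replicas = max(min_replicas, min(max_replicas, needed))
    return service.replicas


svc = ManagedService(replicas=2, load_rps=600.0)
print(adapt(svc))  # 9: scale out
print(adapt(svc))  # 9: utilization ~0.67 is within tolerance, no change
svc.load_rps = 50.0
print(adapt(svc))  # 1: scale in
```

The tolerance band is a deliberate design choice: without it, small load fluctuations around the set point would trigger a reconfiguration on every iteration.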
Submitted 6 November, 2022;
originally announced November 2022.
-
Guidelines for Artifacts to Support Industry-Relevant Research on Self-Adaptation
Authors:
Danny Weyns,
Ilias Gerostathopoulos,
Barbora Buhnova,
Nicolas Cardozo,
Emilia Cioroaica,
Ivana Dusparic,
Lars Grunske,
Pooyan Jamshidi,
Christine Julien,
Judith Michael,
Gabriel Moreno,
Shiva Nejati,
Patrizio Pelliccione,
Federico Quin,
Genaina Rodrigues,
Bradley Schmerl,
Marco Vieira,
Thomas Vogel,
Rebekka Wohlrab
Abstract:
Artifacts support the evaluation of new research results and help compare them with the state of the art in a field of interest. Over the past years, several artifacts have been introduced to support research in the field of self-adaptive systems. While these artifacts have shown their value, it is not clear to what extent they support research on problems in self-adaptation that are relevant to industry. This paper provides a set of guidelines for artifacts that aim at supporting industry-relevant research on self-adaptation. The guidelines, which are grounded in data obtained from a survey with practitioners, were derived during working sessions at the 17th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. Artifact providers can use the guidelines to align future artifacts with industry needs; the guidelines can also be used to evaluate the industrial relevance of existing artifacts. We also propose an artifact template.
Submitted 24 June, 2022;
originally announced June 2022.
-
Preliminary Results of a Survey on the Use of Self-Adaptation in Industry
Authors:
Danny Weyns,
Ilias Gerostathopoulos,
Nadeem Abbas,
Jesper Andersson,
Stefan Biffl,
Premek Brada,
Tomas Bures,
Amleto Di Salle,
Patricia Lago,
Angelika Musil,
Juergen Musil,
Patrizio Pelliccione
Abstract:
Self-adaptation equips a software system with a feedback loop that automates tasks that would otherwise need to be performed by operators. Such feedback loops have found their way into a variety of practical applications; a typical example is an elastic cloud. Yet, the state of the practice in self-adaptation is currently not clear. To get insights into the use of self-adaptation in practice, we are running a large-scale survey with industry. This paper reports preliminary results based on survey data that we obtained from 113 practitioners spread over 16 countries, 62 of whom work with concrete self-adaptive systems. We highlight the main insights obtained so far: motivations for self-adaptation, concrete use cases, and difficulties encountered when applying self-adaptation in practice. We conclude the paper by outlining our plans for the remainder of the study.
Submitted 14 April, 2022;
originally announced April 2022.
-
Forming Ensembles at Runtime: A Machine Learning Approach
Authors:
Tomáš Bureš,
Ilias Gerostathopoulos,
Petr Hnětynka,
Jan Pacovský
Abstract:
Smart system applications (SSAs) built on top of cyber-physical and socio-technical systems are increasingly composed of components that can work both autonomously and by cooperating with each other. Cooperating robots, fleets of cars and drones, and emergency coordination systems are examples of SSAs. One approach to enable cooperation in SSAs is to form dynamic cooperation groups (ensembles) between components at runtime. Ensembles can be formed based on predefined rules that determine which components should be part of an ensemble based on their current state and the state of the environment (e.g., "group together 3 robots that are closer to the obstacle, their battery is sufficient and they would not be better used in another ensemble"). This is a computationally hard problem since all components are potential members of all possible ensembles at runtime. In our experience working with ensembles in several case studies over the past years, using constraint programming to decide which ensembles should be formed does not scale beyond a limited number of components and ensembles. Also, the strict formulation in terms of hard/soft constraints does not easily permit runtime self-adaptation via learning. This poses a serious limitation to the use of ensembles in large-scale and partially uncertain SSAs. To tackle this problem, in this paper we propose to recast the ensemble formation problem as a classification problem and use machine learning to efficiently form ensembles at scale.
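To make "recast ensemble formation as classification" concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the learning model or features, so everything here is hypothetical: each robot is described by (distance to obstacle, battery level), training labels come from a stand-in `oracle_label` rule playing the role of the expensive exact solver, and a k-nearest-neighbour classifier (chosen only for brevity; the paper's models may differ) predicts membership at runtime. The "3 robots" cardinality part of the example rule is enforced after classification.

```python
import math
from collections import Counter

# Hypothetical component features: (distance_to_obstacle_m, battery_pct), both on 0..100.


def oracle_label(features):
    """Stand-in for the expensive exact decision (e.g. a constraint solver) used
    to label training data: 1 = join the obstacle ensemble, 0 = stay out."""
    distance, battery = features
    return 1 if distance < 10.0 and battery >= 30.0 else 0


def make_training_set(step=5):
    """Label a grid of example component states with the oracle (done offline)."""
    return [((float(d), float(b)), oracle_label((d, b)))
            for d in range(0, 101, step) for b in range(0, 101, step)]


def knn_predict(train, query, k=3):
    """Classify a component state by majority vote of its k nearest neighbours."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def form_ensemble(train, robots, size=3):
    """Runtime step: classify every robot, then keep the `size` closest members."""
    members = [name for name, feats in robots.items() if knn_predict(train, feats) == 1]
    members.sort(key=lambda name: robots[name][0])
    return members[:size]


train = make_training_set()
robots = {"r1": (2.0, 90.0), "r2": (4.0, 80.0), "r3": (6.0, 70.0),
          "r4": (50.0, 95.0), "r5": (3.0, 10.0)}
print(form_ensemble(train, robots))  # ['r1', 'r2', 'r3']
```

The point of the recasting is that the costly exact reasoning happens only when producing training labels; at runtime each component is classified independently, which scales with the number of components rather than with the number of possible ensembles.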
Submitted 30 April, 2021;
originally announced April 2021.
-
How do we Evaluate Self-adaptive Software Systems?
Authors:
Ilias Gerostathopoulos,
Thomas Vogel,
Danny Weyns,
Patricia Lago
Abstract:
With the increase of research in self-adaptive systems, there is a need to better understand the way research contributions are evaluated. Such insights will help researchers better compare new findings when developing new knowledge for the community. However, so far there is no clear overview of how evaluations are performed in self-adaptive systems. To address this gap, we conduct a mapping study. The study focuses on experimental evaluations published in the last decade at the prime venue of research in software engineering for self-adaptive systems -- the International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS). Results point out that specifics of self-adaptive systems require special attention in the experimental process, including the distinction of the managing system (i.e., the target of evaluation) and the managed system, the presence of uncertainties that affect the system behavior and hence need to be taken into account in data analysis, and the potential of managed systems to be reused across experiments, beyond replications. To conclude, we offer a set of suggestions derived from our study that can be used as input to enhance future experiments in self-adaptive systems.
Submitted 21 March, 2021;
originally announced March 2021.
-
Characterizing Technical Debt and Antipatterns in AI-Based Systems: A Systematic Mapping Study
Authors:
Justus Bogner,
Roberto Verdecchia,
Ilias Gerostathopoulos
Abstract:
Background: With the rising popularity of Artificial Intelligence (AI), there is a growing need to build large and complex AI-based systems in a cost-effective and manageable way. As with traditional software, Technical Debt (TD) will emerge naturally over time in these systems, leading to challenges and risks if not managed appropriately. The influence of data science and the stochastic nature of AI-based systems may also lead to new types of TD or antipatterns, which are not yet fully understood by researchers and practitioners. Objective: The goal of our study is to provide a clear overview and characterization of the types of TD (both established and new ones) that appear in AI-based systems, as well as the antipatterns and related solutions that have been proposed. Method: Following the process of a systematic mapping study, 21 primary studies are identified and analyzed. Results: Our results show that (i) established TD types, variations of them, and four new TD types (data, model, configuration, and ethics debt) are present in AI-based systems, (ii) 72 antipatterns are discussed in the literature, the majority related to data and model deficiencies, and (iii) 46 solutions have been proposed, either to address specific TD types, antipatterns, or TD in general. Conclusions: Our results can support AI professionals with reasoning about and communicating aspects of TD present in their systems. Additionally, they can serve as a foundation for future research to further our understanding of TD in AI-based systems.
Submitted 17 March, 2021;
originally announced March 2021.
-
MEDAL: An AI-driven Data Fabric Concept for Elastic Cloud-to-Edge Intelligence
Authors:
Vasileios Theodorou,
Ilias Gerostathopoulos,
Iyad Alshabani,
Alberto Abello,
David Breitgand
Abstract:
Current Cloud solutions for Edge Computing are inefficient for data-centric applications, as they focus on the IaaS/PaaS level and miss the data modeling and operations perspective. Consequently, Edge Computing opportunities are lost due to cumbersome and data-asset-agnostic processes for end-to-end deployment over the Cloud-to-Edge continuum. In this paper, we introduce MEDAL, an intelligent Cloud-to-Edge Data Fabric to support Data Operations (DataOps) across the continuum and to automate management and orchestration operations over a combined view of the data and the resource layer. MEDAL facilitates building and managing data workflows on top of existing flexible and composable data services, seamlessly exploiting and federating IaaS/PaaS/SaaS resources across different Cloud and Edge environments. We describe the MEDAL Platform, which encompasses our concept, as a usable tool for Data Scientists and Engineers, and we illustrate its application through a connected-cars use case.
Submitted 25 February, 2021;
originally announced February 2021.
-
Managing Latency in Edge-Cloud Environment
Authors:
Lubomír Bulej,
Tomáš Bureš,
Adam Filandr,
Petr Hnětynka,
Iveta Hnětynkova,
Jan Pacovský,
Gabor Sandor,
Ilias Gerostathopoulos
Abstract:
Modern Cyber-physical Systems (CPS) include applications like smart traffic, smart agriculture, smart power grid, etc. Commonly, these systems are distributed and composed of end-user applications and microservices that typically run in the cloud. The connection with the physical world, which is inherent to CPS, brings the need to operate and respond in real time. As the cloud becomes part of the computation loop, the real-time requirements also have to be reflected in the cloud. In this paper, we present an approach that provides soft real-time guarantees on the response time of services running in the cloud and edge-cloud (i.e., cloud geographically close to the end-user), where these services are developed in high-level programming languages. In particular, we elaborate a method that allows us to predict the upper bound of the response time of a service when it shares the same computer with other services. Importantly, as our approach focuses on minimizing the impact on the developer of such services, it neither requires any special programming model nor limits the use of common libraries.
Submitted 23 November, 2020;
originally announced November 2020.
-
Decentralized Optimization of Vehicle Route Planning -- A Cross-City Comparative Study
Authors:
Brionna Davis,
Grace Jennings,
Taylor Pothast,
Ilias Gerostathopoulos,
Evangelos Pournaras,
Raphael E. Stern
Abstract:
New mobility concepts are at the forefront of research and innovation in smart cities. The introduction of connected and autonomous vehicles enables new possibilities in vehicle routing. Specifically, knowing the origin and destination of each agent in the network can allow for real-time routing of the vehicles to optimize network performance. However, this relies on individual vehicles being "altruistic", i.e., being willing to accept an alternative non-preferred route in order to achieve a network-level performance goal. In this work, we conduct a study to compare different levels of agent altruism and the resulting effect on the network-level traffic performance. Specifically, this study compares the effects of different underlying urban structures on the overall network performance, and investigates which characteristics of the network make it possible to realize routing improvements using a decentralized optimization router. The main finding is that, with increased vehicle altruism, it is possible to balance traffic flow among the links of the network. We show evidence that the decentralized optimization router is more effective for networks under high load. We also study the influence of city characteristics; in particular, networks with a higher number of nodes (intersections) or edges (roads) per unit area allow for more possible alternate routes, and thus have a higher potential to improve network performance.
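The abstract does not describe how the decentralized optimization router works, so the sketch below is only a simple stand-in (a sequential best-response scheme, not the paper's router) that makes the altruism trade-off concrete. It assumes each vehicle weighs its own route cost by `1 - altruism` and a network-level cost, here the variance of link loads, by `altruism`. The network of two links "A" and "B", the route costs, and all function names are hypothetical.

```python
from statistics import pvariance


def link_loads(vehicles, choice, links):
    """Count how many vehicles use each link under the given route choices."""
    loads = {link: 0 for link in links}
    for options, j in zip(vehicles, choice):
        loads[options[j][0]] += 1
    return loads


def choose_routes(vehicles, links, altruism, passes=3):
    """Each vehicle in turn picks the route minimizing a blend of its own cost
    (weight 1 - altruism) and network imbalance, the variance of link loads
    (weight altruism). vehicles: list of [(link, own_cost), ...], preferred first."""
    choice = [0] * len(vehicles)  # start with every vehicle on its preferred route
    for _ in range(passes):
        for i, options in enumerate(vehicles):
            def score(j):
                trial = choice[:i] + [j] + choice[i + 1:]
                imbalance = pvariance(list(link_loads(vehicles, trial, links).values()))
                return (1 - altruism) * options[j][1] + altruism * imbalance
            choice[i] = min(range(len(options)), key=score)
    return link_loads(vehicles, choice, links)


# Four vehicles: preferred route on link A (cost 1.0), alternative on link B (cost 1.5).
fleet = [[("A", 1.0), ("B", 1.5)] for _ in range(4)]
print(choose_routes(fleet, ["A", "B"], altruism=0.0))  # {'A': 4, 'B': 0}
print(choose_routes(fleet, ["A", "B"], altruism=1.0))  # {'A': 2, 'B': 2}
```

With no altruism every vehicle takes its preferred link and the load concentrates on A; with full altruism half of the vehicles accept the non-preferred route and the load is balanced, mirroring the qualitative finding stated in the abstract.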
Submitted 10 January, 2020;
originally announced January 2020.
-
Engineering for a Science-Centric Experimentation Platform
Authors:
Nikos Diamantopoulos,
Jeffrey Wong,
David Issa Mattos,
Ilias Gerostathopoulos,
Matthew Wardrop,
Tobias Mao,
Colin McFarland
Abstract:
Netflix is an internet entertainment service that routinely employs experimentation to guide strategy around product innovations. As Netflix grew, it had the opportunity to explore increasingly specialized improvements to its service, which generated demand for deeper analyses supported by richer metrics and powered by more diverse statistical methodologies. To facilitate this, and more fully harness the skill sets of both engineering and data science, Netflix engineers created a science-centric experimentation platform that leverages the expertise of data scientists from a wide range of backgrounds by allowing them to make direct code contributions in the languages used by scientists (Python and R). Moreover, the same code that runs in production can be run locally, making it straightforward to explore and graduate both metrics and causal inference methodologies directly into production services.
In this paper, we utilize a case-study research method to provide two main contributions. Firstly, we report on the architecture of this platform, with a special emphasis on its novel aspects: how it supports science-centric end-to-end workflows without compromising engineering requirements. Secondly, we describe its approach to causal inference, which leverages the potential outcomes conceptual framework to provide a unified abstraction layer for arbitrary statistical models and methodologies.
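To make the idea of a unified abstraction layer over the potential outcomes framework concrete, here is a minimal sketch assuming that every statistical method implements one common interface returning an average treatment effect. `ExperimentData`, `CausalModel`, `DifferenceInMeans`, and `analyze` are hypothetical names, not the platform's actual API, and difference in means is simply the most basic estimator that fits this framework.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Protocol, Sequence


@dataclass
class ExperimentData:
    """Per-unit A/B test data under the potential outcomes framework: each unit i
    has outcomes Y_i(1) and Y_i(0), but only the one for its assigned arm is observed."""
    treated: Sequence[int]      # 1 if unit i was assigned the treatment, else 0
    outcome: Sequence[float]    # observed Y_i = Y_i(treated[i])


class CausalModel(Protocol):
    """Hypothetical common interface that every statistical method implements."""
    def average_treatment_effect(self, data: ExperimentData) -> float: ...


class DifferenceInMeans:
    """Under random assignment, mean(Y | T=1) - mean(Y | T=0) estimates
    E[Y(1) - Y(0)], the average treatment effect."""
    def average_treatment_effect(self, data: ExperimentData) -> float:
        pairs = list(zip(data.treated, data.outcome))
        return (fmean(y for t, y in pairs if t == 1)
                - fmean(y for t, y in pairs if t == 0))


def analyze(models: dict[str, CausalModel], data: ExperimentData) -> dict[str, float]:
    """Callers depend only on the interface, so new methods can be plugged in."""
    return {name: model.average_treatment_effect(data) for name, model in models.items()}


data = ExperimentData(treated=[1, 1, 0, 0], outcome=[3.0, 5.0, 1.0, 2.0])
print(analyze({"diff_in_means": DifferenceInMeans()}, data))  # {'diff_in_means': 2.5}
```

A more elaborate method (for example, a regression-adjusted estimator) would be added as another class with the same method, without any change to `analyze` or its callers.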
Submitted 9 October, 2019;
originally announced October 2019.
-
Planning as Optimization: Dynamically Discovering Optimal Configurations for Runtime Situations
Authors:
Erik M. Fredericks,
Ilias Gerostathopoulos,
Christian Krupitzer,
Thomas Vogel
Abstract:
The large number of possible configurations of modern software-based systems, combined with the large number of possible environmental situations of such systems, prohibits enumerating all adaptation options at design time and necessitates planning at run time to dynamically identify an appropriate configuration for a situation. While numerous planning techniques exist, they typically assume a detailed state-based model of the system and that the situations that warrant adaptations are known. Both of these assumptions can be violated in complex, real-world systems. As a result, adaptation planning must rely on simple models that capture what can be changed (input parameters) and observed in the system and environment (output and context parameters). We therefore propose planning as optimization: the use of optimization strategies to discover optimal system configurations at runtime for each distinct situation that is also dynamically identified at runtime. We apply our approach to CrowdNav, an open-source traffic routing system with the characteristics of a real-world system. We identify situations via clustering and conduct an empirical study that compares Bayesian optimization and two types of evolutionary optimization (NSGA-II and novelty search) in CrowdNav.
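The abstract names the two ingredients of planning as optimization (situations identified by clustering, then an optimized configuration per situation) but not their implementation. The following is a minimal single-objective sketch, assuming one context parameter (traffic load) and one input parameter `x` in [0, 1]. Plain 1-D k-means stands in for the clustering step, and a (1+1) evolution strategy, a simple evolutionary optimizer, stands in for the Bayesian optimization, NSGA-II, and novelty search compared in the paper. `trip_overhead` is a hypothetical synthetic objective replacing real CrowdNav measurements.

```python
import random


def kmeans_1d(values, k, iters=20):
    """Identify situations: plain 1-D k-means over observed context values (k >= 2)."""
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[min(range(k), key=lambda i: abs(v - centroids[i]))].append(v)
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def trip_overhead(x, load):
    """Hypothetical stand-in for a measured objective (e.g. from a CrowdNav run):
    the best value of routing knob x in [0, 1] shifts with the traffic load."""
    best_x = 0.2 if load < 500 else 0.7
    return (x - best_x) ** 2


def one_plus_one_es(objective, x0=0.5, sigma=0.2, steps=200, seed=0):
    """(1+1) evolution strategy on [0, 1]: keep a mutant only if it is no worse."""
    rng = random.Random(seed)
    best_x, best_f = x0, objective(x0)
    for _ in range(steps):
        cand = min(1.0, max(0.0, best_x + rng.gauss(0.0, sigma)))
        f = objective(cand)
        if f <= best_f:
            best_x, best_f = cand, f
    return best_x


def plan(observed_loads, k=2):
    """Find one optimized configuration per identified situation."""
    return {c: one_plus_one_es(lambda x, c=c: trip_overhead(x, c))
            for c in kmeans_1d(observed_loads, k)}


def configure(plans, load):
    """Runtime: map the current context to the nearest situation and reuse its plan."""
    return plans[min(plans, key=lambda c: abs(load - c))]


plans = plan([120, 150, 180, 900, 950, 1000])
print(sorted(plans))  # [150.0, 950.0]: a low-load and a high-load situation
print(configure(plans, 200), configure(plans, 870))
```

Keeping situation identification separate from optimization means that a newly observed context only needs a nearest-situation lookup at runtime; optimization is rerun only when the clustering reveals a new situation.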
Submitted 3 May, 2019;
originally announced May 2019.