-
Mutiny! How does Kubernetes fail, and what can we do about it?
Authors:
Marco Barletta,
Marcello Cinque,
Catello Di Martino,
Zbigniew T. Kalbarczyk,
Ravishankar K. Iyer
Abstract:
In this paper, we i) analyze and classify real-world failures of Kubernetes (the most popular container orchestration system), ii) develop a framework to perform a fault/error injection campaign targeting the data store preserving the cluster state, and iii) compare results of our fault/error injection experiments with real-world failures, showing that our fault/error injections can recreate many real-world failure patterns. The paper aims to address the lack of studies on systematic analyses of Kubernetes failures to date.
Our results show that even a single fault/error (e.g., a bit-flip) in the stored data can propagate, causing cluster-wide failures (3% of injections), service networking issues (4%), and service under/overprovisioning (24%). Errors in the fields tracking dependencies between objects caused 51% of such cluster-wide failures. We argue that controlled fault/error injection-based testing should be employed to proactively assess Kubernetes' resiliency and guide the design of failure mitigation strategies.
Submitted 17 April, 2024;
originally announced April 2024.
-
Multicore DRAM Bank-& Row-Conflict Bomb for Timing Attacks in Mixed-Criticality Systems
Authors:
Antonio Savino,
Gautam Gala,
Marcello Cinque,
Gerhard Fohler
Abstract:
With the increasing use of multicore platforms to realize mixed-criticality systems, understanding the underlying shared resources, such as the memory hierarchy shared among cores, and achieving isolation between co-executing tasks running on the same platform with different criticality levels becomes relevant. In addition to safety considerations, a malicious entity can exploit shared resources to create timing attacks on critical applications. In this paper, we focus on understanding the shared DRAM dual in-line memory module and create a timing attack, which we name the "bank & row conflict bomb", to target a victim task in a multicore platform. We also create a "navigate" algorithm to understand how victim requests are managed by the Memory Controller and to provide valuable inputs for designing the bank & row conflict bomb. We performed experimental tests on a 2nd Gen Intel Xeon Processor with an 8GB DDR4-2666 DRAM module to show that such an attack can increase the execution time of the victim task by about 150%, motivating the need for proper countermeasures to help ensure the safety and security of critical applications.
Submitted 2 April, 2024;
originally announced April 2024.
-
Orchestrating Mixed-Criticality Cloud Workloads in Reconfigurable Manufacturing Systems
Authors:
Marco Barletta,
Marcello Cinque,
Davide De Vita
Abstract:
The adoption of cloud computing technologies in the industry is paving the way to new manufacturing paradigms. In this paper we propose a model to optimize the orchestration of workloads with differentiated criticality levels on a cloud-enabled factory floor. Preliminary results show that it is possible to optimize the guarantees to deployed jobs without penalizing the number of schedulable jobs. We indicate future research paths to quantitatively evaluate job isolation.
Submitted 27 March, 2024;
originally announced March 2024.
-
IRIS: a Record and Replay Framework to Enable Hardware-assisted Virtualization Fuzzing
Authors:
Carmine Cesarano,
Marcello Cinque,
Domenico Cotroneo,
Luigi De Simone,
Giorgio Farina
Abstract:
Nowadays, industries are looking into virtualization as an effective means to build safe applications, thanks to the isolation it can provide among virtual machines (VMs) running on the same hardware. In this context, a fundamental issue is understanding to what extent the isolation is guaranteed, despite possible (or induced) problems in the virtualization mechanisms. Uncovering such isolation issues is still an open challenge, especially for hardware-assisted virtualization, since the search space should include all the possible VM states (and the linked hypervisor state), which is prohibitive. In this paper, we propose IRIS, a framework to record (learn) sequences of inputs (i.e., VM seeds) from the real guest execution (e.g., OS boot), replay them as-is to reach valid and complex VM states, and finally use them as valid seeds to be mutated, enabling fuzzing solutions for hardware-assisted hypervisors. We demonstrate the accuracy and efficiency of IRIS in automatically reproducing valid VM behaviors, with no need to execute guest workloads. We also provide a proof-of-concept fuzzer, based on the proposed architecture, showing its potential on the Xen hypervisor.
Submitted 22 March, 2023;
originally announced March 2023.
-
Pooling Strategies for Simplicial Convolutional Networks
Authors:
Domenico Mattia Cinque,
Claudio Battiloro,
Paolo Di Lorenzo
Abstract:
The goal of this paper is to introduce pooling strategies for simplicial convolutional neural networks. Inspired by graph pooling methods, we introduce a general formulation for a simplicial pooling layer that performs: i) local aggregation of simplicial signals; ii) principled selection of sampling sets; iii) downsampling and simplicial topology adaptation. The general layer is then customized to design four different pooling strategies (i.e., max, top-k, self-attention, and separated top-k) grounded in the theory of topological signal processing. Also, we leverage the proposed layers in a hierarchical architecture that reduces complexity while representing data at different resolutions. Numerical results on real data benchmarks (i.e., flow and graph classification) illustrate the advantage of the proposed methods with respect to the state of the art.
Submitted 11 October, 2022;
originally announced October 2022.
-
RunPHI: Enabling Mixed-criticality Containers via Partitioning Hypervisors in Industry 4.0
Authors:
Marco Barletta,
Marcello Cinque,
Luigi De Simone,
Raffaele Della Corte,
Giorgio Farina,
Daniele Ottaviano
Abstract:
Orchestration systems are becoming a key component to automatically manage distributed computing resources in many fields with criticality requirements like Industry 4.0 (I4.0). However, they are mainly linked to OS-level virtualization, which is known to suffer from reduced isolation. In this paper, we propose RunPHI with the aim of integrating partitioning hypervisors, as a solution for assuring strong isolation, with OS-level orchestration systems. The purpose is to enable container orchestration in mixed-criticality systems with isolation requirements through partitioned containers.
Submitted 5 September, 2022;
originally announced September 2022.
-
Assessing Intel's Memory Bandwidth Allocation for resource limitation in real-time systems
Authors:
Giorgio Farina,
Gautam Gala,
Marcello Cinque,
Gerhard Fohler
Abstract:
Industries have recently been considering the adoption of cloud computing for hosting safety-critical applications. However, the use of multicore processors usually adopted in the cloud introduces temporal anomalies due to contention for shared resources, such as the memory subsystem. In this paper we explore the potential of Intel's Memory Bandwidth Allocation (MBA) technology, available on Xeon Scalable processors. By adopting a systematic measurement approach on real hardware, we assess the indirect memory bandwidth limitation achievable by applying MBA delays, showing that only given delay values (namely 70, 80 and 90) are effective in our setting. We also test the derived bandwidth assured to a hypothetical critical core when interfering cores (e.g., generating a concurrent memory access workload) are present on the same machine. Our results can support designers by providing an understanding of the impact of shared memory, to enable predictable progress of safety-critical applications in cloud environments.
Submitted 29 June, 2022;
originally announced June 2022.
-
Introducing k4.0s: a Model for Mixed-Criticality Container Orchestration in Industry 4.0 (extended)
Authors:
Marco Barletta,
Marcello Cinque,
Luigi De Simone,
Raffaele Della Corte
Abstract:
Time-predictable edge cloud is seen as the answer to many emerging needs in Industry 4.0 environments, since it is able to provide flexible, modular, and reconfigurable services with low latency and reduced costs. Orchestration systems are becoming the core component of clouds since they take decisions on the placement and lifecycle of software components. Current solutions are starting to introduce real-time container support for time predictability; however, these approaches lack determinism as well as support for workloads requiring multiple levels of assurance/criticality.
In this paper, we present k4.0s, an orchestration model for real-time and mixed-criticality environments, which includes timeliness, criticality and network requirements. The model leverages new abstractions for both nodes and jobs, e.g., node assurance, and requires novel monitoring strategies. We sketch an implementation of the proposal based on Kubernetes, and present an experiment motivating the need for node assurance levels and adequate monitoring.
Submitted 4 January, 2023; v1 submitted 27 May, 2022;
originally announced May 2022.
-
Certify the Uncertified: Towards Assessment of Virtualization for Mixed-criticality in the Automotive Domain
Authors:
Marcello Cinque,
Luigi De Simone,
Andrea Marchetta
Abstract:
Nowadays, a feature-rich automotive vehicle offers several technologies to assist the driver during the trip and to provide an entertaining infotainment system to the other passengers, too. Consolidating worlds at different criticalities is a welcome challenge for car manufacturers that have recently tried to leverage virtualization technologies due to reduced maintenance, deployment, and shipping costs. For this reason, more and more mixed-criticality systems are emerging, trying to assure compliance with the ISO 26262 Road Vehicle Safety standard. In this short paper, we provide a preliminary investigation of the certification capabilities for Jailhouse, a popular open-source partitioning hypervisor. To this aim, we propose a testing methodology and showcase the results, pointing out when the software gets to a faulting state, deviating from its expected behavior. The ultimate goal is to picture the right direction for the hypervisor towards a potential certification process.
Submitted 25 May, 2022;
originally announced May 2022.
-
Virtualization over Multiprocessor System-on-Chip: an Enabling Paradigm for Industrial IoT
Authors:
Alessandro Cilardo,
Marcello Cinque,
Luigi De Simone,
Nicola Mazzocca
Abstract:
The next-generation Industrial Internet of Things (IIoT) inherently requires smart devices featuring rich connectivity, local intelligence, and autonomous behavior. Emerging Multiprocessor System-on-Chip (MPSoC) platforms along with comprehensive support for virtualization will represent two key building blocks for smart devices in future IIoT edge infrastructures. We review representative existing solutions, highlighting the aspects that are most relevant for integration in IIoT solutions. From the analysis, we derive a reference architecture for a general virtualization-ready edge IIoT node. We then analyze the implications and benefits for a concrete use case scenario and identify the crucial research challenges to be faced to bridge the gap towards full support for virtualization-ready IIoT nodes.
Submitted 31 December, 2021;
originally announced December 2021.
-
Virtualizing Mixed-Criticality Systems: A Survey on Industrial Trends and Issues
Authors:
Marcello Cinque,
Domenico Cotroneo,
Luigi De Simone,
Stefano Rosiello
Abstract:
Virtualization is gaining traction in the industry as it promises a flexible way to integrate, manage, and re-use heterogeneous software components with mixed-criticality levels, on a shared hardware platform, while obtaining isolation guarantees. This work surveys the state-of-the-practice of real-time virtualization technologies by discussing common issues in the industry. In particular, we analyze how different virtualization approaches and solutions can impact isolation guarantees and testing/certification activities, and how they deal with dependability challenges. The aim is to highlight current industry trends and support industrial practitioners in choosing the most suitable solution according to their application domains.
Submitted 13 December, 2021;
originally announced December 2021.
-
Fast Abstracts and Student Forum Proceedings, 17th European Dependable Computing Conference -- EDCC 2021
Authors:
Marcello Cinque,
Barbara Gallina
Abstract:
Collection of manuscripts accepted for presentation at the Student Forum and Fast Abstracts tracks of the 17th European Dependable Computing Conference (EDCC 2021).
Submitted 2 September, 2021; v1 submitted 1 September, 2021;
originally announced September 2021.
-
Big Data in Critical Infrastructures Security Monitoring: Challenges and Opportunities
Authors:
L. Aniello,
A. Bondavalli,
A. Ceccarelli,
C. Ciccotelli,
M. Cinque,
F. Frattini,
A. Guzzo,
A. Pecchia,
A. Pugliese,
L. Querzoni,
S. Russo
Abstract:
Critical Infrastructures (CIs), such as smart power grids, transport systems, and financial infrastructures, are more and more vulnerable to cyber threats, due to the adoption of commodity computing facilities. Despite the use of several monitoring tools, recent attacks have proven that current defensive mechanisms for CIs are not effective enough against the most advanced threats. In this paper we explore the idea of a framework leveraging multiple data sources to improve the protection capabilities of CIs. Challenges and opportunities are discussed along three main research directions: i) use of distinct and heterogeneous data sources, ii) monitoring with adaptive granularity, and iii) attack modeling and runtime combination of multiple data analysis techniques.
Submitted 7 May, 2014; v1 submitted 1 May, 2014;
originally announced May 2014.