Search | arXiv e-print repository

arXiv:2406.19670 [pdf, other]

doi 10.1145/3664646.3664759

Function+Data Flow: A Framework to Specify Machine Learning Pipelines for Digital Twinning

Authors: Eduardo de Conto, Blaise Genest, Arvind Easwaran

Abstract: The development of digital twins (DTs) for physical systems increasingly leverages artificial intelligence (AI), particularly for combining data from different sources or for creating computationally efficient, reduced-dimension models. Indeed, even in very different application domains, twinning employs common techniques such as model order reduction and modelization with hybrid data (that is, data sourced from both physics-based models and sensors). Despite this apparent generality, current development practices are ad hoc, making the design of AI pipelines for digital twinning complex and time-consuming. Here we propose Function+Data Flow (FDF), a domain-specific language (DSL) to describe AI pipelines within DTs. FDF aims to facilitate the design and validation of digital twins. Specifically, FDF treats functions as first-class citizens, enabling effective manipulation of models learned with AI. We illustrate the benefits of FDF on two concrete use cases from different domains: predicting the plastic strain of a structure and modeling the electromagnetic behavior of a bearing.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 10 pages, 5 figures, to be published in AIware'24

arXiv:2403.13411 [pdf, other]

Optimal Fixed Priority Scheduling in Multi-Stage Multi-Resource Distributed Real-Time Systems

Authors: Niraj Kumar, Chuanchao Gao, Arvind Easwaran

Abstract: This work studies fixed priority (FP) scheduling of real-time jobs with end-to-end deadlines in a distributed system. Specifically, given a multi-stage pipeline with multiple heterogeneous resources of the same type at each stage, the problem is to assign priorities to a set of real-time jobs with different release times to access a resource at each stage of the pipeline subject to the end-to-end deadline constraints. Note that, in such a system, jobs may compete with different sets of jobs at different stages of the pipeline depending on the job-to-resource mapping. To this end, the following are the two major contributions of this work. We show that an OPA-compatible schedulability test based on the delay composition algebra can be constructed, which we then use with an optimal priority assignment algorithm to compute a priority ordering. Further, we establish the versatility of pairwise priority assignment in such a multi-stage multi-resource system, compared to a total priority ordering. In particular, we show that a pairwise priority assignment may be feasible even if a priority ordering does not exist. We propose an integer linear programming formulation and a scalable heuristic to compute a pairwise priority assignment. We also show through simulation experiments that the proposed approaches can be used for the holistic scheduling of real-time jobs in edge computing systems.

Submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted in DATE (Design, Automation and Test in Europe Conference) 2024
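
The abstract above pairs an OPA-compatible schedulability test with an optimal priority assignment (OPA) algorithm. As a rough illustration of that general idea, the following Python sketch shows the classic Audsley-style OPA loop: repeatedly find some job that passes the test at the lowest remaining priority level. The names `audsley_opa`, `is_schedulable`, and `make_toy_test` are hypothetical, and the toy test (one resource, all jobs released together) is only a stand-in for illustration; it is not the delay composition algebra test from the paper.

```python
from typing import Callable, Dict, Hashable, List, Optional, Set, Tuple

Job = Hashable
# Hypothetical test signature: can `job` meet its deadline when every job
# in `higher` has higher priority than it?
SchedTest = Callable[[Job, Set[Job]], bool]


def audsley_opa(jobs: List[Job], is_schedulable: SchedTest) -> Optional[List[Job]]:
    """Return jobs ordered from highest to lowest priority, or None if the
    test finds no feasible ordering."""
    unassigned = list(jobs)
    lowest_first: List[Job] = []
    while unassigned:
        for job in unassigned:
            higher = set(unassigned) - {job}
            if is_schedulable(job, higher):
                # `job` tolerates the lowest remaining priority level.
                lowest_first.append(job)
                unassigned.remove(job)
                break
        else:
            # No remaining job passes at this level: no ordering exists.
            return None
    return lowest_first[::-1]


def make_toy_test(params: Dict[Job, Tuple[int, int]]) -> SchedTest:
    """Toy stand-in test (an assumption for this sketch): one resource, all
    jobs released at time 0, params[job] = (execution_time, deadline)."""
    def test(job: Job, higher: Set[Job]) -> bool:
        exec_time, deadline = params[job]
        interference = sum(params[h][0] for h in higher)
        return interference + exec_time <= deadline
    return test


JOBS = {"a": (2, 10), "b": (3, 5), "c": (1, 3)}
ORDER = audsley_opa(list(JOBS), make_toy_test(JOBS))
print(ORDER)
```

The loop only works with tests whose verdict for a job depends on the set of higher-priority jobs and not on their relative order, which is what "OPA-compatible" refers to.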

arXiv:2307.13419 [pdf, other]

Co-Design of Out-of-Distribution Detectors for Autonomous Emergency Braking Systems

Authors: Michael Yuhas, Arvind Easwaran

Abstract: Learning enabled components (LECs), while critical for decision making in autonomous vehicles (AVs), are likely to make incorrect decisions when presented with samples outside of their training distributions. Out-of-distribution (OOD) detectors have been proposed to detect such samples, thereby acting as a safety monitor; however, both OOD detectors and LECs require heavy utilization of embedded hardware typically found in AVs. For both components, there is a tradeoff between non-functional and functional performance, and both impact a vehicle's safety. For instance, giving an OOD detector a longer response time can increase its accuracy at the expense of the LEC. We consider an LEC with binary output, such as an autonomous emergency braking system (AEBS), and use risk, the combination of severity and occurrence of a failure, to model the effect of both components' design parameters on each other's functional and non-functional performance, as well as their impact on system safety. We formulate a co-design methodology that uses this risk model to find the design parameters for an OOD detector and LEC that decrease risk below that of the baseline system and demonstrate it on a vision-based AEBS. Using our methodology, we achieve a 42.3% risk reduction while maintaining equivalent resource utilization.

Submitted 25 July, 2023; originally announced July 2023.

Comments: 8 pages, 6 figures, ITSC 2023

arXiv:2304.01592 [pdf, other]

PAC-Based Formal Verification for Out-of-Distribution Data Detection

Authors: Mohit Prashant, Arvind Easwaran

Abstract: Cyber-physical systems (CPS) such as autonomous vehicles that utilize learning components are often sensitive to noise and out-of-distribution (OOD) instances encountered during runtime. As such, safety-critical tasks depend upon OOD detection subsystems in order to restore the CPS to a known state or interrupt execution to prevent safety from being compromised. However, it is difficult to guarantee the performance of OOD detectors because it is difficult to characterize the OOD aspect of an instance, especially in high-dimensional unstructured data. To distinguish between OOD data and data known to the learning component through the training process, an emerging technique is to incorporate variational autoencoders (VAE) within systems and apply classification or anomaly detection techniques on their latent spaces. The rationale for doing so is the reduction of the data domain size through the encoding process, which benefits real-time systems through decreased processing requirements, facilitates feature analysis for unstructured data and allows more explainable techniques to be implemented. This study places probably approximately correct (PAC) based guarantees on OOD detection using the encoding process within VAEs to quantify image features and apply conformal constraints over them. This is used to bound the detection error on unfamiliar instances with user-defined confidence. The approach used in this study is to empirically establish these bounds by sampling the latent probability distribution and evaluating the error with respect to the constraint violations that are encountered. The guarantee is then verified using data generated from CARLA, an open-source driving simulator.

Submitted 4 April, 2023; originally announced April 2023.

Comments: 10 pages

arXiv:2211.11520 [pdf, other]

Demo Abstract: Real-Time Out-of-Distribution Detection on a Mobile Robot

Authors: Michael Yuhas, Arvind Easwaran

Abstract: In a cyber-physical system such as an autonomous vehicle (AV), machine learning (ML) models can be used to navigate and identify objects that may interfere with the vehicle's operation. However, ML models are unlikely to make accurate decisions when presented with data outside their training distribution. Out-of-distribution (OOD) detection can act as a safety monitor for ML models by identifying such samples at run time. However, in safety critical systems like AVs, OOD detection needs to satisfy real-time constraints in addition to functional requirements. In this demonstration, we use a mobile robot as a surrogate for an AV and use an OOD detector to identify potentially hazardous samples. The robot navigates a miniature town using image data and a YOLO object detection network. We show that our OOD detector is capable of identifying OOD images in real-time on an embedded platform concurrently performing object detection and lane following. We also show that it can be used to successfully stop the vehicle in the presence of unknown, novel samples.

Submitted 14 November, 2022; originally announced November 2022.

Comments: 3 pages, 5 figures, RTSS 2022

arXiv:2210.09959 [pdf, other]

Out of Distribution Reasoning by Weakly-Supervised Disentangled Logic Variational Autoencoder

Authors: Zahra Rahiminasab, Michael Yuhas, Arvind Easwaran

Abstract: Out-of-distribution (OOD) detection, i.e., finding test samples derived from a different distribution than the training set, as well as reasoning about such samples (OOD reasoning), are necessary to ensure the safety of results generated by machine learning models. Recently there have been promising results for OOD detection in the latent space of variational autoencoders (VAEs). However, without disentanglement, VAEs cannot perform OOD reasoning. Disentanglement ensures a one-to-many mapping between generative factors of OOD (e.g., rain in image data) and the latent variables to which they are encoded. Although previous literature has focused on weakly-supervised disentanglement on simple datasets with known and independent generative factors, in practice, achieving full disentanglement through weak supervision is impossible for complex datasets, such as Carla, with unknown and abstract generative factors. As a result, we propose an OOD reasoning framework that learns a partially disentangled VAE to reason about complex datasets. Our framework consists of three steps: partitioning data based on observed generative factors, training a VAE as a logic tensor network that satisfies disentanglement rules, and run-time OOD reasoning. We evaluate our approach on the Carla dataset and compare the results against three state-of-the-art methods. We found that our framework outperformed these methods in terms of disentanglement and end-to-end OOD reasoning.

Submitted 18 October, 2022; originally announced October 2022.

Comments: Accepted in The 6th International Conference on System Reliability and Safety (ICSRS) 2022

arXiv:2208.10765 [pdf]

A Low-Cost Lane-Following Algorithm for Cyber-Physical Robots

Authors: Archit Gupta, Arvind Easwaran

Abstract: Duckiebots are low-cost mobile robots that are widely used in the fields of research and education. Although there are existing self-driving algorithms for the Duckietown platform, they are either too complex or perform too poorly to navigate a multi-lane track. Moreover, it is essential to give memory and computational resources to a Duckiebot so it can perform additional tasks such as out-of-distribution input detection. In order to satisfy these constraints, we built a low-cost autonomous driving algorithm capable of driving on a two-lane track. The algorithm uses traditional computer vision techniques to identify the central lane on the track and obtain the relevant steering angle. The steering is then controlled by a PID controller that smooths the movement of the Duckiebot. The performance of the algorithm was compared to that of the NeurIPS 2018 AI Driving Olympics (AIDO) finalists, and it outperformed all but one of the finalists. The two main contributions of our algorithm are its low computational requirements and very quick set-up, with ongoing efforts to make it more reliable.

Submitted 9 October, 2023; v1 submitted 23 August, 2022; originally announced August 2022.
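
The abstract above states that steering is controlled by a PID controller. For readers unfamiliar with the technique, here is a minimal discrete-time PID sketch in Python. The class name `PID`, the gains, and the idea of feeding it a steering-angle error are assumptions made for illustration only; the paper's actual controller structure and tuning are not given in this listing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook discrete PID: u = kp*e + ki*integral(e) + kd*de/dt."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        # Accumulate the integral term with a simple rectangle rule.
        self.integral += error * dt
        # No derivative contribution on the first sample (no history yet).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical usage: the error is the gap between the desired and the
# measured steering angle, sampled every 0.5 s; gains are illustrative.
controller = PID(kp=1.0, ki=0.0, kd=1.0)
first = controller.step(error=1.0, dt=0.5)
second = controller.step(error=2.0, dt=0.5)
print(first, second)
```

The derivative term is what damps abrupt changes in the error, which is one common way a PID loop yields smoother motion than a purely proportional controller.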

arXiv:2207.14694 [pdf, other]

Design Methodology for Deep Out-of-Distribution Detectors in Real-Time Cyber-Physical Systems

Authors: Michael Yuhas, Daniel Jun Xian Ng, Arvind Easwaran

Abstract: When machine learning (ML) models are supplied with data outside their training distribution, they are more likely to make inaccurate predictions; in a cyber-physical system (CPS), this could lead to catastrophic system failure. To mitigate this risk, an out-of-distribution (OOD) detector can run in parallel with an ML model and flag inputs that could lead to undesirable outcomes. Although OOD detectors have been well studied in terms of accuracy, there has been less focus on deployment to resource constrained CPSs. In this study, a design methodology is proposed to tune deep OOD detectors to meet the accuracy and response time requirements of embedded applications. The methodology uses genetic algorithms to optimize the detector's preprocessing pipeline and selects a quantization method that balances robustness and response time. It also identifies several candidate task graphs under the Robot Operating System (ROS) for deployment of the selected design. The methodology is demonstrated on two variational autoencoder based OOD detectors from the literature on two embedded platforms. Insights into the trade-offs that occur during the design process are provided, and it is shown that this design methodology can lead to a drastic reduction in response time in relation to an unoptimized OOD detector while maintaining comparable accuracy.

Submitted 29 July, 2022; originally announced July 2022.

Comments: 6 pages, 10 figures

arXiv:2206.05950 [pdf, other]

doi 10.1109/GLOBECOM48099.2022.10001137

Deadline-constrained Multi-resource Task Mapping and Allocation for Edge-Cloud Systems

Authors: Chuanchao Gao, Aryaman Shaan, Arvind Easwaran

Abstract: In an edge-cloud system, mobile devices can offload their computation intensive tasks to an edge or cloud server to guarantee the quality of service or satisfy task deadline requirements. However, it is challenging to determine where tasks should be offloaded and processed, and how much network and computation resources should be allocated to them, such that a system with limited resources can obtain a maximum profit while meeting the deadlines. A key challenge in this problem is that the network and computation resources could be allocated on different servers, since the server to which a task is offloaded (e.g., a server with an access point) may be different from the server on which the task is eventually processed. To address this challenge, we first formulate the task mapping and resource allocation problem as a non-convex Mixed-Integer Nonlinear Programming (MINLP) problem, which is known to be NP-hard. We then propose a zero-slack based greedy algorithm (ZSG) and a linear discretization method (LDM) to solve this MINLP problem. Experiment results with various synthetic tasksets show that ZSG has an average of $2.98\%$ worse performance than LDM with a minimum unit of 5 but has an average of $6.88\%$ better performance than LDM with a minimum unit of 15.

Submitted 7 March, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

Journal ref: GLOBECOM 2022 - 2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 2022, pp. 5037-5043

arXiv:2108.11800 [pdf, other]

Efficient Out-of-Distribution Detection Using Latent Space of $β$-VAE for Cyber-Physical Systems

Authors: Shreyas Ramakrishna, Zahra Rahiminasab, Gabor Karsai, Arvind Easwaran, Abhishek Dubey

Abstract: Deep Neural Networks are actively being used in the design of autonomous Cyber-Physical Systems (CPSs). The advantage of these models is their ability to handle high-dimensional state-space and learn compact surrogate representations of the operational state spaces. However, the problem is that the sampled observations used for training the model may never cover the entire state space of the physical environment, and as a result, the system will likely operate in conditions that do not belong to the training distribution. These conditions that do not belong to training distribution are referred to as Out-of-Distribution (OOD). Detecting OOD conditions at runtime is critical for the safety of CPS. In addition, it is also desirable to identify the context or the feature(s) that are the source of OOD to select an appropriate control action to mitigate the consequences that may arise because of the OOD condition. In this paper, we study this problem as a multi-labeled time series OOD detection problem over images, where the OOD is defined both sequentially across short time windows (change points) as well as across the training data distribution. A common approach to solving this problem is the use of multi-chained one-class classifiers. However, this approach is expensive for CPSs that have limited computational resources and require short inference times. Our contribution is an approach to design and train a single $β$-Variational Autoencoder detector with a partially disentangled latent space sensitive to variations in image features. We use the feature sensitive latent variables in the latent space to detect OOD images and identify the most likely feature(s) responsible for the OOD. We demonstrate our approach using an Autonomous Vehicle in the CARLA simulator and a real-world automotive dataset called nuImages.

Submitted 26 August, 2021; originally announced August 2021.

Comments: Paper accepted for ACM Transactions on Cyber-Physical Systems (2021)

arXiv:2107.11750 [pdf, other]

doi 10.1145/3477026

Improving Variational Autoencoder based Out-of-Distribution Detection for Embedded Real-time Applications

Authors: Yeli Feng, Daniel Jun Xian Ng, Arvind Easwaran

Abstract: Uncertainties in machine learning are a significant roadblock for its application in safety-critical cyber-physical systems (CPS). One source of uncertainty arises from distribution shifts in the input data between training and test scenarios. Detecting such distribution shifts in real-time is an emerging approach to address the challenge. The high dimensional input space in CPS applications involving imaging adds extra difficulty to the task. Generative learning models are widely adopted for the task, namely out-of-distribution (OoD) detection. To improve the state-of-the-art, we studied existing proposals from both machine learning and CPS fields. In the latter, safety monitoring in real-time for autonomous driving agents has been a focus. Exploiting the spatiotemporal correlation of motion in videos, we can robustly detect hazardous motion around autonomous driving agents. Inspired by the latest advances in the Variational Autoencoder (VAE) theory and practice, we tapped into the prior knowledge in data to further boost OoD detection's robustness. Comparison studies over nuScenes and Synthia data sets show our methods significantly improve detection capabilities of OoD factors unique to driving scenarios, 42% better than state-of-the-art approaches. Our model also generalized near-perfectly, 97% better than the state-of-the-art across the real-world and simulation driving data sets experimented. Finally, we customized one proposed method into a twin-encoder model that can be deployed to resource limited embedded devices for real-time OoD detection. Its execution time was reduced over four times in low-precision 8-bit integer inference, while detection capability is comparable to its corresponding floating-point model.

Submitted 30 July, 2021; v1 submitted 25 July, 2021; originally announced July 2021.

Comments: This article appears as part of the ESWEEK-TECS special issue and will be presented in the International Conference on Embedded Software (EMSOFT), 2021

arXiv:2107.11736 [pdf, other]

doi 10.1145/3450267.3452000

WiP Abstract : Robust Out-of-distribution Motion Detection and Localization in Autonomous CPS

Authors: Yeli Feng, Arvind Easwaran

Abstract: Highly complex deep learning models are increasingly integrated into modern cyber-physical systems (CPS), many of which have strict safety requirements. One problem arising from this is that deep learning lacks interpretability, operating as a black box. The reliability of deep learning is heavily impacted by how well the model training data represents runtime test data, especially when the input space dimension is high, as with natural images. In response, we propose a robust out-of-distribution (OOD) detection framework. Our approach detects unusual movements from driving video in real-time by combining classical optic flow operation with representation learning via variational autoencoder (VAE). We also design a method to locate OOD factors in images. Evaluation on a driving simulation data set shows that our approach is statistically more robust than related works.

Submitted 25 July, 2021; originally announced July 2021.

arXiv:2106.15965 [pdf, other]

doi 10.1145/3445034.3460509

Embedded out-of-distribution detection on an autonomous robot platform

Authors: Michael Yuhas, Yeli Feng, Daniel Jun Xian Ng, Zahra Rahiminasab, Arvind Easwaran

Abstract: Machine learning (ML) is actively finding its way into modern cyber-physical systems (CPS), many of which are safety-critical real-time systems. It is well known that ML outputs are not reliable when testing data are novel with regards to model training and validation data, i.e., out-of-distribution (OOD) test data. We implement an unsupervised deep neural network-based OOD detector on a real-time embedded autonomous Duckiebot and evaluate detection performance. Our OOD detector produces a success rate of 87.5% for emergency stopping a Duckiebot on a braking test bed we designed. We also provide case analysis on computing resource challenges specific to the Robot Operating System (ROS) middleware on the Duckiebot.

Submitted 30 June, 2021; originally announced June 2021.

Comments: 6 pages, 8 figures

Journal ref: Yuhas, M., Feng, Y., Ng, D. J. X., Rahiminasab, Z., & Easwaran, A. (2021, May). Embedded out-of-distribution detection on an autonomous robot platform. In Proceedings of the Workshop on Design Automation for CPS and IoT (pp. 13-18)

arXiv:2104.11474 [pdf, other]

Monitoring Cumulative Cost Properties

Authors: Omar Al-Bataineh, Daniel Jun Xian Ng, Arvind Easwaran

Abstract: This paper considers the problem of decentralized monitoring of a class of non-functional properties (NFPs) with quantitative operators, namely cumulative cost properties. The decentralized monitoring of NFPs can be a non-trivial task for several reasons: (i) they are typically expressed at a high abstraction level where inter-event dependencies are hidden, (ii) NFPs are difficult to monitor in a decentralized way, and (iii) effective decomposition techniques are lacking. We address these issues by providing a formal framework for decentralised monitoring of LTL formulas with quantitative operators. The presented framework employs the tableau construction and a formula unwinding technique (i.e., a transformation technique that preserves the semantics of the original formula) to split and distribute the input LTL formula and the corresponding quantitative constraint in a way such that monitoring can be performed in a decentralised manner. The employment of these techniques allows processes to detect early violations of monitored properties and perform some corrective or recovery actions. We demonstrate the effectiveness of the presented framework using a case study based on a Fischertechnik training model, a sorting line which sorts tokens based on their color into storage bins. The analysis of the case study shows the effectiveness of the presented framework not only in early detection of violations, but also in developing failure recovery plans that can help to avoid serious impact of failures on the performance of the system.

Submitted 23 April, 2021; originally announced April 2021.

Comments: 12 pages, 8 figures, 5 tables, accepted in FormaliSE 2021

arXiv:2102.03341 [pdf, other]

doi 10.1007/978-3-030-23703-5_2

Challenges in Digital Twin Development for Cyber-Physical Production Systems

Authors: Heejong Park, Arvind Easwaran, Sidharta Andalam

Abstract: The recent advancement of information and communication technology makes digitalisation of an entire manufacturing shop-floor possible where physical processes are tightly intertwined with their cyber counterparts. This has led to the emergence of the concept of a digital twin, which is a realistic virtual copy of a physical object. The digital twin will be the key technology in Cyber-Physical Production Systems (CPPS) and its market is expected to grow significantly in the coming years. Nevertheless, the digital twin is still a relatively new concept, and people have different perspectives on its requirements, capabilities, and limitations. To better understand the effect of a digital twin's operations, mitigate the complexity of capturing the dynamics of physical phenomena, and improve analysis and predictability, it is important to have a development tool with a strong semantic foundation that can accurately model, simulate, and synthesise the digital twin. This paper reviews the current state of the art in tools and developments of digital twins in manufacturing and discusses potential design challenges.

Submitted 3 February, 2021; originally announced February 2021.

Comments: This is a post-peer-review, pre-copyedit version of an article published in Cyber Physical Systems. Model-Based Design. The final authenticated version is available online at: http://dx.doi.org/10.1007/978-3-030-23703-5_2

Journal ref: Cyber Physical Systems. Model-Based Design (2018)

arXiv:2102.01928 [pdf, other]

doi 10.1016/j.sysarc.2021.102017

Online Cycle Detection for Models with Mode-Dependent Input and Output Dependencies

Authors: Heejong Park, Arvind Easwaran, Etienne Borde

Abstract: In the fields of co-simulation and component-based modelling, designers import models as building blocks to create a composite model that provides more complex functionalities. Modelling tools perform instantaneous cycle detection (ICD) on the composite models having feedback loops to reject the models if the loops are mathematically unsound and to improve simulation performance. In this case, the analysis relies heavily on the availability of dependency information from the imported models. However, the cycle detection problem becomes harder when the model's input to output dependencies are mode-dependent, i.e., they change in response to certain events generated internally or externally as inputs. The number of possible modes created by composing such models increases significantly and unknown factors such as environmental inputs make the offline (static) ICD a difficult task. In this paper, an online ICD method is introduced to address this issue for the models used in cyber-physical systems. The method utilises an oracle as a central source of information that can answer whether the individual models can make mode transitions without creating instantaneous cycles. The oracle utilises three types of data-structures created offline that are adaptively chosen online (at runtime) depending on the frequency as well as the number of models that make mode transitions. During the analysis, the models used online are stalled from running, resulting in a discrepancy with the physical system. The objective is to detect an absence of the instantaneous cycle while minimising the stall time of the model simulation that is induced from the analysis. The benchmark results show that our method is an adequate alternative to the offline analysis methods and significantly reduces the analysis time.

Submitted 3 February, 2021; originally announced February 2021.

Comments: © 2021. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/

Journal ref: Journal of Systems Architecture (2021)

arXiv:2007.10141 [pdf, other]

PAC Model Checking of Black-Box Continuous-Time Dynamical Systems

Authors: Bai Xue, Miaomiao Zhang, Arvind Easwaran, Qin Li

Abstract: In this paper we present a novel model checking approach to finite-time safety verification of black-box continuous-time dynamical systems within the framework of probably approximately correct (PAC) learning. The black-box dynamical systems are the ones, for which no model is given but whose states changing continuously through time within a finite time interval can be observed at some discrete time instants for a given input. The new model checking approach is termed as PAC model checking due to incorporation of learned models with correctness guarantees expressed using the terms error probability and confidence. Based on the error probability and confidence level, our approach provides statistically formal guarantees that the time-evolving trajectories of the black-box dynamical system over finite time horizons fall within the range of the learned model plus a bounded interval, contributing to insights on the reachability of the black-box system and thus on the satisfiability of its safety requirements. The learned model together with the bounded interval is obtained by scenario optimization, which boils down to a linear programming problem. Three examples demonstrate the performance of our approach.

Submitted 17 July, 2020; originally announced July 2020.

Comments: Accepted by IEEE TCAD. arXiv admin note: text overlap with arXiv:1207.1272 by other authors

arXiv:2006.15832 [pdf, ps, other]

Resilience Bounds of Network Clock Synchronization with Fault Correction

Authors: Linshan Jiang, Rui Tan, Arvind Easwaran

Abstract: The Internet of Things (IoT) will be a main data generation infrastructure for achieving better system intelligence. This paper considers the design and implementation of a practical privacy-preserving collaborative learning scheme, in which a curious learning coordinator trains a better machine learning model based on the data samples contributed by a number of IoT objects, while the confidentiality of the raw forms of the training data is protected against the coordinator. Existing distributed machine learning and data encryption approaches incur significant computation and communication overhead, rendering them ill-suited for resource-constrained IoT objects. We study an approach that applies independent random projection at each IoT object to obfuscate data and trains a deep neural network at the coordinator based on the projected data from the IoT objects. This approach introduces light computation overhead to the IoT objects and moves most workload to the coordinator that can have sufficient computing resources. Although the independent projections performed by the IoT objects address the potential collusion between the curious coordinator and some compromised IoT objects, they significantly increase the complexity of the projected data. In this paper, we leverage the superior learning capability of deep learning in capturing sophisticated patterns to maintain good learning performance. Extensive comparative evaluation shows that this approach outperforms other lightweight approaches that apply additive noisification for differential privacy and/or support vector machines for learning in the applications with light to moderate data pattern complexities.

Submitted 29 June, 2020; originally announced June 2020.

arXiv:2004.14804 [pdf, other]

Real-Time Energy Monitoring in IoT-enabled Mobile Devices

Authors: Nitin Shivaraman, Seima Saki, Zhiwei Liu, Saravanan Ramanathan, Arvind Easwaran, Sebastian Steinhorst

Abstract: With rapid advancements in the Internet of Things (IoT) paradigm, electrical devices in the near future are expected to have IoT capabilities. This enables fine-grained tracking of individual energy consumption data of such devices, offering location-independent per-device billing. Thus, it is more fine-grained than the location-based metering of state-of-the-art infrastructure, which traditionally aggregates on a building or household level, defining the entity to be billed. However, such in-device energy metering is susceptible to manipulation and fraud. As a remedy, we propose a decentralized metering architecture that enables devices with IoT capabilities to measure their own energy consumption. In this architecture, the device-level consumption is additionally reported to a system-level aggregator that verifies distributed information and provides secure data storage using Blockchain, preventing data manipulation by untrusted entities. Using evaluations on an experimental testbed, we show that the proposed architecture supports device mobility and enables location-independent monitoring of energy consumption.

Submitted 29 April, 2020; originally announced April 2020.

Comments: 4 pages, 6 figures, accepted DATE 2020 conference

arXiv:2004.14559 [pdf, ps, other]

A Survey on Time-Sensitive Resource Allocation in the Cloud Continuum

Authors: Saravanan Ramanathan, Nitin Shivaraman, Seima Suryasekaran, Arvind Easwaran, Etienne Borde, Sebastian Steinhorst

Abstract: Artificial Intelligence (AI) and Internet of Things (IoT) applications are rapidly growing in today's world where they are continuously connected to the internet and process, store and exchange information among the devices and the environment. The cloud and edge platform is very crucial to these applications due to their inherent compute-intensive and resource-constrained nature. One of the foremost challenges in cloud and edge resource allocation is the efficient management of computation and communication resources to meet the performance and latency guarantees of the applications. The heterogeneity of cloud resources (processors, memory, storage, bandwidth), variable cost structure and unpredictable workload patterns make the design of resource allocation techniques complex. Numerous research studies have been carried out to address this intricate problem. In this paper, the current state-of-the-art resource allocation techniques for the cloud continuum, in particular those that consider time-sensitive applications, are reviewed. Furthermore, we present the key challenges in the resource allocation problem for the cloud continuum, a taxonomy to classify the existing literature and the potential research gaps.

Submitted 29 April, 2020; originally announced April 2020.

Comments: 15 pages. A version submitted to Information Technology | De Gruyter

MSC Class: 68M14 ACM Class: A.1; D.4

arXiv:2004.14072 [pdf, other]

DeCoRIC: Decentralized Connected Resilient IoT Clustering

Authors: Nitin Shivaraman, Saravanan Ramanathan, Shreejith Shanker, Arvind Easwaran, Sebastian Steinhorst

Abstract: Maintaining peer-to-peer connectivity with low energy overhead is a key requirement for several emerging Internet of Things (IoT) applications. It is also desirable to develop such connectivity solutions for non-static network topologies, so that resilience to device failures can be fully realized. Decentralized clustering has emerged as a promising technique to address this critical challenge. Clustering of nodes around cluster heads (CHs) provides an energy-efficient two-tier framework for peer-to-peer communication. At the same time, decentralization ensures that the framework can quickly adapt to a dynamically changing network topology. Although some decentralized clustering solutions have been proposed in the literature, they either lack guarantees on connectivity or incur significant energy overhead to maintain the clusters. In this paper, we present Decentralized Connected Resilient IoT Clustering (DeCoRIC), an energy-efficient clustering scheme that is self-organizing and resilient to network changes while guaranteeing connectivity. Using experiments implemented on the Contiki simulator, we show that our clustering scheme adapts itself to node faults in a time-bound manner. Our experiments show that DeCoRIC achieves 100% connectivity among all nodes while improving the power efficiency of nodes in the system compared to the state-of-the-art techniques BEEM and LEACH by up to 110% and 70%, respectively. The improved power efficiency also translates to longer lifetime before first node death with a best-case of 109% longer than BEEM and 42% longer than LEACH.

Submitted 29 April, 2020; originally announced April 2020.

Comments: 10 pages, 8 figures, 3 tables, accepted in ICCCN 2020

arXiv:2004.06368 [pdf]

doi 10.1109/RTCSA.2019.8864557

Managing Industrial Communication Delays with Software-Defined Networking

Authors: Rutvij H. Jhaveri, Rui Tan, Arvind Easwaran, Sagar V. Ramani

Abstract: Recent technological advances have fostered the development of complex industrial cyber-physical systems which demand real-time communication with delay guarantees. The consequences of delay requirement violation in such systems may become increasingly severe. In this paper, we propose a contract-based fault-resilient methodology which aims at managing the communication delays of real-time flows in industries. With this objective, we present a light-weight mechanism to estimate end-to-end delay in the network in which the clocks of the switches are not synchronized. The mechanism aims at providing a high level of accuracy with lower communication overhead. We then propose a contract-based framework using software-defined networking where the components are associated with delay contracts and a resilience manager. The proposed resilience management framework contains: (1) contracts which state guarantees about components' behaviors, (2) observers which are responsible for detecting contract failure (fault), (3) monitors to detect events such as run-time changes in the delay requirements and link failure, (4) control logic to take suitable decisions based on the type of the fault, (5) a resilience manager to decide response strategies containing the best course of action as per the control logic decision. Finally, we present a delay-aware path finding algorithm which is used to route/reroute the real-time flows to provide resiliency in the case of faults and to adapt to the changes in the network state. Performance of the proposed framework is evaluated with the Ryu SDN controller and Mininet network emulator.

Submitted 14 April, 2020; originally announced April 2020.

arXiv:2004.05761 [pdf, other]

doi 10.1109/RTCSA.2019.8864556

Automatic Generation of Hierarchical Contracts for Resilience in Cyber-Physical Systems

Authors: Zhiheng Xu, Daniel Jun Xian Ng, Arvind Easwaran

Abstract: With the growing scale of Cyber-Physical Systems (CPSs), it is challenging to maintain their stability under all operating conditions. How to reduce the downtime and locate the failures becomes a core issue in system design. In this paper, we employ a hierarchical contract-based resilience framework to guarantee the stability of CPS. In this framework, we use Assume Guarantee (A-G) contracts to monitor the non-functional properties of individual components (e.g., power and latency), and hierarchically compose such contracts to deduce information about faults at the system level. The hierarchical contracts enable rapid fault detection in large-scale CPS. However, due to the vast number of components in CPS, manually designing numerous contracts and the hierarchy becomes challenging. To address this issue, we propose a technique to automatically decompose a root contract into multiple lower-level contracts depending on I/O dependencies between components. We then formulate a multi-objective optimization problem to search the optimal parameters of each lower-level contract. This enables automatic contract refinement taking into consideration the communication overhead between components. Finally, we use a case study from the manufacturing domain to experimentally demonstrate the benefits of the proposed framework.

Submitted 12 April, 2020; originally announced April 2020.

Comments: ©2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2004.04477 [pdf, other]

doi 10.1145/3302509.3313323

Demo Abstract: Contract-based Hierarchical Resilience Framework for Cyber-Physical Systems

Authors: Daniel Jun Xian Ng, Arvind Easwaran, Sidharta Andalam

Abstract: This demonstration presents a framework for building a resilient Cyber-Physical Systems (CPS) cyber-infrastructure through the use of hierarchical parametric assume-guarantee contracts. A Fischertechnik Sorting Line with Color Detection training model is used to showcase our framework.

Submitted 12 April, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

Comments: 2 pages, 5 figures, published in the Demo Session of IEEE International Conference on Cyber-Physical Systems 2019. Publication rights licensed to ACM

arXiv:2004.04444 [pdf, other]

doi 10.1109/ISORC.2018.00013

CLAIR: A Contract-based Framework for Developing Resilient CPS Architectures

Authors: Sidharta Andalam, Daniel Jun Xian Ng, Arvind Easwaran, Karthik Thangamariappan

Abstract: Industrial cyber-infrastructure is normally a multilayered architecture. The purpose of the layered architecture is to hide complexity and allow independent evolution of the layers. In this paper, we argue that this traditional strict layering results in poor transparency across layers affecting the ability to significantly improve resiliency. We propose a contract-based methodology where components across and within the layers of the cyber-infrastructure are associated with contracts and a light-weight resilience manager. This allows the system to detect faults (contract violation monitored using observers) and react (change contracts dynamically) effectively. It results in (1) improving transparency across layers; helps resiliency, (2) decoupling fault-handling code from application code; helps code maintenance, (3) systematically generate error-free fault handling code; reduces development time. Using an industrial case study, we demonstrate the proposed methodology.

Submitted 13 April, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

Comments: ©2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2004.04442 [pdf, other]

doi 10.1109/LES.2018.2801360

Contract-based Methodology for Developing Resilient Cyber-Infrastructure in the Industry 4.0 Era

Authors: Sidharta Andalam, Daniel Jun Xian Ng, Arvind Easwaran, Karthik Thangamariappan

Abstract: As the industrial cyber-infrastructure becomes increasingly important to realise the objectives of Industry 4.0, the consequence of disruption due to internal or external faults becomes increasingly severe. Thus there is a need for a resilient infrastructure. In this paper, we propose a contract-based methodology where components across layers of the cyber-infrastructure are associated with contracts and a light-weight resilience manager. This allows the system to detect faults (contract violation monitored using observers) and react (change contracts dynamically) effectively.

Submitted 12 April, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

Comments: ©2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: IEEE Embedded System Letters 11 (2019) 5-8

arXiv:2004.04441 [pdf]

doi 10.1109/MC.2018.2876071

Contract-based Hierarchical Resilience Management for Cyber-Physical Systems

Authors: Mohammad Shihabul Haque, Daniel Jun Xian Ng, Arvind Easwaran, Karthik Thangamariappan

Abstract: Orchestrated collaborative effort of physical and cyber components to satisfy given requirements is the central concept behind Cyber-Physical Systems (CPS). To duly ensure the performance of components, a software-based resilience manager is a flexible choice to detect and recover from faults quickly. However, a single resilience manager, placed at the centre of the system to deal with every fault, suffers from decision-making overburden; and therefore, is out of the question for distributed large-scale CPS. On the other hand, prompt detection of failures and efficient recovery from them are challenging for decentralised resilience managers. In this regard, we present a novel resilience management framework that utilises the concept of management hierarchy. System design contracts play a key role in this framework for prompt fault-detection and recovery. Besides the details of the framework, an Industry 4.0 related test case is presented in this article to provide further insights.

Submitted 12 April, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

Comments: ©2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

ACM Class: D.2.4.g; D.2.15; D.2.4.f

Journal ref: Computer 11 (2018) 56-65

arXiv:2004.02439 [pdf, other]

doi 10.1007/s11241-009-9073-x

Optimal Virtual Cluster-based Multiprocessor Scheduling

Authors: Arvind Easwaran, Insik Shin, Insup Lee

Abstract: Scheduling of constrained deadline sporadic task systems on multiprocessor platforms is an area which has received much attention in the recent past. It is widely believed that finding an optimal scheduler is hard, and therefore most studies have focused on developing algorithms with good processor utilization bounds. These algorithms can be broadly classified into two categories: partitioned scheduling in which tasks are statically assigned to individual processors, and global scheduling in which each task is allowed to execute on any processor in the platform. In this paper we consider a third, more general, approach called cluster-based scheduling. In this approach each task is statically assigned to a processor cluster, tasks in each cluster are globally scheduled among themselves, and clusters in turn are scheduled on the multiprocessor platform. We develop techniques to support such cluster-based scheduling algorithms, and also consider properties that minimize total processor utilization of individual clusters. In the last part of this paper, we develop new virtual cluster-based scheduling algorithms. For implicit deadline sporadic task systems, we develop an optimal scheduling algorithm that is neither Pfair nor ERfair. We also show that the processor utilization bound of US-EDF{m/(2m-1)} can be improved by using virtual clustering. Since neither partitioned nor global strategies dominate over the other, cluster-based scheduling is a natural direction for research towards achieving improved processor utilization bounds.

Submitted 6 April, 2020; originally announced April 2020.

Comments: This is a post-peer-review, pre-copyedit version of an article published in Springer Real-Time Systems journal. The final authenticated version is available online at: https://doi.org/10.1007/s11241-009-9073-x

Journal ref: Springer Real-Time Systems, Volume 43, Pages 25-59, July 2009

arXiv:2004.02400 [pdf, other]

doi 10.1109/ECRTS.2015.9

Resource Efficient Isolation Mechanisms in Mixed-Criticality Scheduling

Authors: Xiaozhe Gu, Arvind Easwaran, Kieu-My Phan, Insik Shin

Abstract: Mixed-criticality real-time scheduling has been developed to improve resource utilization while guaranteeing safe execution of critical applications. These studies use optimistic resource reservation for all the applications to improve utilization, but prioritize critical applications when the reservations become insufficient at runtime. Many of them however share an impractical assumption that all the critical applications will simultaneously demand additional resources. As a consequence, they under-utilize resources by penalizing all the low-criticality applications. In this paper we overcome this shortcoming using a novel mechanism that comprises a parameter to model the expected number of critical applications simultaneously demanding more resources, and an execution strategy based on the parameter to improve resource utilization. Since most mixed-criticality systems in practice are component-based, we design our mechanism such that the component boundaries provide the isolation necessary to support the execution of low-criticality applications, and at the same time protect the critical ones. We also develop schedulability tests for the proposed mechanism under both a flat as well as a hierarchical scheduling framework. Finally, through simulations, we compare the performance of the proposed approach with existing studies in terms of schedulability and the capability to support low-criticality applications.

Submitted 6 April, 2020; originally announced April 2020.

Comments: ©2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: Euromicro Conference on Real-Time Systems (ECRTS), Lund, 2015, pp. 13-24

arXiv:2003.09370 [pdf, other]

doi 10.1109/ICCD46524.2019.00019

TiLA: Twin-in-the-Loop Architecture for Cyber-Physical Production Systems

Authors: Heejong Park, Arvind Easwaran, Sidharta Andalam

Abstract: Digital twin is a virtual replica of a real-world object that lives simultaneously with its physical counterpart. Since its first introduction in 2003 by Grieves, digital twin has gained momentum in a wide range of applications such as industrial manufacturing, automotive and artificial intelligence. However, many digital-twin-related approaches, found in industries as well as literature, mainly focus on modelling individual physical things with high-fidelity methods with limited scalability. In this paper, we introduce a digital-twin architecture called TiLA (Twin-in-the-Loop Architecture). TiLA employs heterogeneous models and online data to create a digital twin, which follows a Globally Asynchronous Locally Synchronous (GALS) model of computation. It facilitates the creation of a scalable digital twin with different levels of modelling abstraction as well as giving GALS formalism for execution strategy. Furthermore, TiLA provides facilities to develop applications around the twin as well as an interface to synchronise the twin with the physical system through an industrial communication protocol. A digital twin for a manufacturing line has been developed as a case study using TiLA. It demonstrates the use of digital twin models together with online data for monitoring and analysing failures in the physical system.

Submitted 10 March, 2020; originally announced March 2020.

Journal ref: IEEE International Conference on Computer Design (ICCD), Abu Dhabi, United Arab Emirates, 2019, pages 82-90

arXiv:2003.08740 [pdf, other]

Out-of-Distribution Detection in Multi-Label Datasets using Latent Space of $β$-VAE

Authors: Vijaya Kumar Sundar, Shreyas Ramakrishna, Zahra Rahiminasab, Arvind Easwaran, Abhishek Dubey

Abstract: Learning Enabled Components (LECs) are widely being used in a variety of perception based autonomy tasks like image segmentation, object detection, end-to-end driving, etc. These components are trained with large image datasets with multimodal factors like weather conditions, time-of-day, traffic-density, etc. The LECs learn from these factors during training, and while testing if there is variation in any of these factors, the components get confused resulting in low confidence predictions. Images with factors not seen during training are commonly referred to as Out-of-Distribution (OOD). For safe autonomy it is important to identify the OOD images, so that a suitable mitigation strategy can be performed. Classical one-class classifiers like SVM and SVDD are used to perform OOD detection. However, the multiple labels attached to the images in these datasets restrict the direct application of these techniques. We address this problem using the latent space of the $β$-Variational Autoencoder ($β$-VAE). We use the fact that the compact latent space generated by an appropriately selected $β$-VAE will encode the information about these factors in a few latent variables, and that can be used for computationally inexpensive detection. We evaluate our approach on the nuScenes dataset, and our results show that the latent space of $β$-VAE is sensitive to changes in the values of the generative factors.

Submitted 10 March, 2020; originally announced March 2020.

Comments: Workshop on Assured Autonomy (WAAS) 2020

arXiv:2003.08364 [pdf, other]

doi 10.1109/RTSS.2016.014

Dynamic Budget Management with Service Guarantees for Mixed-Criticality Systems

Authors: Xiaozhe Gu, Arvind Easwaran

Abstract: Many existing studies on mixed-criticality (MC) scheduling assume that low-criticality budgets for high-criticality applications are known a priori. These budgets are primarily used as guidance to determine when the scheduler should switch the system mode from low to high. Based on this key observation, in this paper we propose a dynamic MC scheduling model under which low-criticality budgets for individual high-criticality applications are determined at runtime as opposed to being fixed offline. To ensure sufficient budget for high-criticality applications at all times, we use offline schedulability analysis to determine a system-wide total low-criticality budget allocation for all the high-criticality applications combined. This total budget is used as guidance in our model to determine the need for a mode-switch. The runtime strategy then distributes this total budget among the various applications depending on their execution requirement and with the objective of postponing mode-switch as much as possible. We show that this runtime strategy is able to postpone mode-switches for a longer time than any strategy that uses a fixed low-criticality budget allocation for each application. Finally, since we are able to control the total budget allocation for high-criticality applications before mode-switch, we also propose techniques to determine these budgets considering system-wide objectives such as schedulability and service guarantee for low-criticality applications.

Submitted 11 March, 2020; originally announced March 2020.

Comments: ©2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: IEEE Real-Time Systems Symposium (RTSS), Porto, Portugal, 2016, pages 47-56

arXiv:2003.05445 [pdf, other]

doi 10.23919/DATE.2017.7926989

Utilization Difference Based Partitioned Scheduling of Mixed-Criticality Systems

Authors: Saravanan Ramanathan, Arvind Easwaran

Abstract: Mixed-Criticality (MC) systems consolidate multiple functionalities with different criticalities onto a single hardware platform. Such systems improve the overall resource utilization while guaranteeing resources to critical tasks. In this paper, we focus on the problem of partitioned multiprocessor MC scheduling, in particular the problem of designing efficient partitioning strategies. We develop two new partitioning strategies based on the principle of evenly distributing the difference between total high-critical utilization and total low-critical utilization for the critical tasks among all processors. By balancing this difference, we are able to reduce the pessimism in uniprocessor MC schedulability tests that are applied on each processor, thus improving overall schedulability. To evaluate the schedulability performance of the proposed strategies, we compare them against existing partitioned algorithms using extensive experiments. We show that the proposed strategies are effective with both dynamic-priority Earliest Deadline First with Virtual Deadlines (EDF-VD) and fixed-priority Adaptive Mixed-Criticality (AMC) algorithms. Specifically, our results show that the proposed strategies improve schedulability by as much as 28.1% and 36.2% for implicit and constrained-deadline task systems respectively.

Submitted 11 March, 2020; originally announced March 2020.

Comments: ©2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, Lausanne, 2017, pages 238-243

arXiv:2003.05444 [pdf, other]

doi 10.1109/RTSS.2013.16

Demand-based Scheduling of Mixed-Criticality Sporadic Tasks on One Processor

Authors: Arvind Easwaran

Abstract: Strategies that artificially tighten high-criticality task deadlines in low-criticality behaviors have been successfully employed for scheduling mixed-criticality systems. Although efficient scheduling algorithms have been developed for implicit deadline task systems, the same is not true for more general sporadic tasks. In this paper we develop a new demand-based schedulability test for such general mixed-criticality task systems, in which we collectively bound the low- and high-criticality demand of tasks. We show that the new test strictly dominates the only other known demand-based test for such systems. We also propose a new deadline tightening strategy based on this test, and show through simulations that the strategy significantly outperforms all known scheduling algorithms for a variety of sporadic task systems.

Submitted 11 March, 2020; originally announced March 2020.

Comments: ©2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: IEEE Real-Time Systems Symposium (RTSS), Vancouver, Canada, 2013, pages 78-87

arXiv:2003.05442 [pdf, other]

doi 10.1109/DS-RT47707.2019.8958666

Combining Task-level and System-level Scheduling Modes for Mixed Criticality Systems

Authors: Jalil Boudjadar, Saravanan Ramanathan, Arvind Easwaran, Ulrik Nyman

Abstract: Different scheduling algorithms for mixed criticality systems have been recently proposed. The common denominator of these algorithms is to discard low criticality tasks whenever high criticality tasks lack computation resources. This is achieved upon a switch of the scheduling mode from Normal to Critical. We distinguish two main categories of algorithms: system-level mode switch and task-level mode switch. System-level mode switch algorithms allow low criticality (LC) tasks to execute only in Normal mode. Task-level mode switch algorithms allow an individual high criticality (HC) task to switch its mode from low (LO) to high (HI), to obtain priority over all LC tasks. This paper investigates an online scheduling algorithm for mixed-criticality systems that supports dynamic mode switches at both task level and system level. When a job of an HC task overruns its LC budget, only that particular job is switched to HI mode. If the job cannot be accommodated, then the system switches to Critical mode. To make resources available to the HC jobs, the LC tasks are degraded by stretching their periods until the job exhibiting Critical mode completes its execution. The stretching is carried out until the resource availability is met. We have mechanized and implemented the proposed algorithm using Uppaal. To study the efficiency of our scheduling algorithm, we examine a case study and compare our results to the state-of-the-art algorithms.

Submitted 10 March, 2020; originally announced March 2020.

Comments: ©2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications (DS-RT), Cosenza, Italy, 2019, pages 1-10

arXiv:2003.05168 [pdf, other]

doi 10.1007/s11241-017-9296-1

Multi-Rate Fluid Scheduling of Mixed-Criticality Systems on Multiprocessors

Authors: Saravanan Ramanathan, Arvind Easwaran, Hyeonjoong Cho

Abstract: In this paper we consider the problem of mixed-criticality (MC) scheduling of implicit-deadline sporadic task systems on a homogeneous multiprocessor platform. Focusing on dual-criticality systems, algorithms based on the fluid scheduling model have been proposed in the past. These algorithms use a dual-rate execution model for each high-criticality task depending on the system mode. Once the system switches to the high-criticality mode, the execution rates of such tasks are increased to meet their increased demand. Although these algorithms are speed-up optimal, they are unable to schedule several feasible dual-criticality task systems. This is because a single fixed execution rate for each high-criticality task after the mode switch cannot efficiently handle the high variability in demand during the transition period immediately following the mode switch. This demand variability exists as long as the carry-over jobs of high-criticality tasks, that is, jobs released before the mode switch, have not completed. Addressing this shortcoming, we propose a multi-rate fluid execution model for dual-criticality task systems in this paper. Under this model, high-criticality tasks are allocated varying execution rates in the transition period after the mode switch to efficiently handle the demand variability. We derive a sufficient schedulability test for the proposed model and show its dominance over the dual-rate fluid execution model. Further, we also present a speed-up optimal rate assignment strategy for the multi-rate model, and experimentally show that the proposed model outperforms all the existing MC scheduling algorithms with known speed-up bounds.

Submitted 11 March, 2020; originally announced March 2020.

Comments: This is a post-peer-review, pre-copyedit version of an article published in Real-Time Systems. The final authenticated version is available online at the below DOI

Journal ref: Springer Real-Time Systems, Issue 54, pages 247-277, April 2018

arXiv:2003.05160 [pdf, other]

doi 10.1145/3105922

Efficient Schedulability Test for Dynamic-Priority Scheduling of Mixed-Criticality Real-Time Systems

Authors: Xiaozhe Gu, Arvind Easwaran

Abstract: Systems in many safety-critical application domains are subject to certification requirements. In such a system, there are typically different applications providing functionalities that have varying degrees of criticality. Consequently, the certification requirements for functionalities at these different criticality levels also vary, with very high levels of assurance required for a highly critical functionality and relatively low levels of assurance required for a less critical functionality. Considering the timing assurance given to various applications in the form of guaranteed budgets within deadlines, a theory of real-time scheduling for such multi-criticality systems has been under development in the recent past. In particular, an algorithm called Earliest Deadline First with Virtual Deadlines (EDF-VD) has shown a lot of promise for systems with two criticality levels, especially in terms of practical performance demonstrated through experimental results. In this paper we design a new schedulability test for EDF-VD that extends these performance benefits to multi-criticality systems. We propose a new test based on demand bound functions and also present a novel virtual deadline assignment strategy. Through extensive experiments we show that the proposed technique significantly outperforms existing strategies for a variety of generic real-time systems.

Submitted 11 March, 2020; originally announced March 2020.

Comments: Publication rights licensed to ACM

Journal ref: ACM Transactions on Embedded Computing Systems, Volume 17, Issue 1, Pages 24:1-24:24, November 2017

arXiv:1910.12000 [pdf, other]

SlotSwapper: A Schedule Randomization protocol for Real-Time WirelessHART Networks

Authors: Ankita Samaddar, Arvind Easwaran, Rui Tan

Abstract: Industrial process control systems are time-critical systems where reliable communications between sensors and actuators need to be guaranteed within strict deadlines to maintain safe operation of all the components of the system. WirelessHART is the most widely adopted standard serving as the medium of communication in industrial setups due to its support for Time Division Multiple Access (TDMA)-based communication, multiple channels, channel hopping, centralized architecture, redundant routes and avoidance of spatial re-use of channels. However, the communication schedule in a WirelessHART network is decided by a centralized network manager at the time of network initialization and the same communication schedule repeats every hyper-period. Due to predictability in the time slots of the communication schedule, these systems are vulnerable to timing attacks which eventually can disrupt the safety of the system. In this work, we present a moving target defense mechanism, the SlotSwapper, which uses schedule randomization techniques to randomize the time slots over a hyper-period schedule while still preserving all the feasibility constraints of a real-time WirelessHART network, making the schedule uncertain every hyper-period. We tested the feasibility of the generated schedules on random topologies with 100 simulated motes in the Cooja simulator. We use schedule entropy to measure the confidentiality of our algorithm in terms of randomness in the time slots of the generated schedules.

Submitted 26 October, 2019; originally announced October 2019.

Comments: RTN, ECRTS 2019

arXiv:1909.04886 [pdf, other]

doi 10.1145/3302509.3311038

Towards Safe Machine Learning for CPS: Infer Uncertainty from Training Data

Authors: Xiaozhe Gu, Arvind Easwaran

Abstract: Machine learning (ML) techniques are increasingly applied to decision-making and control problems in Cyber-Physical Systems, many of which are safety-critical, e.g., chemical plants, robotics, autonomous vehicles. Despite the significant benefits brought by ML techniques, they also raise additional safety issues because 1) most expressive and powerful ML models are not transparent and behave as a black box and 2) the training data, which plays a crucial role in ML safety, is usually incomplete. An important technique to achieve safety for ML models is "Safe Fail", i.e., a model selects a reject option and applies the backup solution, a traditional controller or a human operator for example, when it has low confidence in a prediction. Data-driven models produced by ML algorithms learn from training data, and hence they are only as good as the examples they have learnt. As pointed out in [17], ML models work well in the "training space" (i.e., feature space with sufficient training data), but they cannot extrapolate beyond the training space. As observed in many previous studies, a feature space that lacks training data generally has a much higher error rate than one that contains sufficient training samples [31]. Therefore, it is essential to identify the training space and avoid extrapolating beyond it. In this paper, we propose an efficient Feature Space Partitioning Tree (FSPT) to address this problem. Using experiments, we also show that a strong relationship exists between model performance and FSPT score.

Submitted 11 September, 2019; originally announced September 2019.

Comments: Publication rights licensed to ACM

Journal ref: In Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS), 2019. ACM, New York, NY, USA, pages 249-258

arXiv:1810.08342 [pdf, ps, other]

Flow Network Models for Online Scheduling Real-time Tasks on Multiprocessors

Authors: Hyeonjoong Cho, Arvind Easwaran

Abstract: We consider the flow network model to solve multiprocessor real-time task scheduling problems. Using the flow network model or its generic form, the linear programming (LP) formulation, for these problems is not new. However, previous works have limitations; for example, they are classified as offline scheduling techniques since they establish a flow network model or an LP problem considering a very long time interval. In this study, we propose how to construct the flow network model for online scheduling of periodic real-time tasks on multiprocessors. Our key idea is to construct the flow network only for the active instances of tasks at the current scheduling time, while guaranteeing the existence of an optimal schedule for the future instances of the tasks. Optimal scheduling is here defined as ensuring that all real-time tasks meet their deadlines when the total utilization demand of the given tasks does not exceed the total processing capacity. We then propose flow network model-based polynomial-time scheduling algorithms. Advantageously, the flow network model allows the task workload to be collected unfairly within a certain time interval without losing optimality. It thus leads us to design three unfair-but-optimal scheduling algorithms on both continuous and discrete-time models. In particular, our unfair-but-optimal scheduling algorithm on a discrete-time model is, to the best of our knowledge, the first in the problem domain. We experimentally demonstrate that it significantly alleviates the scheduling overheads, i.e., it reduces the number of preemptions with a comparable number of task migrations across processors.

Submitted 18 October, 2018; originally announced October 2018.

Comments: 33 pages, 12 figures, submitted to Real-Time Systems

arXiv:1809.08195 [pdf, other]

Crossbar-Constrained Technology Mapping for ReRAM based In-Memory Computing

Authors: Debjyoti Bhattacharjee, Yaswanth Tavva, Arvind Easwaran, Anupam Chattopadhyay

Abstract: In recent times, Resistive RAMs (ReRAMs) have gained significant prominence due to their unique feature of supporting both non-volatile storage and logic capabilities. ReRAM is also reported to provide extremely low power consumption compared to the standard CMOS storage devices. As a result, researchers have explored the mapping and design of diverse applications, ranging from arithmetic to neuromorphic computing structures, to ReRAM-based platforms. ReVAMP, a general-purpose ReRAM computing platform, has been proposed recently to leverage the parallelism exhibited in a crossbar structure. However, technology mapping on ReVAMP remains an open challenge. Though technology mapping with device/area constraints has been proposed, crossbar constraints have not been considered so far. In this work, we address this problem. Two technology mapping flows are proposed, considering different runtime-efficiency trade-offs. Both mapping flows take crossbar constraints into account and generate feasible mappings for a variety of crossbar dimensions. Our proposed algorithms are highly scalable and reveal important design hints for ReRAM-based implementations.

Submitted 21 September, 2018; originally announced September 2018.

arXiv:1809.03165 [pdf, ps, other]

Resilience Bounds of Sensing-Based Network Clock Synchronization

Authors: Rui Tan, Linshan Jiang, Arvind Easwaran, Jothi Prasanna Shanmuga Sundaram

Abstract: Recent studies exploited external periodic synchronous signals to synchronize a pair of network nodes to address a threat of delaying the communications between the nodes. However, the sensing-based synchronization may yield faults due to nonmalicious signal and sensor noises. This paper considers a system of N nodes that will fuse their peer-to-peer synchronization results to correct the faults. Our analysis gives the lower bound of the number of faults that the system can tolerate when N is up to 12. If the number of faults is no greater than the lower bound, the faults can be identified and corrected. We also prove that the system cannot tolerate more than N-2 faults. Our results can guide the design of resilient sensing-based clock synchronization systems.

Submitted 10 September, 2018; originally announced September 2018.

Showing 1–42 of 42 results for author: Easwaran, A