-
Serverless data pipeline approaches for IoT data in fog and cloud computing
Authors:
Shivananda R Poojara,
Chinmaya Kumar Dehury,
Pelle Jakovits,
Satish Narayana Srirama
Abstract:
With the increasing number of Internet of Things (IoT) devices, massive amounts of raw data are being generated. The latency, cost, and other challenges of cloud-based IoT data processing have driven the adoption of Edge and Fog computing models, where some data processing tasks are moved closer to the data sources. Properly dealing with the flow of such data requires building data pipelines to control the complete life cycle of data streams, from data acquisition at the data source, through edge and fog processing, to cloud-side storage and analytics. Data analytics tasks need to be executed dynamically at different distances from the data sources and often on very heterogeneous hardware devices. This can be streamlined by using a Serverless (or FaaS) cloud computing model, where tasks are defined as virtual functions that can be migrated from edge to cloud (and vice versa) and executed in an event-driven manner on data streams. In this work, we investigate the benefits of building Serverless data pipelines (SDP) for IoT data analytics and evaluate three different approaches for designing SDPs: 1) off-the-shelf data flow tool (DFT) based, 2) object storage service (OSS) based, and 3) MQTT based. Further, we applied these strategies to three fog applications (Aeneas, PocketSphinx, and a custom video processing application) and evaluated their performance by comparing processing time (computation time, network communication, and disk access time) and resource utilization. Results show that DFT is unsuitable for compute-intensive applications such as video or image processing, whereas OSS is best suited to this task. However, DFT is well suited to bandwidth-intensive applications because of its minimal use of network resources. On the other hand, the MQTT-based SDP shows an increase in CPU and memory usage as the number of... <truncated to fit the arXiv character limit>
Submitted 18 December, 2021;
originally announced December 2021.
-
TOSCAdata: Modelling data pipeline applications in TOSCA
Authors:
Chinmaya Kumar Dehury,
Pelle Jakovits,
Satish Narayana Srirama,
Giorgos Giotis,
Gaurav Garg
Abstract:
The serverless platform allows a customer to use cloud resources effectively and pay for the exact amount of resources used. A number of dedicated open source and commercial cloud data management tools are available to handle massive amounts of data. Such modern cloud data management tools are not mature enough to integrate the generic cloud application with the serverless platform, owing to the lack of mature and stable standards. One of the most popular and mature standards, TOSCA (Topology and Orchestration Specification for Cloud Applications), mainly focuses on application and service portability and automated management of the generic cloud application components. This paper proposes an extension of the TOSCA standard, TOSCAdata, that focuses on the modeling of data pipeline-based cloud applications. Keeping the requirements of modern data pipeline cloud applications in mind, TOSCAdata provides a number of TOSCA models that are independently deployable, schedulable, scalable, and re-usable, while effectively handling the flow and transformation of data in a pipeline manner. We also demonstrate the applicability of the proposed TOSCAdata models by taking a web-based cloud application in the context of tourism promotion as a use case scenario.
Submitted 20 December, 2021; v1 submitted 3 November, 2021;
originally announced November 2021.
-
Desktop to Cloud Migration of Scientific Computing Experiments
Authors:
Satish Narayana Srirama,
Pelle Jakovits,
Vladislav Ivaništšev
Abstract:
Scientific computing applications usually need huge amounts of computational power. The cloud provides interesting high-performance computing solutions, with its promise of virtually infinite resources on demand. However, migrating scientific computing problems to clouds and re-creating the software environment on the vendor-supplied OS and cloud instances is often a laborious task. It is also assumed that the scientist performing the experiments has significant knowledge of computer science, cloud computing, and the migration procedure, which is often not true. Considering these obstacles, we have designed a tool suite that migrates the complete software environment directly to the cloud. The developed desktop-to-cloud-migration (D2CM) tool supports transformation and migration of virtual machine images, reusable deployment descriptions, and life-cycle management for applications to be hosted on Amazon Cloud or compatible infrastructure such as Eucalyptus. The paper also presents an electrochemical case study and computational experiments targeted at designing modern supercapacitors. These experiments have used the tool extensively in drawing domain-specific results. Detailed analysis of the case showed that the D2CM tool not only simplifies the migration procedure for scientists, but also helps them optimize the calculations and compute clusters by giving them a new dimension: the cost-to-value of computational experiments.
Submitted 25 November, 2015;
originally announced November 2015.