-
Edge Computing for IoT
Authors:
Balqees Talal Hasan,
Ali Kadhum Idrees
Abstract:
Over the past few years, the idea of edge computing has seen substantial expansion in both academic and industrial circles. This computing approach has garnered attention due to its integral role in advancing various state-of-the-art technologies such as the Internet of Things (IoT), 5G, artificial intelligence, and augmented reality. In this chapter, we introduce computing paradigms for IoT, offering an overview of the current cutting-edge computing approaches that can be used with IoT. Furthermore, we go deeper into edge computing paradigms, specifically focusing on cloudlet and mobile edge computing. After that, we investigate the architecture of edge computing-based IoT, its advantages, and the technologies that make edge computing-based IoT possible, including artificial intelligence and lightweight virtualization. Additionally, we review real-life case studies of how edge computing is applied in IoT-based intelligent systems, including areas like healthcare, manufacturing, agriculture, and transportation. Finally, we discuss current research obstacles and outline potential future directions for further investigation in this domain.
Submitted 20 February, 2024;
originally announced February 2024.
-
Federated Learning for IoT/Edge/Fog Computing Systems
Authors:
Balqees Talal Hasan,
Ali Kadhum Idrees
Abstract:
With the help of a new architecture called Edge/Fog (E/F) computing, cloud computing services can now be extended nearer to data generator devices. E/F computing in combination with Deep Learning (DL) is a promising technique that is widely applied in numerous fields. To train their models, data producers in conventional DL architectures with E/F computing must repeatedly transmit and communicate data with third-party servers, like Edge/Fog or cloud servers. Due to the extensive bandwidth needs, legal issues, and privacy risks, this architecture is frequently impractical. With Federated Learning (FL), distributed clients, including cars, hospitals, and mobile phones, can co-train models through a centralized server while keeping their data local. As it facilitates group learning and model optimization, FL can therefore be seen as a motivating element in the E/F computing paradigm. Although FL applications in E/F computing environments have been considered in previous studies, FL execution and hurdles in the E/F computing framework have not been thoroughly covered. In order to identify advanced solutions, this chapter provides a review of the application of FL in E/F computing systems. We expect this chapter to help researchers learn more about how E/F computing and FL enable related concepts and technologies. Some case studies about the implementation of federated learning in E/F computing are also examined. Open issues and future research directions are introduced.
Submitted 20 February, 2024;
originally announced February 2024.
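The co-training idea in the abstract above (clients train locally, a central server combines the results, raw data never leaves the client) can be sketched with federated averaging. This is a minimal illustration, not the chapter's method: the linear model, the learning rate, the toy data, and all function names (`local_update`, `federated_average`, `run_fedavg`) are assumptions chosen to keep the example self-contained.

```python
# Minimal federated averaging sketch (assumed setup, not taken from the chapter).
# Each client fits y ~ w*x + b on its own data; only (w, b) is sent to the server.

def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few gradient-descent steps on one client's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error over this client's samples.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Server step: average client models, weighted by local sample count."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def run_fedavg(clients, rounds=200):
    """Alternate local training and server-side averaging for a number of rounds."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights

# Hypothetical clients whose private samples all follow y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
w, b = run_fedavg(clients)
print(f"global model: w={w:.3f}, b={b:.3f}")
```

The server only ever sees model parameters, which is the property the abstract refers to as preserving data localization. Real deployments add client sampling, secure aggregation, and handling of non-identically distributed data, none of which is modelled here.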
-
Performance Evaluation of Unified Parallel C for Molecular Dynamics
Authors:
Kamran Idrees,
Christoph Niethammer,
Aniello Esposito,
Colin W. Glass
Abstract:
Partitioned Global Address Space (PGAS) integrates the concepts of shared memory programming and the control of data distribution and locality provided by message passing into a single parallel programming model. The purpose of allying distributed data with shared memory is to cultivate a locality-aware shared memory paradigm. PGAS comprises a single shared address space, which is partitioned among threads. Each thread holds a portion of the shared address space in its local memory and can therefore exploit data locality by computing mainly on local data. Unified Parallel C (UPC) is a parallel extension of ISO C and an implementation of the PGAS model. In this paper, we evaluate the performance of UPC based on a real-world scenario from Molecular Dynamics.
Submitted 12 March, 2016;
originally announced March 2016.
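The partitioning described in the abstract above, one global index space split into per-thread local portions, can be modelled in a few lines. The sketch below is a single-process toy, not UPC and not the paper's code: the class name `BlockedSharedArray`, the blocked layout, and the simulated "threads" are all assumptions made for illustration.

```python
# Toy model of a PGAS shared array (assumed blocked layout, simulated threads).

class BlockedSharedArray:
    """A global index space whose elements are split into one block per thread."""

    def __init__(self, size, num_threads):
        self.size = size
        self.num_threads = num_threads
        self.block = -(-size // num_threads)  # ceiling division: elements per thread
        self.partitions = []
        for t in range(num_threads):
            start = min(t * self.block, size)
            end = min(start + self.block, size)
            self.partitions.append([0] * (end - start))

    def owner(self, i):
        """Thread whose local memory holds global index i."""
        return i // self.block

    def local_indices(self, thread):
        """Global indices that are local to the given thread."""
        start = min(thread * self.block, self.size)
        return range(start, start + len(self.partitions[thread]))

    def read(self, i):
        return self.partitions[self.owner(i)][i % self.block]

    def write(self, i, value):
        self.partitions[self.owner(i)][i % self.block] = value


arr = BlockedSharedArray(10, 4)
for i in range(arr.size):
    arr.write(i, i * i)

# Each simulated thread reduces only the elements it owns (the locality-aware part),
# then the partial results are combined.
partial = [sum(arr.read(i) for i in arr.local_indices(t))
           for t in range(arr.num_threads)]
print(partial, sum(partial))
```

Any index can be read or written through the global view, but a thread that iterates over `local_indices` touches only its own partition, which is the access pattern a locality-aware PGAS program aims for.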
-
Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime Systems
Authors:
Huan Zhou,
Kamran Idrees,
José Gracia
Abstract:
The relaxed semantics and rich functionality of the one-sided communication primitives of MPI-3 make MPI an attractive candidate for the implementation of PGAS models. However, the performance of such an implementation suffers from the fact that current MPI RMA implementations typically have a large overhead when the source and target of a communication request share a common, local physical memory. In this paper, we present an optimized PGAS-like runtime system which uses the new MPI-3 shared-memory extensions to serve intra-node communication requests and MPI-3 one-sided communication primitives to serve inter-node communication requests. The performance of our runtime system is evaluated on a Cray XC40 system through low-level communication benchmarks, a random-access benchmark and a stencil kernel. The results of the experiments demonstrate that the performance of our hybrid runtime system matches the performance of low-level RMA libraries for intra-node transfers, and that of MPI-3 for inter-node transfers.
Submitted 7 March, 2016;
originally announced March 2016.
-
Effective use of the PGAS Paradigm: Driving Transformations and Self-Adaptive Behavior in DASH-Applications
Authors:
Kamran Idrees,
Tobias Fuchs,
Colin W. Glass
Abstract:
DASH is a library of distributed data structures and algorithms designed for running applications on modern HPC architectures, composed of hierarchical network interconnections and stratified memory. DASH implements a PGAS (partitioned global address space) model in the form of C++ templates, built on top of DART, a run-time system with an abstracted tier above existing one-sided communication libraries.
In order to facilitate the application development process for exploiting the hierarchical organization of HPC machines, DART allows the placement of the computational units to be reordered. In this paper we present an automatic, hierarchical units mapping technique (using a similar approach to the Hilbert curve transformation) to reorder the placement of DART units on the Cray XC40 machine Hazel Hen at HLRS. To evaluate the performance of the new units mapping, which takes into account the topology of the allocated compute nodes, we perform a latency benchmark for a 3D stencil code. The technique of units mapping is generic and can be adopted in other DART communication substrates and on other hardware platforms.
Furthermore, high-level features of DASH are presented, enabling more complex automatic transformations and optimizations in the future.
Submitted 4 March, 2016;
originally announced March 2016.
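The abstract above states only that the units mapping uses an approach similar to the Hilbert curve transformation; the actual technique is not given in this listing. As a rough illustration of why such curves help with locality, the sketch below uses the standard 2D Hilbert index-to-coordinate conversion to assign consecutive unit IDs to neighbouring cells of a hypothetical n x n grid. The 2D grid and the names `hilbert_d2xy` and `hilbert_order` are assumptions for illustration only.

```python
# Standard 2D Hilbert curve conversion (illustrative only, not the paper's mapping).

def hilbert_d2xy(n, d):
    """Map position d along the curve to (x, y) on an n x n grid, n a power of two."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate or flip the quadrant so sub-curves join end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_order(n):
    """Grid cells listed in the order a Hilbert curve visits them."""
    return [hilbert_d2xy(n, d) for d in range(n * n)]

# Hypothetical use: unit i is placed on the i-th cell of the curve,
# so units with consecutive IDs end up on adjacent cells.
placement = {unit: cell for unit, cell in enumerate(hilbert_order(4))}
for unit in range(4):
    print(unit, placement[unit])
```

Because consecutive positions on the curve are always grid neighbours, units that communicate mostly with their ID neighbours (as in many stencil codes) are placed close together, which is the general motivation for space-filling-curve mappings.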
-
DART-MPI: An MPI-based Implementation of a PGAS Runtime System
Authors:
Huan Zhou,
Yousri Mhedheb,
Kamran Idrees,
Colin W. Glass,
José Gracia,
Karl Fürlinger,
Jie Tao
Abstract:
A Partitioned Global Address Space (PGAS) approach treats a distributed system as if the memory were shared on a global level. Given such a global view of memory, the user may program applications very much like on shared memory systems. This greatly simplifies the task of developing parallel applications, because no explicit communication has to be specified in the program for data exchange between different computing nodes. In this paper we present DART, a runtime environment which implements the PGAS paradigm on large-scale high-performance computing clusters. A specific feature of our implementation is the use of one-sided communication of the Message Passing Interface (MPI) version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated the performance of the implementation with several low-level kernels in order to determine overheads and limitations in comparison to the underlying MPI-3.
Submitted 7 July, 2015;
originally announced July 2015.