-
Federated Learning for Digital Twin-Based Vehicular Networks: Architecture and Challenges
Authors:
Latif U. Khan,
Ehzaz Mustafa,
Junaid Shuja,
Faisal Rehman,
Kashif Bilal,
Zhu Han,
Choong Seon Hong
Abstract:
Emerging intelligent transportation applications, such as accident reporting, lane change assistance, collision avoidance, and infotainment, will be based on diverse requirements (e.g., latency, reliability, quality of physical experience). To fulfill such requirements, there is a significant need to deploy a digital twin-based intelligent transportation system. Although the twin-based implementation of vehicular networks can offer performance optimization, modeling twins is a significantly challenging task. Machine learning (ML) can be a preferable solution for building such a virtual model, and federated learning (FL) in particular is a distributed learning scheme that can better preserve privacy compared to centralized ML. Although FL can offer performance enhancement, it requires careful design. Therefore, in this article, we present an overview of FL for the twin-based vehicular network. A general architecture showing FL for the twin-based vehicular network is proposed. Our proposed architecture consists of two spaces: a twin space and a physical space. The physical space consists of all the physical entities (e.g., cars and edge servers) required for vehicular networks, whereas the twin space refers to the logical space that is used for the deployment of twins. A twin space can be implemented using either edge servers or cloud servers. We also outline a few use cases of FL for the twin-based vehicular network. Finally, the paper is concluded and an outlook on open challenges is presented.
Submitted 10 August, 2022;
originally announced August 2022.
-
An Intelligent Resource Reservation for Crowdsourced Live Video Streaming Applications in Geo-Distributed Cloud Environment
Authors:
Emna Baccour,
Fatima Haouari,
Aiman Erbad,
Amr Mohamed,
Kashif Bilal,
Mohsen Guizani,
Mounir Hamdi
Abstract:
Crowdsourced live video streaming (livecast) services such as Facebook Live, YouNow, Douyu and Twitch have been gaining momentum recently. Allocating the limited resources in a cost-effective manner while maximizing the Quality of Service (QoS) through real-time delivery and the provision of the appropriate representations for all viewers is a challenging problem. In our paper, we introduce a machine learning-based predictive resource allocation framework for geo-distributed cloud sites, considering the delay and quality constraints to guarantee the maximum QoS for viewers and the minimum cost for content providers. First, we present an offline optimization that decides the required transcoding resources in distributed regions near the viewers with a trade-off between the QoS and the overall cost. Second, we use machine learning to build forecasting models that proactively predict the approximate transcoding resources to be reserved at each cloud site ahead of time. Finally, we develop a Greedy Nearest and Cheapest algorithm (GNCA) to perform the resource allocation of real-time broadcasted videos on the rented resources. Extensive simulations have shown that GNCA outperforms the state-of-the-art resource allocation approaches for crowdsourced live streaming by achieving more than 20% gain in terms of system cost while serving the viewers with relatively lower latency.
Submitted 4 June, 2021;
originally announced June 2021.
-
Applying Machine Learning Techniques for Caching in Edge Networks: A Comprehensive Survey
Authors:
Junaid Shuja,
Kashif Bilal,
Waleed Alasmary,
Hassan Sinky,
Eisa Alanazi
Abstract:
Edge networking is a complex and dynamic computing paradigm that aims to push cloud resources closer to the end user, improving responsiveness and reducing backhaul traffic. User mobility, preferences, and content popularity are the dominant dynamic features of edge networks. Temporal and social features of content, such as the number of views and likes, are leveraged to estimate the popularity of content from a global perspective. However, such estimates should not be mapped to an edge network with particular social and geographic characteristics. In next generation edge networks, i.e., 5G and beyond 5G, machine learning techniques can be applied to predict content popularity based on user preferences, cluster users based on similar content interests, and optimize cache placement and replacement strategies given a set of constraints and predictions about the state of the network. These applications of machine learning can help identify relevant content for an edge network. This article investigates the application of machine learning techniques for in-network caching in edge networks. We survey recent state-of-the-art literature and formulate a comprehensive taxonomy based on (a) machine learning technique (method, objective, and features), (b) caching strategy (policy, location, and replacement), and (c) edge network (type and delivery strategy). A comparative analysis of the state-of-the-art literature is presented with respect to the parameters identified in the taxonomy. Moreover, we debate research challenges and future directions for optimal caching decisions and the application of machine learning in edge networks.
Submitted 3 November, 2020; v1 submitted 21 June, 2020;
originally announced June 2020.
-
FacebookVideoLive18: A Live Video Streaming Dataset for Streams Metadata and Online Viewers Locations
Authors:
Emna Baccour,
Aiman Erbad,
Kashif Bilal,
Amr Mohamed,
Mohsen Guizani,
Mounir Hamdi
Abstract:
With the advancement in personal smart devices and pervasive network connectivity, users are no longer passive content consumers, but also contributors in producing new content. This expansion in live services requires a detailed analysis of broadcasters' and viewers' behavior to maximize users' Quality of Experience (QoE). In this paper, we present a dataset gathered from one of the popular live streaming platforms: Facebook. In this dataset, we stored more than 1,500,000 live stream records collected in June and July 2018. These data include public live videos from all over the world. However, the Facebook Live API does not offer the possibility to collect online videos with their fine-grained data. The API returns the general data of a stream only if its ID (identifier) is known. Therefore, using the live map website provided by Facebook, which shows the locations of online streams and the locations of viewers, we extracted video IDs and different coordinates along with general metadata. Then, having these IDs and using the API, we can collect the fine-grained metadata of public videos that might be useful for the research community. We also present several preliminary analyses to describe and identify the patterns of the streams and viewers. Such fine-grained details will enable the multimedia community to recreate real-world scenarios, particularly for resource allocation, caching, computation, and transcoding in edge networks. Existing datasets do not provide the locations of the viewers, which limits the efforts made to allocate the multimedia resources as close as possible to viewers and to offer better QoE.
Submitted 24 March, 2020;
originally announced March 2020.
-
Proactive Video Chunks Caching and Processing for Latency and Cost Minimization in Edge Networks
Authors:
Emna Baccour,
Aiman Erbad,
Amr Mohamed,
Kashif Bilal,
Mohsen Guizani
Abstract:
Recently, the growing demand for rich multimedia content such as Video on Demand (VoD) has made the data transmission from content delivery networks (CDNs) to end-users quite challenging. Edge networks have been proposed as an extension to CDNs to alleviate this excessive data transfer through caching and to delegate the computation tasks to edge servers. To maximize the caching efficiency in the edge networks, different Mobile Edge Computing (MEC) servers assist each other to effectively select which content to store and the appropriate computation tasks to process. In this paper, we adopt a collaborative caching and transcoding model for VoD in MEC networks. However, unlike other models in the literature, different chunks of the same video are not fetched and cached in the same MEC server. Instead, neighboring servers collaborate to store and transcode different video chunks and consequently optimize the use of limited resources. Since we are dealing with chunk caching and processing, we propose to maximize the edge efficiency by studying the viewers' watching patterns and designing a probabilistic model where chunk popularities are evaluated. Based on this model, popularity-aware policies, namely Proactive caching Policy (PcP) and Cache replacement Policy (CrP), are introduced to cache only the chunks most likely to be requested. In addition to PcP and CrP, an online algorithm (PCCP) is proposed to schedule the collaborative caching and processing. The evaluation results show that our model and policies give better performance than approaches using conventional replacement policies. This improvement reaches up to 50% in some cases.
Submitted 16 December, 2018;
originally announced December 2018.
-
Parallel Protein Community Detection in Large-scale PPI Networks Based on Multi-source Learning
Authors:
Jianguo Chen,
Kenli Li,
Kashif Bilal,
Ahmed A. Metwally,
Keqin Li,
Philip S. Yu
Abstract:
Protein interactions constitute the fundamental building block of almost every life activity. Identifying protein communities from Protein-Protein Interaction (PPI) networks is essential to understand the principles of cellular organization and explore the causes of various diseases. It is critical to integrate multiple data resources to identify reliable protein communities that have biological significance and improve the performance of community detection methods for large-scale PPI networks. In this paper, we propose a Multi-source Learning based Protein Community Detection (MLPCD) algorithm that integrates Gene Expression Data (GED), together with a parallel solution of MLPCD using cloud computing technology. To effectively discover the biological functions of proteins that participate in different cellular processes, GED under different conditions is integrated with the original PPI network to reconstruct a Weighted-PPI (WPPI) network. To flexibly identify protein communities of different scales, we define community modularity and functional cohesion measurements and detect protein communities from WPPI using an agglomerative method. In addition, we compare the detected communities with known protein complexes and evaluate the functional enrichment of protein function modules using Gene Ontology annotations. Moreover, we implement a parallel version of the MLPCD algorithm on the Apache Spark platform to enhance the performance of the algorithm for large-scale realistic PPI networks. Extensive experimental results indicate the superiority and notable advantages of the MLPCD algorithm over the relevant algorithms in terms of accuracy and performance.
Submitted 17 October, 2018;
originally announced November 2018.
-
A Parallel Patient Treatment Time Prediction Algorithm and its Applications in Hospital Queuing-Recommendation in a Big Data Environment
Authors:
Jianguo Chen,
Kenli Li,
Zhuo Tang,
Kashif Bilal,
Keqin Li
Abstract:
Effective patient queue management to minimize patient wait delays and patient overcrowding is one of the major challenges faced by hospitals. Unnecessary and annoying waits for long periods result in substantial human resource and time wastage and increase the frustration endured by patients. For each patient in the queue, the total treatment time of all patients ahead of them is the time that they must wait. It would be convenient and preferable if the patients could receive the most efficient treatment plan and know the predicted waiting time through a mobile application that updates in real time. Therefore, we propose a Patient Treatment Time Prediction (PTTP) algorithm to predict the waiting time for each treatment task for a patient. We use realistic patient data from various hospitals to obtain a patient treatment time model for each task. Based on this large-scale, realistic dataset, the treatment time for each patient in the current queue of each task is predicted. Based on the predicted waiting time, a Hospital Queuing-Recommendation (HQR) system is developed. HQR calculates and predicts an efficient and convenient treatment plan recommended for the patient. Because of the large-scale, realistic dataset and the requirement for real-time response, the PTTP algorithm and HQR system mandate efficiency and low-latency response. We use an Apache Spark-based cloud implementation at the National Supercomputing Center in Changsha (NSCC) to achieve the aforementioned goals. Extensive experimentation and simulation results demonstrate the effectiveness and applicability of our proposed model to recommend an effective treatment plan for patients to minimize their wait times in hospitals.
Submitted 17 October, 2018;
originally announced November 2018.
-
A Periodicity-based Parallel Time Series Prediction Algorithm in Cloud Computing Environments
Authors:
Jianguo Chen,
Kenli Li,
Huigui Rong,
Kashif Bilal,
Keqin Li,
Philip S. Yu
Abstract:
In the era of big data, practical applications in various domains continually generate large-scale time-series data. Among them, some data show significant or potential periodicity characteristics, such as meteorological and financial data. It is critical to efficiently identify the potential periodic patterns from massive time-series data and provide accurate predictions. In this paper, a Periodicity-based Parallel Time Series Prediction (PPTSP) algorithm for large-scale time-series data is proposed and implemented in the Apache Spark cloud computing environment. To effectively handle the massive historical datasets, a Time Series Data Compression and Abstraction (TSDCA) algorithm is presented, which can reduce the data scale while accurately extracting the characteristics. Based on this, we propose a Multi-layer Time Series Periodic Pattern Recognition (MTSPPR) algorithm using the Fourier Spectrum Analysis (FSA) method. In addition, a Periodicity-based Time Series Prediction (PTSP) algorithm is proposed. Data in the subsequent period are predicted based on all previous period models, in which a time attenuation factor is introduced to control the impact of different periods on the prediction results. Moreover, to improve the performance of the proposed algorithms, we propose a parallel solution on the Apache Spark platform, using the Streaming real-time computing module. To efficiently process the large-scale time-series datasets in distributed computing environments, Distributed Streams (DStreams) and Resilient Distributed Datasets (RDDs) are used to store and calculate these datasets. Extensive experimental results show that our PPTSP algorithm has significant advantages compared with other algorithms in terms of prediction accuracy and performance.
Submitted 17 October, 2018;
originally announced October 2018.
-
A Disease Diagnosis and Treatment Recommendation System Based on Big Data Mining and Cloud Computing
Authors:
Jianguo Chen,
Kenli Li,
Huigui Rong,
Kashif Bilal,
Nan Yang,
Keqin Li
Abstract:
It is crucial to provide compatible treatment schemes for a disease according to various symptoms at different stages. However, most classification methods might be ineffective in accurately classifying a disease that holds the characteristics of multiple treatment stages, various symptoms, and multi-pathogenesis. Moreover, there are limited exchanges and cooperative actions in disease diagnoses and treatments between different departments and hospitals. Thus, when new diseases occur with atypical symptoms, inexperienced doctors might have difficulty in identifying them promptly and accurately. Therefore, to maximize the utilization of the advanced medical technology of developed hospitals and the rich medical knowledge of experienced doctors, a Disease Diagnosis and Treatment Recommendation System (DDTRS) is proposed in this paper. First, to identify disease symptoms more accurately, a Density-Peaked Clustering Analysis (DPCA) algorithm is introduced for disease-symptom clustering. In addition, association analyses on Disease-Diagnosis (D-D) rules and Disease-Treatment (D-T) rules are conducted separately by the Apriori algorithm. The appropriate diagnosis and treatment schemes are recommended for patients and inexperienced doctors, even if they are in a limited therapeutic environment. Moreover, to reach the goals of high performance and low latency response, we implement a parallel solution for DDTRS using the Apache Spark cloud platform. Extensive experimental results demonstrate that the proposed DDTRS realizes disease-symptom clustering effectively and derives disease treatment recommendations intelligently and accurately.
Submitted 17 October, 2018;
originally announced October 2018.
-
A Parallel Random Forest Algorithm for Big Data in a Spark Cloud Computing Environment
Authors:
Jianguo Chen,
Kenli Li,
Zhuo Tang,
Kashif Bilal,
Shui Yu,
Chuliang Weng,
Keqin Li
Abstract:
With the emergence of the big data age, the issue of how to obtain valuable knowledge from a dataset efficiently and accurately has attracted increasing attention from both academia and industry. This paper presents a Parallel Random Forest (PRF) algorithm for big data on the Apache Spark platform. The PRF algorithm is optimized based on a hybrid approach combining data-parallel and task-parallel optimization. From the perspective of data-parallel optimization, a vertical data-partitioning method is performed to reduce the data communication cost effectively, and a data-multiplexing method is performed to allow the training dataset to be reused and diminish the volume of data. From the perspective of task-parallel optimization, a dual parallel approach is carried out in the training process of RF, and a task Directed Acyclic Graph (DAG) is created according to the parallel training process of PRF and the dependence of the Resilient Distributed Datasets (RDD) objects. Then, different task schedulers are invoked for the tasks in the DAG. Moreover, to improve the algorithm's accuracy for large, high-dimensional, and noisy data, we perform a dimension-reduction approach in the training process and a weighted voting approach in the prediction process prior to parallelization. Extensive experimental results indicate the superiority and notable advantages of the PRF algorithm over the relevant algorithms implemented by Spark MLlib and other studies in terms of the classification accuracy, performance, and scalability.
Submitted 17 October, 2018;
originally announced October 2018.
-
A Bi-layered Parallel Training Architecture for Large-scale Convolutional Neural Networks
Authors:
Jianguo Chen,
Kenli Li,
Kashif Bilal,
Xu Zhou,
Keqin Li,
Philip S. Yu
Abstract:
Benefitting from large-scale training datasets and the complex training network, Convolutional Neural Networks (CNNs) are widely applied in various fields with high accuracy. However, the training process of CNNs is very time-consuming, as large amounts of training samples and iterative operations are required to obtain high-quality weight parameters. In this paper, we focus on the time-consuming training process of large-scale CNNs and propose a Bi-layered Parallel Training (BPT-CNN) architecture in distributed computing environments. BPT-CNN consists of two main components: (a) an outer-layer parallel training for multiple CNN subnetworks on separate data subsets, and (b) an inner-layer parallel training for each subnetwork. In the outer-layer parallelism, we address critical issues of distributed and parallel computing, including data communication, synchronization, and workload balance. A heterogeneous-aware Incremental Data Partitioning and Allocation (IDPA) strategy is proposed, where large-scale training datasets are partitioned and allocated to the computing nodes in batches according to their computing power. To minimize the synchronization waiting during the global weight update process, an Asynchronous Global Weight Update (AGWU) strategy is proposed. In the inner-layer parallelism, we further accelerate the training process for each CNN subnetwork on each computer, where the computation steps of the convolutional layer and the local weight training are parallelized based on task-parallelism. We introduce task decomposition and scheduling strategies with the objectives of thread-level load balancing and minimum waiting time for critical paths. Extensive experimental results indicate that the proposed BPT-CNN effectively improves the training performance of CNNs while maintaining the accuracy.
Submitted 17 October, 2018;
originally announced October 2018.