-
Matrix Factorization for Cache Optimization in Content Delivery Networks (CDN)
Authors:
Adolf Kamuzora,
Wadie Skaf,
Ermiyas Birihanu,
Jiyan Mahmud,
Péter Kiss,
Tamás Jursonovics,
Peter Pogrzeba,
Imre Lendák,
Tomáš Horváth
Abstract:
Content delivery networks (CDNs) are key components of high throughput, low latency services on the internet. CDN cache servers have limited storage and bandwidth and implement state-of-the-art cache admission and eviction algorithms to select the most popular and relevant content for the customers served. The aim of this study was to utilize state-of-the-art recommender system techniques for pred…
▽ More
Content delivery networks (CDNs) are key components of high throughput, low latency services on the internet. CDN cache servers have limited storage and bandwidth and implement state-of-the-art cache admission and eviction algorithms to select the most popular and relevant content for the customers served. The aim of this study was to utilize state-of-the-art recommender system techniques for predicting ratings for cache content in CDN. Matrix factorization was used in predicting content popularity which is valuable information in content eviction and content admission algorithms run on CDN edge servers. A custom implemented matrix factorization class and MyMediaLite were utilized. The input CDN logs were received from a European telecommunication service provider. We built a matrix factorization model with that data and utilized grid search to tune its hyper-parameters. Experimental results indicate that there is promise about the proposed approaches and we showed that a low root mean square error value can be achieved on the real-life CDN log data.
△ Less
Submitted 5 October, 2022;
originally announced November 2022.
-
Client Error Clustering Approaches in Content Delivery Networks (CDN)
Authors:
Ermiyas Birihanu,
Jiyan Mahmud,
Péter Kiss,
Adolf Kamuzora,
Wadie Skaf,
Tomáš Horváth,
Tamás Jursonovics,
Peter Pogrzeba,
Imre Lendák
Abstract:
Content delivery networks (CDNs) are the backbone of the Internet and are key in delivering high quality video on demand (VoD), web content and file services to billions of users. CDNs usually consist of hierarchically organized content servers positioned as close to the customers as possible. CDN operators face a significant challenge when analyzing billions of web server and proxy logs generated…
▽ More
Content delivery networks (CDNs) are the backbone of the Internet and are key in delivering high quality video on demand (VoD), web content and file services to billions of users. CDNs usually consist of hierarchically organized content servers positioned as close to the customers as possible. CDN operators face a significant challenge when analyzing billions of web server and proxy logs generated by their systems. The main objective of this study was to analyze the applicability of various clustering methods in CDN error log analysis. We worked with real-life CDN proxy logs, identified key features included in the logs (e.g., content type, HTTP status code, time-of-day, host) and clustered the log lines corresponding to different host types offering live TV, video on demand, file caching and web content. Our experiments were run on a dataset consisting of proxy logs collected over a 7-day period from a single, physical CDN server running multiple types of services (VoD, live TV, file). The dataset consisted of 2.2 billion log lines. Our analysis showed that CDN error clustering is a viable approach towards identifying recurring errors and improving overall quality of service.
△ Less
Submitted 11 October, 2022;
originally announced October 2022.
-
Towards Automatic Forecasting: Evaluation of Time-Series Forecasting Models for Chickenpox Cases Estimation in Hungary
Authors:
Wadie Skaf,
Arzu Tosayeva,
Dániel T. Várkonyi
Abstract:
Time-Series Forecasting is a powerful data modeling discipline that analyzes historical observations to predict future values of a time-series. It has been utilized in numerous applications, including but not limited to economics, meteorology, and health. In this paper, we use time-series forecasting techniques to model and predict the future incidence of chickenpox. To achieve this, we implement…
▽ More
Time-Series Forecasting is a powerful data modeling discipline that analyzes historical observations to predict future values of a time-series. It has been utilized in numerous applications, including but not limited to economics, meteorology, and health. In this paper, we use time-series forecasting techniques to model and predict the future incidence of chickenpox. To achieve this, we implement and simulate multiple models and data preprocessing techniques on a Hungary-collected dataset. We demonstrate that the LSTM model outperforms all other models in the vast majority of the experiments in terms of county-level forecasting, whereas the SARIMAX model performs best at the national level. We also demonstrate that the performance of the traditional data preprocessing method is inferior to that of the data preprocessing method that we have proposed.
△ Less
Submitted 4 October, 2022; v1 submitted 28 September, 2022;
originally announced September 2022.
-
Denoising Architecture for Unsupervised Anomaly Detection in Time-Series
Authors:
Wadie Skaf,
Tomáš Horváth
Abstract:
Anomalies in time-series provide insights of critical scenarios across a range of industries, from banking and aerospace to information technology, security, and medicine. However, identifying anomalies in time-series data is particularly challenging due to the imprecise definition of anomalies, the frequent absence of labels, and the enormously complex temporal correlations present in such data.…
▽ More
Anomalies in time-series provide insights of critical scenarios across a range of industries, from banking and aerospace to information technology, security, and medicine. However, identifying anomalies in time-series data is particularly challenging due to the imprecise definition of anomalies, the frequent absence of labels, and the enormously complex temporal correlations present in such data. The LSTM Autoencoder is an Encoder-Decoder scheme for Anomaly Detection based on Long Short Term Memory Networks that learns to reconstruct time-series behavior and then uses reconstruction error to identify abnormalities. We introduce the Denoising Architecture as a complement to this LSTM Encoder-Decoder model and investigate its effect on real-world as well as artificially generated datasets. We demonstrate that the proposed architecture increases both the accuracy and the training speed, thereby, making the LSTM Autoencoder more efficient for unsupervised anomaly detection tasks.
△ Less
Submitted 30 August, 2022;
originally announced August 2022.