Search | arXiv e-print repository

doi 10.1109/IC2E59103.2023.00030

Evaluation of Data Enrichment Methods for Distributed Stream Processing Systems

Authors: Dominik Scheinert, Fabian Casares, Morgan K. Geldenhuys, Kevin Styp-Rekowski, Odej Kao

Abstract: Stream processing has become a critical component in the architecture of modern applications. With the exponential growth of data generation from sources such as the Internet of Things, business intelligence, and telecommunications, real-time processing of unbounded data streams has become a necessity. DSP systems provide a solution to this challenge, offering high horizontal scalability, fault-to… ▽ More Stream processing has become a critical component in the architecture of modern applications. With the exponential growth of data generation from sources such as the Internet of Things, business intelligence, and telecommunications, real-time processing of unbounded data streams has become a necessity. DSP systems provide a solution to this challenge, offering high horizontal scalability, fault-tolerant execution, and the ability to process data streams from multiple sources in a single DSP job. Often enough though, data streams need to be enriched with extra information for correct processing, which introduces additional dependencies and potential bottlenecks. In this paper, we present an in-depth evaluation of data enrichment methods for DSP systems and identify the different use cases for stream processing in modern systems. Using a representative DSP system and conducting the evaluation in a realistic cloud environment, we found that outsourcing enrichment data to the DSP system can improve performance for specific use cases. However, this increased resource consumption highlights the need for stream processing solutions specifically designed for the performance-intensive workloads of cloud-based applications. △ Less

Submitted 23 November, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

Comments: 10 pages, 13 figures, 2 tables

Journal ref: IEEE IC2E (2023) 202-211

arXiv:2211.07619 [pdf, other]

Federated Learning for Autoencoder-based Condition Monitoring in the Industrial Internet of Things

Authors: Soeren Becker, Kevin Styp-Rekowski, Oliver Vincent Leon Stoll, Odej Kao

Abstract: Enabled by the increasing availability of sensor data monitored from production machinery, condition monitoring and predictive maintenance methods are key pillars for an efficient and robust manufacturing production cycle in the Industrial Internet of Things. The employment of machine learning models to detect and predict deteriorating behavior by analyzing a variety of data collected across sever… ▽ More Enabled by the increasing availability of sensor data monitored from production machinery, condition monitoring and predictive maintenance methods are key pillars for an efficient and robust manufacturing production cycle in the Industrial Internet of Things. The employment of machine learning models to detect and predict deteriorating behavior by analyzing a variety of data collected across several industrial environments shows promising results in recent works, yet also often requires transferring the sensor data to centralized servers located in the cloud. Moreover, although collaborating and sharing knowledge between industry sites yields large benefits, especially in the area of condition monitoring, it is often prohibited due to data privacy issues. To tackle this situation, we propose an Autoencoder-based Federated Learning method utilizing vibration sensor data from rotating machines, that allows for a distributed training on edge devices, located on-premise and close to the monitored machines. Preserving data privacy and at the same time exonerating possibly unreliable network connections of remote sites, our approach enables knowledge transfer across organizational boundaries, without sharing the monitored data. We conducted an evaluation utilizing two real-world datasets as well as multiple testbeds and the results indicate that our method enables a competitive performance compared to previous results, while significantly reducing the resource and network utilization. △ Less

Submitted 14 November, 2022; originally announced November 2022.

Comments: Accepted for 2022 IEEE International Conference on Big Data (IEEE BigData 2022)

arXiv:2210.08897 [pdf, other]

doi 10.1109/ICDMW58026.2022.00142

Macaw: The Machine Learning Magnetometer Calibration Workflow

Authors: Jonathan Bader, Kevin Styp-Rekowski, Leon Doehler, Soeren Becker, Odej Kao

Abstract: In Earth Systems Science, many complex data pipelines combine different data sources and apply data filtering and analysis steps. Typically, such data analysis processes are historically grown and implemented with many sequentially executed scripts. Scientific workflow management systems (SWMS) allow scientists to use their existing scripts and provide support for parallelization, reusability, mon… ▽ More In Earth Systems Science, many complex data pipelines combine different data sources and apply data filtering and analysis steps. Typically, such data analysis processes are historically grown and implemented with many sequentially executed scripts. Scientific workflow management systems (SWMS) allow scientists to use their existing scripts and provide support for parallelization, reusability, monitoring, or failure handling. However, many scientists still rely on their sequentially called scripts and do not profit from the out-of-the-box advantages a SWMS can provide. In this work, we transform the data analysis processes of a Machine Learning-based approach to calibrate the platform magnetometers of non-dedicated satellites utilizing neural networks into a workflow called Macaw (MAgnetometer CAlibration Workflow). We provide details on the workflow and the steps needed to port these scripts to a scientific workflow. Our experimental evaluation compares the original sequential script executions on the original HPC cluster with our workflow implementation on a commodity cluster. Our results show that through porting, our implementation decreased the allocated CPU hours by 50.2% and the memory hours by 59.5%, leading to significantly less resource wastage. Further, through parallelizing single tasks, we reduced the runtime by 17.5%. △ Less

Submitted 18 July, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

Comments: Paper accepted in 2022 IEEE International Conference on Data Mining Workshops (ICDMW)

arXiv:2101.11539 [pdf, other]

Autoencoder-based Condition Monitoring and Anomaly Detection Method for Rotating Machines

Authors: Sabtain Ahmad, Kevin Styp-Rekowski, Sasho Nedelkoski, Odej Kao

Abstract: Rotating machines like engines, pumps, or turbines are ubiquitous in modern day societies. Their mechanical parts such as electrical engines, rotors, or bearings are the major components and any failure in them may result in their total shutdown. Anomaly detection in such critical systems is very important to monitor the system's health. As the requirement to obtain a dataset from rotating machine… ▽ More Rotating machines like engines, pumps, or turbines are ubiquitous in modern day societies. Their mechanical parts such as electrical engines, rotors, or bearings are the major components and any failure in them may result in their total shutdown. Anomaly detection in such critical systems is very important to monitor the system's health. As the requirement to obtain a dataset from rotating machines where all possible faults are explicitly labeled is difficult to satisfy, we propose a method that focuses on the normal behavior of the machine instead. We propose an autoencoder model-based method for condition monitoring of rotating machines by using an anomaly detection approach. The method learns the characteristics of a rotating machine using the normal vibration signals to model the healthy state of the machine. A threshold-based approach is then applied to the reconstruction error of unseen data, thus enabling the detection of unseen anomalies. The proposed method can directly extract the salient features from raw vibration signals and eliminate the need for manually engineered features. We demonstrate the effectiveness of the proposed method by employing two rotating machine datasets and the quality of the automatically learned features is compared with a set of handcrafted features by training an Isolation Forest model on either of these two sets. Experimental results on two real-world datasets indicate that our proposed solution gives promising results, achieving an average F1-score of 99.6%. △ Less

Submitted 27 January, 2021; originally announced January 2021.

arXiv:2101.10037 [pdf, other]

Optimizing Convergence for Iterative Learning of ARIMA for Stationary Time Series

Authors: Kevin Styp-Rekowski, Florian Schmidt, Odej Kao

Abstract: Forecasting of time series in continuous systems becomes an increasingly relevant task due to recent developments in IoT and 5G. The popular forecasting model ARIMA is applied to a large variety of applications for decades. An online variant of ARIMA applies the Online Newton Step in order to learn the underlying process of the time series. This optimization method has pitfalls concerning the comp… ▽ More Forecasting of time series in continuous systems becomes an increasingly relevant task due to recent developments in IoT and 5G. The popular forecasting model ARIMA is applied to a large variety of applications for decades. An online variant of ARIMA applies the Online Newton Step in order to learn the underlying process of the time series. This optimization method has pitfalls concerning the computational complexity and convergence. Thus, this work focuses on the computational less expensive Online Gradient Descent optimization method, which became popular for learning of neural networks in recent years. For the iterative training of such models, we propose a new approach combining different Online Gradient Descent learners (such as Adam, AMSGrad, Adagrad, Nesterov) to achieve fast convergence. The evaluation on synthetic data and experimental datasets show that the proposed approach outperforms the existing methods resulting in an overall lower prediction error. △ Less

Submitted 25 January, 2021; originally announced January 2021.

Showing 1–5 of 5 results for author: Styp-Rekowski, K