Search | arXiv e-print repository

doi 10.1145/3603166.3632540

Scheduling of Distributed Applications on the Computing Continuum: A Survey

Authors: Narges Mehran, Dragi Kimovski, Hermann Hellwagner, Dumitru Roman, Ahmet Soylu, Radu Prodan

Abstract: The demand for distributed applications has significantly increased over the past decade, with improvements in machine learning techniques fueling this growth. These applications predominantly utilize Cloud data centers for high-performance computing and Fog and Edge devices for low-latency communication for small-size machine learning model training and inference. The challenge of executing applications with different requirements on heterogeneous devices requires effective methods for solving NP-hard resource allocation and application scheduling problems. The state-of-the-art techniques primarily investigate conflicting objectives, such as the completion time, energy consumption, and economic cost of application execution on the Cloud, Fog, and Edge computing infrastructure. Therefore, in this work, we review these research works considering their objectives, methods, and evaluation tools. Based on the review, we provide a discussion on the scheduling methods in the Computing Continuum.

Submitted 20 January, 2024; originally announced May 2024.

Comments: 7 pages, 3 figures, 3 tables

arXiv:2401.03319 [pdf, other]

Comparison of Microservice Call Rate Predictions for Replication in the Cloud

Authors: Narges Mehran, Arman Haghighi, Pedram Aminharati, Nikolay Nikolov, Ahmet Soylu, Dumitru Roman, Radu Prodan

Abstract: Today, many users deploy their microservice-based applications with various interconnections on a cluster of Cloud machines, subject to stochastic changes due to dynamic user requirements. To address this problem, we compare three machine learning (ML) models for predicting the microservice call rates based on the microservice times and aiming at estimating the scalability requirements. We apply the linear regression (LR), multilayer perceptron (MLP), and gradient boosting regression (GBR) models on the Alibaba microservice traces. The prediction results reveal that the LR model reaches a lower training time than the GBR and MLP models. However, the GBR reduces the mean absolute error and the mean absolute percentage error compared to LR and MLP models. Moreover, the prediction results show that the required number of replicas for each microservice by the gradient boosting model is close to the actual test data without any prediction.

Submitted 29 October, 2023; originally announced January 2024.

Comments: 7 pages, 5 figures, 4 tables

arXiv:2308.01094 [pdf, other]

Scaling Data Science Solutions with Semantics and Machine Learning: Bosch Case

Authors: Baifan Zhou, Nikolay Nikolov, Zhuoxun Zheng, Xianghui Luo, Ognjen Savkovic, Dumitru Roman, Ahmet Soylu, Evgeny Kharlamov

Abstract: Industry 4.0 and Internet of Things (IoT) technologies unlock an unprecedented amount of data from factory production, posing big data challenges in volume and variety. In that context, distributed computing solutions such as cloud systems are leveraged to parallelise the data processing and reduce computation time. As the cloud systems become increasingly popular, there is increased demand that more users that were originally not cloud experts (such as data scientists, domain experts) deploy their solutions on the cloud systems. However, it is non-trivial to address both the high demand for cloud system users and the excessive time required to train them. To this end, we propose SemCloud, a semantics-enhanced cloud system that couples cloud systems with semantic technologies and machine learning. SemCloud relies on domain ontologies and mappings for data integration, and parallelises the semantic data integration and data analysis on distributed computing nodes. Furthermore, SemCloud adopts adaptive Datalog rules and machine learning for automated resource configuration, allowing non-cloud experts to use the cloud system. The system has been evaluated in an industrial use case with millions of data, thousands of repeated runs, and domain users, showing promising results.

Submitted 2 August, 2023; originally announced August 2023.

Comments: Paper accepted at ISWC2023 In-Use track

arXiv:2305.17951 [pdf, other]

doi 10.1109/COMPSAC57700.2023.00038

ContrastNER: Contrastive-based Prompt Tuning for Few-shot NER

Authors: Amirhossein Layegh, Amir H. Payberah, Ahmet Soylu, Dumitru Roman, Mihhail Matskin

Abstract: Prompt-based language models have produced encouraging results in numerous applications, including Named Entity Recognition (NER) tasks. NER aims to identify entities in a sentence and provide their types. However, the strong performance of most available NER approaches is heavily dependent on the design of discrete prompts and a verbalizer to map the model-predicted outputs to entity categories, which are complicated undertakings. To address these challenges, we present ContrastNER, a prompt-based NER framework that employs both discrete and continuous tokens in prompts and uses a contrastive learning approach to learn the continuous prompts and forecast entity types. The experimental results demonstrate that ContrastNER obtains competitive performance to the state-of-the-art NER methods in high-resource settings and outperforms the state-of-the-art models in low-resource circumstances without requiring extensive manual prompt engineering and verbalizer design.

Submitted 29 May, 2023; originally announced May 2023.

Comments: 9 pages, 5 figures, COMPSAC2023

Journal ref: 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC)

arXiv:2203.12222 [pdf]

The Harmony Index: a Utilitarian Metric for Measuring Effectiveness in Mixed-Skill Teams

Authors: Darryl Roman, Noah Ari, Johnathan Mell

Abstract: As teamwork becomes ever-more important in a new age of remote work, it is critical to develop metrics to quantitatively evaluate how effective teams are. This is especially true with mixed-modality teams, such as those that include a human and an agent or human and robot. We propose a novel utilitarian metric, the Harmony Index, which quantifies the effectiveness of team members by classifying them into four sub-types based on the result of their teaming on overall effectiveness. This index is evaluated using a real-world dataset of over 1 million interactions, and potential future uses of this index are explored in the realm of team science.

Submitted 30 March, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

Comments: 6 pages, 5 figures. Appears in Proceedings of the AAAI SSS-22 Symposium "Closing the Assessment Loop: Communicating Proficiency and Intent in Human-Robot Teaming"

arXiv:2102.00837 [pdf, other]

Machine learning pipeline for battery state of health estimation

Authors: Darius Roman, Saurabh Saxena, Valentin Robu, Michael Pecht, David Flynn

Abstract: Lithium-ion batteries are ubiquitous in modern day applications ranging from portable electronics to electric vehicles. Irrespective of the application, reliable real-time estimation of battery state of health (SOH) by on-board computers is crucial to the safe operation of the battery, ultimately safeguarding asset integrity. In this paper, we design and evaluate a machine learning pipeline for estimation of battery capacity fade - a metric of battery health - on 179 cells cycled under various conditions. The pipeline estimates battery SOH with an associated confidence interval by using two parametric and two non-parametric algorithms. Using segments of charge voltage and current curves, the pipeline engineers 30 features, performs automatic feature selection and calibrates the algorithms. When deployed on cells operated under the fast-charging protocol, the best model achieves a root mean squared percent error of 0.45%. This work provides insights into the design of scalable data-driven models for battery SOH estimation, emphasising the value of confidence bounds around the prediction. The pipeline methodology combines experimental data with machine learning modelling and can be generalized to other critical components that require real-time estimation of SOH.

Submitted 1 February, 2021; originally announced February 2021.

Comments: Peer review, pre-print to be published in Nature Machine Intelligence - 32 pages and 24 figures (including supplementary material)

ACM Class: C.4; I.5.1; I.2.6

arXiv:1606.07256 [pdf, ps, other]

Saliency Driven Object recognition in egocentric videos with deep CNN

Authors: Philippe Pérez de San Roman, Jenny Benois-Pineau, Jean-Philippe Domenger, Florent Paclet, Daniel Cataert, Aymar de Rugy

Abstract: The problem of object recognition in natural scenes has recently been successfully addressed with Deep Convolutional Neural Networks, giving a significant breakthrough in recognition scores. The computational efficiency of Deep CNNs as a function of their depth allows for their use in real-time applications. One of the key issues here is to reduce the number of windows selected from images to be submitted to a Deep CNN. This is usually solved by preliminary segmentation and selection of specific windows, having outstanding "objectiveness" or other value of indicators of possible location of objects. In this paper we propose a Deep CNN approach and the general framework for recognition of objects in a real-time scenario and in an egocentric perspective. Here the window of interest is built on the basis of a visual attention map computed over gaze fixations measured by a glass-worn eye-tracker. The application of this set-up is an interactive user-friendly environment for upper-limb amputees. Vision has to help the subject to control his worn neuro-prosthesis in case of a small amount of remaining muscles when the EMG control becomes inefficient. The recognition results on a specifically recorded corpus of 151 videos with simple geometrical objects show the mAP of 64.6% and the computational time at the generalization lower than a time of a visual fixation on the object-of-interest.

Submitted 23 June, 2016; originally announced June 2016.

Comments: 20 pages, 8 figures, 3 tables, Submitted to the Journal of Computer Vision and Image Understanding

arXiv:1509.03045 [pdf, other]

Empirical Big Data Research: A Systematic Literature Mapping

Authors: Bjørn Magnus Mathisen, Leendert Wienhofen, Dumitru Roman

Abstract: Background: Big Data is a relatively new field of research and technology, and literature reports a wide variety of concepts labeled with Big Data. The maturity of a research field can be measured in the number of publications containing empirical results. In this paper we present the current status of empirical research in Big Data. Method: We employed a systematic mapping method with which we mapped the collected research according to the labels Variety, Volume and Velocity. In addition, we addressed the application areas of Big Data. Results: We found that 151 of the assessed 1778 contributions contain a form of empirical result and can be mapped to one or more of the 3 V's and 59 address an application area. Conclusions: The share of publications containing empirical results is well below the average compared to computer science research as a whole. In order to mature the research on Big Data, we recommend applying empirical methods to strengthen the confidence in the reported results. Based on our trend analysis we consider Volume and Variety to be the most promising uncharted area in Big Data.

Submitted 12 October, 2016; v1 submitted 10 September, 2015; originally announced September 2015.

Comments: Submitted to Springer journal Data Science and Engineering

Showing 1–8 of 8 results for author: Roman, D