Resource Usage Estimation of Data Stream Processing Workloads in Datacenter Clouds
Authors:
Alireza Khoshkbarforoushha,
Rajiv Ranjan,
Raj Gaire,
Prem P. Jayaraman,
John Hosking,
Ehsan Abbasnejad
Abstract:
Real-time computation of data streams over affordable virtualized infrastructure resources is an important form of data-in-motion processing architecture. However, processing such data streams while ensuring strict guarantees on quality of service is problematic due to: (i) uncertain stream arrival patterns; (ii) the need to process different types of continuous queries; and (iii) the variable resource consumption behavior of continuous queries. Recent work has explored the use of statistical techniques for resource estimation of SQL queries and OLTP workloads. All these techniques approximate resource usage for each query as a single point value. However, in data stream processing workloads, in which data flows endlessly through a graph of operators and causes fluctuations in performance and resource demand, single-point resource estimation is inadequate because it is neither expressive enough nor able to capture the multi-modal nature of the target data. To this end, we present a novel technique which uses mixture density networks, a combined structure of neural networks and mixture models, to estimate the whole spectrum of resource usage as probability density functions. The proposed approach is a flexible and convenient means of modeling unknown distribution models. We have validated the models using both the Linear Road benchmark and TPC-H, observing high accuracy under a number of error metrics: mean-square error, continuous ranked probability score, and negative log predictive density.
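The contrast between a single point estimate and a full density can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the mixture is a one-dimensional Gaussian mixture, and the function names and the example parameters (a hypothetical bimodal CPU-usage distribution) are invented for illustration. In a mixture density network these weights, means, and standard deviations would be produced by a neural network; here they are fixed by hand.

```python
import math


def mixture_pdf(y, weights, means, stds):
    """Density of a 1-D Gaussian mixture evaluated at y."""
    return sum(
        w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def mixture_mean(weights, means):
    """Single point estimate: the expected value of the mixture."""
    return sum(w * m for w, m in zip(weights, means))


def negative_log_predictive_density(observations, weights, means, stds):
    """Mean negative log density of the observations (lower is better)."""
    total = sum(math.log(mixture_pdf(y, weights, means, stds)) for y in observations)
    return -total / len(observations)


# Hypothetical parameters (assumed, not taken from the paper):
# CPU usage that is mostly around 20% but sometimes around 70%.
WEIGHTS = [0.6, 0.4]
MEANS = [20.0, 70.0]
STDS = [5.0, 10.0]

if __name__ == "__main__":
    point = mixture_mean(WEIGHTS, MEANS)
    print("point estimate:", point)
    print("density at point estimate:", mixture_pdf(point, WEIGHTS, MEANS, STDS))
    print("density at first mode:", mixture_pdf(20.0, WEIGHTS, MEANS, STDS))
    print("NLPD:", negative_log_predictive_density([18.0, 22.0, 71.0], WEIGHTS, MEANS, STDS))
```

With these assumed parameters the point estimate falls between the two modes, where the density is far lower than at either mode, which is the kind of information a single value cannot convey.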
Submitted 28 January, 2015;
originally announced January 2015.
Metrics for BPEL Process Reusability Analysis in a Workflow System
Authors:
A. Khoshkbarforoushha,
P. Jamshidi,
M. Fahmideh,
L. Wang,
R. Ranjan
Abstract:
This work proposes a quantitative metric to analyze the potential reusability of a BPEL (Business Process Execution Language) process. The approach is based on the Description and Logic Mismatch Probability of a BPEL process that will be reused within potential contexts. The mismatch probabilities have been consolidated into a metric formula for quantifying the probability of potential reuse of BPEL processes. An initial empirical evaluation suggests that the proposed metric properly predicts the potential reusability of BPEL processes. According to the experiment, there exists a statistically significant correlation between the results of the metric and the experts' judgements. This indicates a predictive dependency between the proposed metric and the potential reusability of BPEL processes, supporting its use as a measuring stick for this phenomenon. If future studies confirm these findings by replicating this experiment, the practical implications of such a metric are early detection of design flaws and support for architects in judging design alternatives.
Submitted 6 February, 2014;
originally announced May 2014.