
Carbon-Aware End-to-End Data Movement

Authors: Jacob Goldverg, Hasibul Jamil, Elvis Rodriguez, Tevfik Kosar

Abstract: The latest trends in the adoption of cloud, edge, and distributed computing, as well as a rise in applying AI/ML workloads, have created a need to measure, monitor, and reduce the carbon emissions of these compute-intensive workloads and the associated communication costs. Data movement over networks produces considerable carbon emissions, which have been neglected due to the difficulty of measuring the carbon footprint of a given end-to-end network path. We present a novel network carbon footprint measuring mechanism and propose three ways in which users can optimize the scheduling of network-intensive tasks to enable carbon savings by shifting tasks in time, space, and overlay networks based on the geographic carbon intensity.

Submitted 13 June, 2024; originally announced June 2024.
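To make the time-shifting idea concrete, here is a minimal Python sketch under stated assumptions: an hourly carbon-intensity forecast for one region (gCO2/kWh) is available, the transfer occupies a fixed number of whole hours, and the end-to-end path draws constant power. The function names and forecast values are hypothetical, and the sketch covers only shifting in time, not the paper's measurement mechanism or its spatial and overlay options.

```python
from typing import Sequence, Tuple


def best_start_hour(intensity: Sequence[float], duration_h: int) -> Tuple[int, float]:
    """Return (start_hour, summed_intensity) for the contiguous window of
    `duration_h` hours with the lowest summed carbon intensity."""
    if not 0 < duration_h <= len(intensity):
        raise ValueError("duration must fit inside the forecast")
    window = sum(intensity[:duration_h])
    best_start, best_sum = 0, window
    for start in range(1, len(intensity) - duration_h + 1):
        # Slide the window by one hour: add the newest hour, drop the oldest.
        window += intensity[start + duration_h - 1] - intensity[start - 1]
        if window < best_sum:
            best_start, best_sum = start, window
    return best_start, best_sum


def transfer_emissions_g(power_kw: float, intensity: Sequence[float],
                         start: int, duration_h: int) -> float:
    """Grams of CO2 for a transfer drawing constant `power_kw` for whole hours:
    each hour contributes power_kw * 1 h * intensity (gCO2/kWh)."""
    return sum(power_kw * intensity[h] for h in range(start, start + duration_h))


if __name__ == "__main__":
    forecast = [420, 390, 310, 250, 240, 300, 410]  # hypothetical gCO2/kWh, hours 0..6
    start, _ = best_start_hour(forecast, duration_h=2)
    print(start, transfer_emissions_g(0.5, forecast, start, 2))
```

With this illustrative forecast, a two-hour transfer at 0.5 kW is deferred to hour 3 and emits 245 g instead of the 405 g it would emit if started at hour 0. The sliding window keeps the search linear in the forecast length.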

arXiv:2403.09445 [pdf, other]

Benchmarking Distributed Coordination Systems: A Survey and Analysis

Authors: Bekir Turkkan, Tevfik Kosar, Aleksey Charapko, Ailidani Ailijiang, Murat Demirbas

Abstract: Coordination services and protocols are critical components of distributed systems and are essential for providing consistency, fault tolerance, and scalability. However, due to the lack of a standard benchmarking tool for distributed coordination services, coordination service developers/researchers either use a standard NoSQL benchmark and omit evaluating consistency, distribution, and fault tolerance, or create their own ad-hoc microbenchmarks and skip comparability with other services. In this paper, we analyze and compare known and widely used distributed coordination services, their evaluations, and the tools used to benchmark those systems. We identify important requirements of distributed coordination service benchmarking, such as the metrics and parameters that need to be evaluated and their evaluation setups and tools.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2402.09392 [pdf, other]

LL-GABR: Energy Efficient Live Video Streaming Using Reinforcement Learning

Authors: Adithya Raman, Bekir Turkkan, Tevfik Kosar

Abstract: In recent years, research and development in adaptive bitrate (ABR) algorithms for live video streaming have been successful in improving users' quality of experience (QoE) by reducing latency to near real-time levels while delivering higher bitrate videos with minimal rebuffering time. However, the QoE models used by these ABR algorithms do not take into account that a large portion of live video streaming clients use mobile devices, where a higher bitrate does not necessarily translate into higher perceived quality. Ignoring perceived quality results in playing videos at higher bitrates without a significant increase in perceptual video quality, which becomes a burden for battery-constrained mobile devices due to higher energy consumption. In this paper, we propose LL-GABR, a deep reinforcement learning approach that models QoE using perceived video quality instead of bitrate and uses energy consumption along with other metrics like latency, rebuffering events, and smoothness. LL-GABR makes no assumptions about the underlying video, environment, or network settings and, unlike existing learning-based ABRs, can operate flexibly on different video titles, each having a different bitrate encoding ladder, without additional re-training. Trace-driven experimental results show that LL-GABR outperforms the state-of-the-art approaches by up to 44% in terms of perceptual QoE and achieves a 73% increase in energy efficiency as a result of reducing net energy consumption by 11%.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 10 pages, 3 figures, 3 Tables
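The LL-GABR abstract describes a QoE model built on perceived quality together with energy, latency, rebuffering, and smoothness. The sketch below shows one generic way to fold such terms into a per-session reward, in the linear style common in the ABR literature; it is an assumption for illustration, not LL-GABR's actual reward. The `Chunk` fields and all weights are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    quality: float     # perceived quality score of the chunk (hypothetical 0-100 scale)
    rebuffer_s: float  # stall time before the chunk played, in seconds
    latency_s: float   # live latency when the chunk played, in seconds
    energy_j: float    # device energy spent downloading/decoding it, in joules


def qoe_reward(chunks: List[Chunk], w_rebuf: float = 4.0, w_lat: float = 1.0,
               w_smooth: float = 1.0, w_energy: float = 0.1) -> float:
    """Linear QoE: reward perceived quality, penalize stalls, latency,
    energy, and quality switches between consecutive chunks."""
    total = 0.0
    prev_q: Optional[float] = None
    for c in chunks:
        total += c.quality
        total -= w_rebuf * c.rebuffer_s + w_lat * c.latency_s + w_energy * c.energy_j
        if prev_q is not None:
            # Smoothness penalty: magnitude of the perceived-quality change.
            total -= w_smooth * abs(c.quality - prev_q)
        prev_q = c.quality
    return total


if __name__ == "__main__":
    session = [Chunk(80.0, 0.0, 2.0, 10.0), Chunk(70.0, 0.5, 2.0, 8.0)]
    print(qoe_reward(session))
```

Because quality enters as a perceived score rather than a bitrate, a higher-bitrate chunk that looks no better earns no extra reward but still pays its energy penalty, which is the trade-off the abstract highlights.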

arXiv:2310.14449 [pdf, other]

doi 10.1145/3571697.3571704

Qualitative analysis of the relationship between design smells and software engineering challenges

Authors: Asif Imran, Tevfik Kosar

Abstract: Software design debt aims to elucidate attempts to rectify present design flaws and studies their influence on the cost and time of the software. Design smells are a key cause of incurring design debt. Although the impact of design smells on design debt has been predominantly considered in the current literature, how design smells are caused by not following software engineering best practices requires more exploration. This research provides a tool for detecting design smells in Java software by analyzing large volumes of source code. More specifically, 409,539 Lines of Code (LoC) and 17,760 class files of open-source Java software are analyzed here. Obtained results show desirable precision values ranging from 81.01% to 93.43%. Based on the output of the tool, a study is conducted to relate the causes of the detected design smells to two software engineering challenges, namely "irregular team meetings" and "scope creep". The gained information will give software engineers insight for taking the necessary design remediation actions.

Submitted 22 October, 2023; originally announced October 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:1910.05428

Journal ref: 2022 The 3rd European Symposium on Software Engineering (ESSE 2022)

arXiv:2310.14444 [pdf, other]

URegM: a unified prediction model of resource consumption for refactoring software smells in open source cloud

Authors: Asif Imran, Tevfik Kosar

Abstract: The low cost and rapid provisioning capabilities have made the cloud a desirable platform to launch complex scientific applications. However, resource utilization optimization is a significant challenge for cloud service providers, since earlier work focused on optimizing resources for the applications that run on the cloud, with little emphasis on optimizing the resource utilization of the cloud's internal processes. Code refactoring has been associated with improving the maintainability and understandability of software code. However, the impact of refactoring the cloud's source code on cloud resource usage requires further analysis. In this paper, we propose a framework called Unified Regression Modelling (URegM), which predicts the impact of code smell refactoring on cloud resource usage. We conduct our experiments in a real-life cloud environment using a complex scientific application as a workload. Results show that URegM is capable of accurately predicting resource consumption due to code smell refactoring. This will give cloud service providers advance knowledge about the impact of refactoring code smells on resource consumption, allowing them to plan their resource provisioning and code refactoring more effectively.

Submitted 22 October, 2023; originally announced October 2023.

Journal ref: 2022 The 3rd European Symposium on Software Engineering (ESSE 2022)

arXiv:2310.11406 [pdf, other]

GreenNFV: Energy-Efficient Network Function Virtualization with Service Level Agreement Constraints

Authors: MD S Q Zulkar Nine, Tevfik Kosar, Fatih Bulut, Jinho Hwang

Abstract: Network Function Virtualization (NFV) platforms consume significant energy, introducing high operational costs in edge and data centers. This paper presents a novel framework called GreenNFV that optimizes resource usage for network function chains using deep reinforcement learning. GreenNFV optimizes resource parameters such as CPU sharing ratio, CPU frequency scaling, last-level cache (LLC) allocation, DMA buffer size, and packet batch size. GreenNFV learns the resource scheduling model from the benchmark experiments and takes Service Level Agreements (SLAs) into account to optimize resource usage models based on the different throughput and energy consumption requirements. Our evaluation shows that GreenNFV models achieve high transfer throughput and low energy consumption while satisfying various SLA constraints. Specifically, GreenNFV with Throughput SLA can achieve 4.4× higher throughput and 1.5× better energy efficiency over the baseline settings, whereas GreenNFV with Energy SLA can achieve 3× higher throughput while reducing energy consumption by 50%.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2307.12146 [pdf, other]

CloudScent: a model for code smell analysis in open-source cloud

Authors: Raj Narendra Shah, Sameer Ahmed Mohamed, Asif Imran, Tevfik Kosar

Abstract: The low cost and rapid provisioning capabilities have made open-source cloud a desirable platform to launch industrial applications. However, as open-source cloud moves towards maturity, it still suffers from quality issues like code smells. Although great emphasis has been placed on the economic benefits of deploying open-source cloud, little attention has been paid to improving the quality of the source code of the cloud itself to ensure its maintainability in industrial scenarios. Code refactoring has been associated with improving the maintenance and understanding of software code by removing code smells. However, analyzing which smells are more prevalent in cloud environments and designing a tool to define and detect those smells require further attention. In this paper, we propose a model called CloudScent, an open-source mechanism to detect smells in open-source cloud. We conduct our experiments in a real-life cloud environment using OpenStack. Results show that CloudScent is capable of accurately detecting 8 code smells in cloud. This will give cloud service providers advance knowledge about the smells prevalent in open-source cloud platforms, allowing for timely code refactoring and improving the code quality of the cloud platforms.

Submitted 22 July, 2023; originally announced July 2023.

arXiv:2306.15763 [pdf, other]

Predicting the Impact of Batch Refactoring Code Smells on Application Resource Consumption

Authors: Asif Imran, Tevfik Kosar, Jaroslaw Zola, Muhammed Fatih Bulut

Abstract: Automated batch refactoring has become a de facto mechanism to restructure software that may have significant design flaws negatively impacting code quality and maintainability. Although automated batch refactoring techniques are known to significantly improve overall software quality and maintainability, their impact on resource utilization is not well studied. This paper aims to bridge the gap between batch refactoring of code smells and the consumption of resources. It determines the relationship between code smell batch refactoring and resource consumption. Next, it aims to design algorithms to predict the impact of code smell refactoring on resource consumption. This paper investigates 16 code smell types and their joint effect on resource utilization for 31 open source applications. It provides a detailed empirical analysis of the change in application CPU and memory utilization after refactoring specific code smells in isolation and in batches. This analysis is then used to train regression algorithms to predict the impact of batch refactoring on CPU and memory utilization before making any refactoring decisions. Experimental results show that our ANN-based regression model provides highly accurate predictions for the impact of batch refactoring on resource consumption. It allows software developers to intelligently decide which code smells they should refactor jointly to achieve high code quality and maintainability without increasing application resource utilization.
This paper responds to the important and urgent need of software engineers across a broad range of software applications who are looking to refactor code smells and at the same time reduce resource consumption. Finally, it brings forward the concept of resource-aware code smell refactoring to the most crucial software applications.

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2211.11949 [pdf, other]

A Reinforcement Learning Approach to Optimize Available Network Bandwidth Utilization

Authors: Hasibul Jamil, Elvis Rodrigues, Jacob Goldverg, Tevfik Kosar

Abstract: Efficient data transfers over high-speed, long-distance shared networks require proper utilization of available network bandwidth. Using parallel TCP streams enables an application to utilize network parallelism and can improve transfer throughput; however, finding the optimum number of parallel TCP streams is challenging due to nondeterministic background traffic sharing the same network. Additionally, the non-stationary, multi-objective, and partially observable nature of network signals in the host systems adds extra complexity in determining the current network condition. In this work, we present a novel approach to finding the optimum number of parallel TCP streams using deep reinforcement learning (RL). We devise a learning-based algorithm capable of generalizing across different network conditions and utilizing the available network bandwidth intelligently. In contrast to rule-based heuristics that do not generalize well in unknown network scenarios, our RL-based solution can dynamically discover and adapt the number of parallel TCP streams to maximize network bandwidth utilization without congesting the network, while ensuring fairness among contending transfers. We extensively evaluated our RL-based algorithm's performance, comparing it with several state-of-the-art online optimization algorithms. The results show that our RL-based algorithm can find near-optimal solutions 40% faster while achieving up to 15% higher throughput.
We also show that, unlike a greedy algorithm, our devised RL-based algorithm can avoid network congestion and fairly share the available network resources among contending transfers.

Submitted 30 November, 2022; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: Submitted to ICC 2023; converted to 12 pages (the conference submission was 7 pages)

ACM Class: C.4; C.2.3; I.2.6
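As a much simpler stand-in for the deep RL agent described above, an epsilon-greedy multi-armed bandit illustrates the same core loop: pick a stream count, measure the outcome, update the estimate. The `measure` callback, the candidate counts, and the per-stream cost (a crude, hypothetical proxy for discouraging greedy, congestion-causing choices) are all assumptions. Unlike the paper's agent, this sketch assumes stationary conditions and does not generalize across networks.

```python
import random
from typing import Callable, Dict, Sequence


def tune_streams(measure: Callable[[int], float], candidates: Sequence[int],
                 rounds: int = 200, epsilon: float = 0.1,
                 cost_per_stream: float = 0.0, seed: int = 0) -> int:
    """Epsilon-greedy bandit over parallel-stream counts.

    `measure(n)` is a hypothetical probe returning observed throughput when
    running with n streams; the reward subtracts a per-stream cost."""
    rng = random.Random(seed)
    counts: Dict[int, int] = {n: 0 for n in candidates}
    means: Dict[int, float] = {n: 0.0 for n in candidates}
    for t in range(rounds):
        if t < len(candidates):
            n = candidates[t]                               # try every arm once
        elif rng.random() < epsilon:
            n = rng.choice(list(candidates))                # explore
        else:
            n = max(candidates, key=lambda k: means[k])     # exploit
        reward = measure(n) - cost_per_stream * n
        counts[n] += 1
        means[n] += (reward - means[n]) / counts[n]         # incremental mean
    return max(candidates, key=lambda k: means[k])


if __name__ == "__main__":
    # Simulated link: throughput saturates at 9 Gbps, then drops from congestion.
    def sim(n: int) -> float:
        return min(1.5 * n, 9.0) - 0.2 * max(0, n - 6) ** 2

    print(tune_streams(sim, [1, 2, 4, 6, 8, 16]))
```

On this simulated link the bandit settles on 6 streams, the point where throughput saturates before congestion sets in. Raising `cost_per_stream` shifts the choice toward fewer streams, a rough analogue of trading raw throughput for fairness.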

arXiv:2204.07601 [pdf, ps, other]

doi 10.1109/ICCCN54977.2022.9868866

Energy-Efficient Data Transfer Optimization via Decision-Tree Based Uncertainty Reduction

Authors: Hasibul Jamil, Lavone Rodolph, Jacob Goldverg, Tevfik Kosar

Abstract: The rapid growth of data produced by scientific instruments, the Internet of Things (IoT), and social media is causing data transfer performance and resource consumption to garner much attention in the research community. The network infrastructure and end systems that enable this extensive data movement use a substantial amount of electricity, measured in terawatt-hours per year. Managing energy consumption within the core networking infrastructure is an active research area, but there is a limited amount of work on reducing power consumption at the end systems during active data transfers. This paper presents a novel two-phase dynamic throughput and energy optimization model that utilizes an offline decision-search-tree based clustering technique to encapsulate and categorize historical data transfer log information, and an online search optimization algorithm to find the best application and kernel layer parameter combination to maximize the achieved data transfer throughput while minimizing the energy consumption. Our model also incorporates an ensemble method to reduce aleatoric uncertainty in finding optimal application and kernel layer parameters during the offline analysis phase. The experimental evaluation results show that our decision-tree based model outperforms the state-of-the-art solutions in this area, achieving 117% higher throughput on average while also consuming 19% less energy at the end systems during active data transfers.

Submitted 24 April, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: 10 pages; accepted for publication in IEEE ICCCN 2022

Journal ref: 2022 International Conference on Computer Communications and Networks (ICCCN)
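The two-phase structure above (offline analysis of historical logs, then an online decision) can be sketched as follows. This is an assumption-laden simplification: it swaps the paper's decision-search-tree clustering and ensemble for fixed-width buckets over RTT and bandwidth, scores parameters by throughput per joule, and returns a nearest-centroid starting point that a real online search would then refine. The log layout and parameter names are hypothetical, and mixing milliseconds and Gbps in one unnormalized distance is a deliberate shortcut.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Params = Tuple[int, int, int]  # hypothetical (concurrency, parallelism, pipelining)
# Hypothetical log record: (rtt_ms, bandwidth_gbps, params, throughput_gbps, energy_j)
Log = Tuple[float, float, Params, float, float]
Table = Dict[Tuple[int, int], Tuple[Tuple[float, float], Params]]


def offline_phase(logs: List[Log], rtt_bucket: float = 20.0,
                  bw_bucket: float = 5.0) -> Table:
    """Group logs by coarse network condition; per group, keep the centroid
    and the parameters with the best throughput per joule."""
    groups: Dict[Tuple[int, int], List[Log]] = defaultdict(list)
    for log in logs:
        key = (int(log[0] // rtt_bucket), int(log[1] // bw_bucket))
        groups[key].append(log)
    table: Table = {}
    for key, members in groups.items():
        best = max(members, key=lambda r: r[3] / r[4])
        centroid = (sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members))
        table[key] = (centroid, best[2])
    return table


def online_phase(table: Table, rtt_ms: float, bw_gbps: float) -> Params:
    """Starting parameters from the group whose centroid is nearest to the
    currently observed (RTT, bandwidth)."""
    _, params = min(table.values(),
                    key=lambda entry: math.dist(entry[0], (rtt_ms, bw_gbps)))
    return params


if __name__ == "__main__":
    history = [
        (10.0, 9.0, (2, 4, 8), 8.0, 100.0),
        (12.0, 9.5, (4, 8, 16), 9.0, 150.0),
        (80.0, 1.0, (8, 16, 4), 0.9, 50.0),
        (85.0, 1.2, (4, 4, 4), 0.8, 30.0),
    ]
    table = offline_phase(history)
    print(online_phase(table, 15.0, 10.0), online_phase(table, 70.0, 2.0))
```

The point of the split is that the expensive work (grouping and scoring logs) happens offline, so the online phase starts from a historically good setting instead of probing the network from scratch.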

arXiv:2105.14157 [pdf, other]

SMURF: Efficient and Scalable Metadata Access for Distributed Applications

Authors: Bing Zhang, Tevfik Kosar

Abstract: In parallel with big data processing and analysis dominating the usage of distributed and cloud infrastructures, the demand for distributed metadata access and transfer has increased. In many application domains, the volume of data generated exceeds petabytes, while the corresponding metadata amounts to terabytes or even more. This paper proposes a novel solution for efficient and scalable metadata access for distributed applications across wide-area networks, dubbed SMURF. Our solution combines novel pipelining and concurrent transfer mechanisms with reliability, provides distributed continuum caching and prefetching strategies to sidestep fetching latency, and achieves scalable and high-performance metadata fetch/prefetch services in the cloud. We also study the phenomenon of semantic locality in real trace logs, which is not well utilized in metadata access prediction. We implement a novel prefetch predictor based on this observation and compare it with three existing state-of-the-art prefetch schemes on Yahoo! Hadoop audit traces. By effectively caching and prefetching metadata based on the access patterns, our continuum caching and prefetching mechanism significantly improves the local cache hit rate and reduces the average fetching latency. We replayed approximately 20 million metadata access operations from real audit traces, on which our system achieved 90% accuracy in prefetch prediction and reduced the average fetch latency by 50% compared to the state-of-the-art mechanisms.

Submitted 28 May, 2021; originally announced May 2021.
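To illustrate why prefetching can raise the local cache hit rate, the sketch below pairs an LRU metadata cache with a trivial "last successor" predictor: once key A has been followed by key B, the next access to A prefetches B. This predictor is a hypothetical stand-in, not SMURF's semantic-locality predictor, and `fetch` stands in for a remote metadata lookup.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, Optional


class PrefetchingCache:
    """LRU cache plus a 'last successor' prefetch predictor (illustrative)."""

    def __init__(self, capacity: int, fetch: Callable[[Hashable], object]):
        self.capacity = capacity
        self.fetch = fetch  # hypothetical remote metadata lookup
        self.store: "OrderedDict[Hashable, object]" = OrderedDict()
        self.successor: Dict[Hashable, Hashable] = {}
        self.prev: Optional[Hashable] = None
        self.hits = 0
        self.misses = 0

    def _insert(self, key: Hashable, value: object) -> None:
        self.store[key] = value
        self.store.move_to_end(key)          # mark as most recently used
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used

    def get(self, key: Hashable) -> object:
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)
            value = self.store[key]
        else:
            self.misses += 1
            value = self.fetch(key)
            self._insert(key, value)
        if self.prev is not None:
            self.successor[self.prev] = key  # learn: prev was followed by key
        self.prev = key
        nxt = self.successor.get(key)
        if nxt is not None and nxt not in self.store:
            self._insert(nxt, self.fetch(nxt))  # prefetch the predicted next key
        return value


if __name__ == "__main__":
    cache = PrefetchingCache(2, fetch=lambda path: f"meta({path})")
    for path in ["/a", "/b", "/c"] * 3:
        cache.get(path)
    print(cache.hits, cache.misses)
```

On this cyclic three-key trace with capacity 2, plain LRU would never hit. With the predictor, the first four accesses miss while the successor table warms up, and every later access hits.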

arXiv:2104.01192 [pdf, other]

Energy-saving Cross-layer Optimization of Big Data Transfer Based on Historical Log Analysis

Authors: Lavone Rodolph, MD S Q Zulkar Nine, Luigi Di Tacchio, Tevfik Kosar

Abstract: With the proliferation of data movement across the Internet, global data traffic per year has already exceeded the zettabyte scale. The network infrastructure and end-systems facilitating this vast data movement consume an extensive amount of electricity, measured in terawatt-hours per year. This massive energy footprint costs the world economy billions of dollars, partially due to energy consumed at the network end-systems. Although extensive research has been done on managing power consumption within the core networking infrastructure, there is little research on reducing the power consumption at the end-systems during active data transfers. This paper presents a novel cross-layer optimization framework, called Cross-LayerHLA, to minimize energy consumption at the end-systems by applying machine learning techniques to historical transfer logs and extracting the hidden relationships between different parameters affecting both performance and resource utilization. It utilizes offline analysis to improve online learning and dynamic tuning of application-level and kernel-level parameters with minimal overhead. This approach minimizes end-system energy consumption and maximizes data transfer throughput. Our experimental results show that Cross-LayerHLA outperforms other state-of-the-art solutions in this area.

Submitted 2 April, 2021; originally announced April 2021.

arXiv:2008.06214 [pdf, other]

doi 10.18293/SEKE2020-138

The Impact of Auto-Refactoring Code Smells on the Resource Utilization of Cloud Software

Authors: Asif Imran, Tevfik Kosar

Abstract: Cloud-based software-as-a-service (SaaS) applications have gained popularity due to their low cost and elasticity. However, like other software, SaaS applications suffer from code smells, which can drastically affect functionality and resource usage. A code smell is any design in the source code that indicates a deeper problem. The software community deploys automated refactoring to eliminate smells, which can improve performance and also decrease the usage of critical resources. However, the impact of automatically refactoring smells in SaaS on resources such as CPU and memory has been studied only to a limited extent. Here, we aim to fill that gap and study the impact on the resource usage of SaaS applications of automatically refactoring seven classic code smells: god class, feature envy, type checking, cyclic dependency, shotgun surgery, god method, and spaghetti code. We selected six real-life SaaS applications from GitHub, namely Zimbra, OneDataShare, GraphHopper, Hadoop, JENA, and JAMES, which ran on an OpenStack cloud. Results show that refactoring smells with tools like JDeodorant and JSparrow has widely varying impacts on the CPU and memory consumption of the tested applications, depending on the type of smell refactored. We present the resource utilization impact of each smell and also discuss the potential reasons leading to that effect.

Submitted 14 August, 2020; originally announced August 2020.

Journal ref: In SEKE (pp. 299-304) 2020

arXiv:1910.06109 [pdf, other]

Software Sustainability: A Systematic Literature Review and Comprehensive Analysis

Authors: Asif Imran, Tevfik Kosar

Abstract: Software engineering is a constantly evolving subject area that faces new challenges every day as it tries to automate newer business processes. One of the key challenges to the success of a software solution is attaining sustainability. The inability of many software systems to be sustained for the desired length of time is caused by the limited consideration given to sustainability during the stages of software development. This review aims to present a detailed and inclusive study covering both the technical and non-technical challenges and approaches of software sustainability. A systematic and comprehensive literature review was conducted based on 107 relevant studies that were selected using the Evidence-Based Software Engineering (EBSE) technique. The study showed that sustainability can be achieved by conducting specific activities at the technical and non-technical levels. The technical level consists of software design, coding, and user experience attributes. The non-technical level consists of documentation, sustainability manifestos, training of software engineers, funding of software projects, and the leadership skills of project managers to achieve sustainability. This paper groups the existing research efforts based on the above aspects. Next, how those aspects affect open and closed source software is tabulated. Based on the findings of this review, both technical and non-technical sustainability aspects are equally important; considering one while ignoring the other will threaten the sustenance of software products.

Submitted 10 October, 2019; originally announced October 2019.

Comments: none

arXiv:1910.05428 [pdf, ps, other]

Design Smell Analysis for Developing and Established Open Source Java Software

Authors: Asif Imran, Tevfik Kosar

Abstract: Software design smells are design attributes which violate the fundamental design principles. Design smells are a key cause of design debt. Although the activities of design smell identification and measurement are predominantly considered in current literature, those which identify and communicate which design smells occur more frequently in newly developing software and which ones are more dominant in established software have been studied to a limited extent. This research describes a mechanism for identifying the design smells that are more prevalent in developing and established software respectively. A tool is provided which is used for design smell detection by analyzing large volumes of source code. More specifically, 164,609 Lines of Code (LoC) and 5,712 class files of six developing and 244,930 LoC and 12,048 class files of five established open-source Java software are analyzed. Obtained results show that out of the 4,020 occurrences of smells that were made for nine preselected types of design smells, 1,643 design smells were detected for developing software, which mainly consisted of four specific types of smells. For established software, 2,397 design smells were observed which predominantly consisted of four other types of smells. The remaining design smell was equally prevalent in both developing and established software. Desirable precision values ranging from 72.9% to 84.1% were obtained for the tool.

Submitted 11 October, 2019; originally announced October 2019.

Comments: none

arXiv:1904.05867 [pdf, other]

Energy-Efficient High-Throughput Data Transfers via Dynamic CPU Frequency and Core Scaling

Authors: Luigi Di Tacchio, Zulkar Nine, Tevfik Kosar, Fatih M. Bulut, Jinho Hwang

Abstract: The energy footprint of global data movement has surpassed 100 terawatt-hours, costing more than 20 billion US dollars to the world economy. Depending on the number of switches, routers, and hubs between the source and destination nodes, the networking infrastructure consumes 10% to 75% of the total energy during active data transfers, and the rest is consumed by the end systems. Even though there has been extensive research on reducing the power consumption of the networking infrastructure, the work focusing on saving energy at the end systems has been limited to the tuning of a few application-level parameters such as parallelism, pipelining, and concurrency. In this paper, we introduce three novel application-level parameter tuning algorithms which employ dynamic CPU frequency and core scaling, combining heuristics and runtime measurements to achieve energy-efficient data transfers. Experimental results show that our proposed algorithms outperform the state-of-the-art solutions, achieving up to 48% reduced energy consumption and 80% higher throughput.

Submitted 11 April, 2019; originally announced April 2019.

arXiv:1812.11255 [pdf, other]

A Two-Phase Dynamic Throughput Optimization Model for Big Data Transfers

Authors: Zulkar Nine, Tevfik Kosar

Abstract: The amount of data moved over dedicated and non-dedicated network links is growing much faster than network capacity, yet current solutions fail to guarantee even the promised achievable transfer throughput. In this paper, we propose a novel dynamic throughput optimization model based on mathematical modeling with offline knowledge discovery/analysis and adaptive online decision making. In the offline analysis, we mine historical transfer logs to perform knowledge discovery about transfer characteristics. The online phase uses the knowledge discovered in the offline analysis, along with real-time investigation of the network condition, to optimize the protocol parameters. Because real-time investigation is expensive and provides only partial knowledge about the current network status, our model uses historical knowledge about the network and data to reduce the real-time investigation overhead while ensuring near-optimal throughput for each transfer. Our novel approach was tested over different networks with different datasets and outperformed its closest competitor by 1.7x and the default case by 5x. It also achieved up to 93% accuracy compared with the optimal throughput achievable on those networks.

Submitted 28 December, 2018; originally announced December 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1707.09455

arXiv:1810.05892 [pdf, other]

GreenDataFlow: Minimizing the Energy Footprint of Global Data Movement

Authors: MD S Q Zulkar Nine, Luigi Di Tacchio, Asif Imran, Tevfik Kosar, M. Fatih Bulut, Jinho Hwang

Abstract: The global data movement over the Internet has an estimated energy footprint of 100 terawatt hours per year, costing the world economy billions of dollars. The networking infrastructure, together with the source and destination nodes involved in the data transfer, contributes to overall energy consumption. Although a considerable amount of research has produced power management techniques for the networking infrastructure, there has not been much prior work on energy-aware data transfer solutions that minimize the power consumed at the end systems. In this paper, we introduce a novel application-layer solution based on historical analysis and real-time tuning, called GreenDataFlow, which aims to achieve high data transfer throughput while keeping energy consumption at minimal levels. GreenDataFlow supports service level agreements (SLAs), which give service providers and consumers the ability to fine-tune their goals and priorities in this optimization process. Our experimental results show that GreenDataFlow outperforms the closest competing state-of-the-art solution in this area by 50% in energy savings and by 2.5x in achieved end-to-end performance.

Submitted 13 October, 2018; originally announced October 2018.

arXiv:1805.10499 [pdf, other]

Data-Aware Approximate Workflow Scheduling

Authors: Dengpan Yin, Tevfik Kosar

Abstract: Optimizing data placement in complex scientific workflows has become crucial, since the large amounts of data generated by these workflows significantly increase the turnaround time of the end-to-end application. It is almost impossible to produce an optimal schedule for the end-to-end workflow without considering intermediate data movement. To reduce the complexity of the workflow scheduling problem, most existing work constrains the problem space with unrealistic assumptions, which result in non-optimal schedules in practice. In this study, we propose a genetic data-aware algorithm for the end-to-end workflow scheduling problem. Unlike past research, we develop a novel data-aware evaluation function for each chromosome, a common augmenting crossover operator, and a simple but effective mutation operator. Our experiments on different workflow structures show that the proposed GA-based approach produces a schedule close to the optimal one.

Submitted 26 May, 2018; originally announced May 2018.
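The abstract mentions a data-aware evaluation function for each chromosome but does not give its form. The following is a minimal sketch under our own assumptions, not the authors' function: a chromosome maps each task to a site, tasks are visited in a given topological order, each site runs one task at a time, and moving a parent's output to a different site costs `size / bandwidth` time units. The names `evaluate`, `compute`, `edges`, and `bandwidth` are hypothetical.

```python
from typing import Dict, List, Tuple


def evaluate(
    chromosome: Dict[str, int],            # task -> site (hypothetical encoding)
    order: List[str],                      # tasks in a valid topological order
    compute: Dict[str, float],             # task -> compute time
    edges: Dict[Tuple[str, str], float],   # (parent, child) -> data size
    bandwidth: float,                      # assumed uniform inter-site bandwidth
) -> float:
    """Return the makespan of one chromosome, charging for intermediate data movement."""
    finish: Dict[str, float] = {}
    site_ready: Dict[int, float] = {}
    for task in order:
        site = chromosome[task]
        start = site_ready.get(site, 0.0)  # each site runs one task at a time
        for (parent, child), size in edges.items():
            if child != task:
                continue
            # Data produced on another site must be transferred before the task starts.
            transfer = 0.0 if chromosome[parent] == site else size / bandwidth
            start = max(start, finish[parent] + transfer)
        finish[task] = start + compute[task]
        site_ready[site] = finish[task]
    return max(finish.values())
```

A GA built on this would prefer chromosomes with a lower makespan. On a toy DAG where tasks A and B both feed C, placing C on the same site as its larger input can beat both full co-location (which serializes A and B) and placing C next to the smaller input.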

arXiv:1805.08616 [pdf, other]

Energy-Efficient Mobile Network I/O Optimization at the Application Layer

Authors: Kemal Guner, MD S Q Zulkar Nine, Tevfik Kosar, Fatih Bulut

Abstract: Mobile data traffic (cellular + WiFi) will exceed PC Internet traffic by 2020. As the number of smartphone users and the amount of data transferred per smartphone grow exponentially, limited battery power is becoming an increasingly critical problem for mobile devices, which depend on network I/O. Despite the growing body of research on power management techniques for mobile devices at the hardware layer and the lower layers of the networking stack, there has been little work on saving energy at the application layer of mobile systems during network I/O. In this paper, to the best of our knowledge, we are the first to provide an in-depth analysis of the effects of application-layer data transfer protocol parameters on the energy consumption of mobile phones. We propose a novel model, called FastHLA, that can achieve significant energy savings at the application layer during mobile network I/O without sacrificing performance. In many cases, our model achieves a performance increase and energy savings simultaneously.

Submitted 19 May, 2018; originally announced May 2018.

Comments: arXiv admin note: text overlap with arXiv:1805.03970 and substantial text overlap with arXiv:1707.06826

arXiv:1805.03970 [pdf, other]

Energy-Efficient Mobile Network I/O

Authors: Kemal Guner, Tevfik Kosar

Abstract: By 2020, the number of smartphone users globally will reach 3 billion, and mobile data traffic (cellular + WiFi) will exceed PC Internet traffic for the first time. As the number of smartphone users and the amount of data transferred per smartphone grow exponentially, limited battery power is becoming an increasingly critical problem for mobile devices, which heavily depend on network I/O. Despite the growing body of research on power management techniques for mobile devices at the hardware layer and the lower layers of the networking stack, there has been little work on saving energy at the application layer of mobile systems during network I/O. In this paper, we show that significant energy savings can be achieved with application-layer solutions on mobile systems during data transfer with no performance penalty. In many cases, a performance increase and energy savings can be achieved simultaneously.

Submitted 8 May, 2018; originally announced May 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1707.06826

arXiv:1712.02944 [pdf, other]

OneDataShare: A Vision for Cloud-hosted Data Transfer Scheduling and Optimization as a Service

Authors: Asif Imran, Md S Q Zulkar Nine, Kemal Guner, Tevfik Kosar

Abstract: Fast, reliable, and efficient data transmission across wide-area networks is a predominant bottleneck for data-intensive cloud applications. This paper introduces OneDataShare, which is designed to eliminate the issues plaguing effective cloud-based data transfers of varying file sizes and across incompatible transfer endpoints. The vision of OneDataShare is to achieve high-speed data communication, interoperability between multiple transfer protocols, and accurate estimation of delivery time for advance planning, thereby maximizing user profit through improved and faster data analysis for business intelligence. The paper elaborates on the desirable features of OneDataShare as a cloud-hosted data transfer scheduling and optimization service, and on how it aligns with the vision of harnessing the power of cloud and distributed computing. Experimental evaluation and comparison with existing real-life file transfer services show that the transfer throughput achieved by OneDataShare is 6.5 times greater.

Submitted 8 December, 2017; originally announced December 2017.

arXiv:1708.05425 [pdf, other]

A Heuristic Approach to Protocol Tuning for High Performance Data Transfers

Authors: Engin Arslan, Tevfik Kosar

Abstract: Obtaining optimal data transfer performance is of utmost importance to today's data-intensive distributed applications and wide-area data replication services. Doing so requires effectively utilizing available network bandwidth and resources, yet in practice transfers seldom reach the levels of utilization they potentially could. Tuning protocol parameters such as pipelining, parallelism, and concurrency can significantly increase utilization and performance; however, determining the best settings for these parameters is difficult, as network conditions can vary greatly between sites and over time. Nevertheless, it is an important problem, since poor tuning can cause either under- or over-utilization of network resources and thus degrade transfer performance. In this paper, we present three algorithms for application-level tuning of different protocol parameters to maximize transfer throughput in wide-area networks. Our algorithms dynamically tune the number of parallel data streams per file (for large-file optimization), the level of control channel pipelining (for small-file optimization), and the number of concurrent file transfers to increase I/O throughput (a technique useful for all types of files). The proposed heuristic algorithms improve transfer throughput by up to 10x compared to the baseline and 7x compared to state-of-the-art solutions.

Submitted 17 August, 2017; originally announced August 2017.
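The abstract does not spell out the three heuristics. As a rough illustration of measurement-driven tuning of a single parameter (for example, the number of concurrent file transfers), here is a generic hill-climbing sketch, not the authors' algorithm: it doubles the level while measured throughput keeps improving. `measure` stands in for running a short sample transfer at a given level, and `toy_throughput` is a made-up model used only to exercise the code; both are hypothetical.

```python
from typing import Callable


def tune_level(measure: Callable[[int], float], start: int = 1, max_level: int = 64) -> int:
    """Double the level while measured throughput improves; return the best level seen."""
    best_level, best_tput = start, measure(start)
    level = start
    while level * 2 <= max_level:
        level *= 2
        tput = measure(level)
        if tput <= best_tput:
            break  # throughput stopped improving: assume the peak has been passed
        best_level, best_tput = level, tput
    return best_level


def toy_throughput(level: int) -> float:
    """Hypothetical model: diminishing returns minus a per-stream overhead."""
    return 1000.0 * level / (level + 4) - 10.0 * level
```

Doubling keeps the number of sample transfers logarithmic in `max_level`, at the cost of possibly stepping over the true peak; a finer search between the best level and its neighbors could follow.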

arXiv:1708.03053 [pdf, other]

Application Level High Speed Transfer Optimization Based on Historical Analysis and Real-time Tuning

Authors: Engin Arslan, Tevfik Kosar

Abstract: Data-intensive scientific and commercial applications increasingly require frequent movement of large datasets from one site to others. Despite growing network capacities, these data movements rarely achieve the promised data transfer rates of the underlying physical network because of poorly tuned data transfer protocols. Accurately and efficiently tuning data transfer protocol parameters in a dynamically changing network environment is a major challenge and remains an open research problem. In this paper, we present predictive end-to-end data transfer optimization algorithms based on historical data analysis and real-time background traffic probing, dubbed HARP. Most previous work in this area is based solely on real-time network probing, which either incurs excessive sampling overhead or fails to accurately predict the optimal transfer parameters. Combining historical data analysis with real-time sampling enables our algorithms to tune application-level data transfer parameters accurately and efficiently, achieving close-to-optimal end-to-end data transfer throughput with very low overhead. Our experimental analysis over a variety of network settings shows that HARP outperforms existing solutions by up to 50% in achieved throughput.

Submitted 9 August, 2017; originally announced August 2017.

arXiv:1707.09455 [pdf, other]

Data Transfer Optimization Based on Offline Knowledge Discovery and Adaptive Real-time Sampling

Authors: MD S Q Zulkar Nine, Kemal Guner, Ziyun Huang, Xiangyu Wang, Jinhui Xu, Tevfik Kosar

Abstract: The amount of data moved over dedicated and non-dedicated network links is growing much faster than network capacity, yet current solutions fail to guarantee even the promised achievable transfer throughput. In this paper, we propose a novel dynamic throughput optimization model based on mathematical modeling with offline knowledge discovery/analysis and adaptive online decision making. In the offline analysis, we mine historical transfer logs to perform knowledge discovery about transfer characteristics. The online phase uses the knowledge discovered in the offline analysis, along with real-time investigation of the network condition, to optimize the protocol parameters. Because real-time investigation is expensive and provides only partial knowledge about the current network status, our model uses historical knowledge about the network and data to reduce the real-time investigation overhead while ensuring near-optimal throughput for each transfer. Our network- and data-agnostic solution was tested over different networks and achieved up to 93% accuracy compared with the optimal throughput achievable on those networks.

Submitted 27 November, 2017; v1 submitted 28 July, 2017; originally announced July 2017.

arXiv:1707.06826 [pdf, other]

Energy-Performance Trade-offs in Mobile Data Transfers

Authors: Kemal Guner, Tevfik Kosar

Abstract: By 2020, the number of smartphone users globally will reach 3 billion, and mobile data traffic (cellular + WiFi) will exceed PC Internet traffic for the first time. As the number of smartphone users and the amount of data transferred per smartphone grow exponentially, limited battery power is becoming an increasingly critical problem for mobile devices, which increasingly depend on network I/O. Despite the growing body of research on power management techniques for mobile devices at the hardware layer and the lower layers of the networking stack, there has been little work on saving energy at the application layer of mobile systems during network I/O. In this paper, to the best of our knowledge, we are the first to provide an in-depth analysis of the effects of application-layer data transfer protocol parameters on the energy consumption of mobile phones. We show that significant energy savings can be achieved with application-layer solutions on mobile systems during data transfer with no or minimal performance penalty. In many cases, a performance increase and energy savings can be achieved simultaneously.

Submitted 21 July, 2017; originally announced July 2017.

arXiv:1707.05730 [pdf, other]

Energy-Efficient Data Transfer Algorithms for HTTP-Based Services

Authors: Tevfik Kosar, Ismail Alan

Abstract: According to recent statistics, more than 1 zettabyte of data is moved over the Internet annually, which consumes several terawatt hours of electricity and costs the world economy billions of US dollars. The HTTP protocol is used in the majority of these data transfers, accounting for 70% of global Internet traffic. We claim that HTTP transfers, and services based on HTTP, can become more energy efficient without any performance degradation through application-level tuning of certain protocol parameters. In this paper, we analyze several application-level parameters that affect throughput and energy consumption in HTTP data transfers, such as the level of parallelism, concurrency, and pipelining. We introduce SLA-based algorithms that can decide the best combination of these parameters based on user-defined energy efficiency and performance criteria. Our experimental results show that up to 80% energy savings can be achieved at the client and server hosts during HTTP data transfers, while the end-to-end data throughput is increased at the same time.

Submitted 18 July, 2017; originally announced July 2017.
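The SLA-based algorithms are described only at a high level. A minimal sketch of one plausible selection rule, assuming measured (throughput, energy) pairs are already available for each candidate combination of parallelism, concurrency, and pipelining: keep the combinations whose throughput is within a user-chosen fraction of the best observed, then pick the one with the lowest energy. `Config`, `pick_config`, and the rule itself are hypothetical, not the paper's algorithms.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    parallelism: int
    concurrency: int
    pipelining: int


def pick_config(measurements: dict[Config, tuple[float, float]], min_throughput_ratio: float) -> Config:
    """Return the lowest-energy config whose throughput meets the SLA fraction.

    measurements maps each config to (throughput, energy) from sample transfers;
    min_throughput_ratio is expected to lie in [0, 1].
    """
    best_tput = max(tput for tput, _ in measurements.values())
    threshold = min_throughput_ratio * best_tput
    eligible = {cfg: energy for cfg, (tput, energy) in measurements.items() if tput >= threshold}
    # The best-throughput config always qualifies when the ratio is <= 1.
    return min(eligible, key=eligible.__getitem__)
```

A ratio of 1.0 asks for maximum throughput regardless of energy, while lower ratios let the user trade some throughput for energy savings.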

arXiv:1703.08905 [pdf, other]

WPaxos: Wide Area Network Flexible Consensus

Authors: Ailidani Ailijiang, Aleksey Charapko, Murat Demirbas, Tevfik Kosar

Abstract: WPaxos is a multileader Paxos protocol that provides low-latency and high-throughput consensus across wide-area network (WAN) deployments. WPaxos uses multileaders, and partitions the object-space among these multileaders. Unlike statically partitioned multiple Paxos deployments, WPaxos is able to adapt to the changing access locality through object stealing. Multiple concurrent leaders coinciding in different zones steal ownership of objects from each other using phase-1 of Paxos, and then use phase-2 to commit update-requests on these objects locally until they are stolen by other leaders. To achieve fast phase-2 commits, WPaxos adopts the flexible quorums idea in a novel manner, and appoints phase-2 acceptors to be close to their respective leaders. We implemented WPaxos and evaluated it on WAN deployments across 5 AWS regions. The dynamic partitioning of the object-space and emphasis on zone-local commits allow WPaxos to significantly outperform both partitioned Paxos deployments and leaderless Paxos approaches.

Submitted 3 April, 2019; v1 submitted 26 March, 2017; originally announced March 2017.
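The flexible-quorums idea that WPaxos builds on says Paxos stays safe as long as every phase-1 quorum intersects every phase-2 quorum, even when phase-2 quorums are small and local. The sketch below checks that condition by brute force on a hypothetical zone grid in which a phase-1 quorum takes `k1` nodes from every zone and a phase-2 quorum takes `k2` nodes from a single zone. This grid layout and its parameter names are illustrative assumptions, not the exact quorum system defined in the paper.

```python
from itertools import combinations, product
from typing import List, Tuple


def grid_quorums(zones: int, per_zone: int, k1: int, k2: int) -> Tuple[List[frozenset], List[frozenset]]:
    """Enumerate phase-1 quorums (k1 nodes in every zone) and
    phase-2 quorums (k2 nodes inside a single zone)."""
    members = [[(z, i) for i in range(per_zone)] for z in range(zones)]
    phase1 = [
        frozenset(node for group in choice for node in group)
        for choice in product(*(combinations(zone, k1) for zone in members))
    ]
    phase2 = [frozenset(group) for zone in members for group in combinations(zone, k2)]
    return phase1, phase2


def quorums_intersect(phase1: List[frozenset], phase2: List[frozenset]) -> bool:
    """Flexible-quorum safety condition: every phase-1 quorum meets every phase-2 quorum."""
    return all(q1 & q2 for q1 in phase1 for q2 in phase2)
```

In this grid the condition reduces to `k1 + k2 > per_zone` within each zone, which is why zone-local phase-2 quorums can commit quickly while phase-1 absorbs the cross-zone cost.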

arXiv:1303.0722 [pdf, other]

EasyTime++: A case study of incremental domain-specific language development

Authors: Iztok Fister Jr., Tomaž Kosar, Iztok Fister, Marjan Mernik

Abstract: EasyTime is a domain-specific language (DSL) for measuring time during sports competitions. A distinguishing feature of DSLs is that they are much more amenable to change, and EasyTime is no exception in this regard. This paper introduces two new EasyTime features: classifications of competitors into categories, and the inclusion of competitions where the number of laps must be dynamically determined. It shows how such extensions can be incrementally added into the base-language reusing most of the language specifications. Two case studies are presented showing the suitability of this approach.

Submitted 4 March, 2013; originally announced March 2013.

Journal ref: Information Technology and Control, 42(1), 77–85, 2013

arXiv:1208.4126 [pdf, other]

Upgrading EasyTime: from a textual to a visual language

Authors: Iztok Fister Jr., Tomaž Kosar, Marjan Mernik, Iztok Fister

Abstract: Measuring time in mass sports competitions is usually performed using expensive measuring devices. Unfortunately, these solutions are not acceptable to many organizers of sporting competitions. To make time measurement as cheap as possible, the domain-specific language (DSL) EasyTime was proposed. In practice, it has proven to be universal, flexible, and efficient. It can even reduce the number of required measuring devices. On the other hand, programming in EasyTime is not easy, because it requires a domain expert to program in a textual manner. In this paper, the domain-specific modeling language (DSML) EasyTime II is proposed, which simplifies programming of the measuring system. First, the domain analysis of the DSL EasyTime is presented. Then, the development of the DSML is described in detail. Finally, the DSML was tested by regular organizers of a sporting competition. This test showed that the DSML can be used by end users without any previous programming knowledge.

Submitted 20 August, 2012; originally announced August 2012.

Journal ref: I. Fister Jr., T. Kosar, M. Mernik, I. Fister, Upgrading EasyTime: from a textual to a visual language, In Proceedings of the 21st International Electrotechnical and Computer Science Conference, Portorož, Slovenia, 2012

Showing 1–30 of 30 results for author: Kosar, T