Search | arXiv e-print repository

Scheduling in Data Intensive and Network Aware (DIANA) Grid Environments

Authors: Richard McClatchey, Ashiq Anjum, Heinz Stockinger, Arshad Ali, Ian Willers, Michael Thomas

Abstract: In Grids scheduling decisions are often made on the basis of jobs being either data or computation intensive: in data intensive situations jobs may be pushed to the data and in computation intensive situations data may be pulled to the jobs. This kind of scheduling, in which there is no consideration of network characteristics, can lead to performance degradation in a Grid environment and may result in large processing queues and job execution delays due to site overloads. In this paper we describe a Data Intensive and Network Aware (DIANA) meta-scheduling approach, which takes into account data, processing power and network characteristics when making scheduling decisions across multiple sites. Through a practical implementation on a Grid testbed, we demonstrate that queue and execution times of data-intensive jobs can be significantly improved when we introduce our proposed DIANA scheduler. The basic scheduling decisions are dictated by a weighting factor for each potential target location which is a calculated function of network characteristics, processing cycles and data location and size. The job scheduler provides a global ranking of the computing resources and then selects an optimal one on the basis of this overall access and execution cost. The DIANA approach considers the Grid as a combination of active network elements and takes network characteristics as a first class criterion in the scheduling decision matrix along with computation and data. The scheduler can then make informed decisions by taking into account the changing state of the network, locality and size of the data and the pool of available processing cycles.

Submitted 5 July, 2007; originally announced July 2007.

Comments: 22 pages, 14 figures. Early draft of paper to be submitted to Journal of Grid Computing

ACM Class: H.2.4; J.3
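
Illustrative note: the per-site weighting and global ranking described in the abstract above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the DIANA cost model itself: the `Site` fields, the `site_cost` formula, the weights `w_net` and `w_comp`, and the units are all hypothetical choices made for illustration. It only shows the general idea of combining a network/data term with a computation term and choosing the cheapest site.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical description of a candidate execution site."""
    name: str
    queued_jobs: int       # jobs already waiting at the site
    cpu_power: float       # relative processing capacity (arbitrary unit)
    bandwidth_mbps: float  # bandwidth from the data source to this site
    loss_rate: float       # packet loss fraction, 0 <= loss_rate < 1
    has_data: bool         # True if the input data is already at the site


def site_cost(site, data_mb, w_net=1.0, w_comp=1.0):
    """Assumed cost: weighted sum of a data-transfer term and a compute term."""
    if site.has_data:
        transfer_s = 0.0  # no data movement needed when the data is local
    else:
        # Effective throughput shrinks as packet loss grows (assumed model).
        effective_mbps = site.bandwidth_mbps * (1.0 - site.loss_rate)
        transfer_s = data_mb * 8 / effective_mbps
    # Longer queues and slower processors both raise the compute term.
    compute = (site.queued_jobs + 1) / site.cpu_power
    return w_net * transfer_s + w_comp * compute


def rank_sites(sites, data_mb):
    """Return sites ordered from lowest to highest assumed cost."""
    return sorted(sites, key=lambda s: site_cost(s, data_mb))


if __name__ == "__main__":
    local = Site("local", 50, 1.0, 100.0, 0.0, True)
    remote = Site("remote", 0, 1.0, 100.0, 0.0, False)
    for size in (1000, 100):
        best = rank_sites([local, remote], size)[0]
        print(f"{size} MB input -> {best.name}")
```

With these made-up numbers, a large input favours the busy site that already holds the data, while a small input favours the idle remote site, which is the kind of trade-off a network-aware scheduler is meant to capture.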

arXiv:0707.0743 [pdf]

doi 10.1109/E-SCIENCE.2006.261173

DIANA Scheduling Hierarchies for Optimizing Bulk Job Scheduling

Authors: A. Anjum, R. McClatchey, H. Stockinger, A. Ali, I. Willers, M. Thomas, M. Sagheer, K. Hasham, O. Alvi

Abstract: The use of meta-schedulers for resource management in large-scale distributed systems often leads to a hierarchy of schedulers. In this paper, we discuss why existing meta-scheduling hierarchies are sometimes not sufficient for Grid systems due to their inability to re-organise jobs already scheduled locally. Such a job re-organisation is required to adapt to evolving loads which are common in heavily used Grid infrastructures. We propose a peer-to-peer scheduling model and evaluate it using case studies and mathematical modelling. We detail the DIANA (Data Intensive and Network Aware) scheduling algorithm and its queue management system for coping with the load distribution and for supporting bulk job scheduling. We demonstrate that such a system is beneficial for dynamic, distributed and self-organizing resource management and can assist in optimizing load or job distribution in complex Grid infrastructures.

Submitted 5 July, 2007; originally announced July 2007.

Comments: 8 pages, 9 figures. Presented at the 2nd IEEE Int Conference on eScience & Grid Computing. Amsterdam Netherlands, December 2006

ACM Class: H.2.4; J.3

arXiv:0707.0742 [pdf]

Mobile Computing in Physics Analysis - An Indicator for eScience

Authors: A. Ali, A. Anjum, T. Azim, J. Bunn, A. Ikram, R. McClatchey, H. Newman, C. Steenberg, M. Thomas, I. Willers

Abstract: This paper presents the design and implementation of a Grid-enabled physics analysis environment for handheld and other resource-limited computing devices as one example of the use of mobile devices in eScience. Handheld devices offer great potential because they provide ubiquitous access to data and round-the-clock connectivity over wireless links. Our solution aims to provide users of handheld devices the capability to launch heavy computational tasks on computational and data Grids, monitor job status during execution, and retrieve results after job completion. Users carry their jobs on their handheld devices in the form of executables (and associated libraries). Users can transparently view the status of their jobs and get back their outputs without having to know where they are being executed. In this way, our system is able to act as a high-throughput computing environment where devices ranging from powerful desktop machines to small handhelds can employ the power of the Grid. The results shown in this paper are readily applicable to the wider eScience community.

Submitted 5 July, 2007; originally announced July 2007.

Comments: 8 pages, 7 figures. Presented at the 3rd Int Conf on Mobile Computing & Ubiquitous Networking (ICMU06). London October 2006

ACM Class: H.2.4; J.3

arXiv:0707.0740 [pdf]

A Multi Interface Grid Discovery System

Authors: A. Ali, A. Anjum, J. Bunn, F. Khan, R. McClatchey, H. Newman, C. Steenberg, M. Thomas, Ian Willers

Abstract: Discovery Systems (DS) can be considered as entry points for global loosely coupled distributed systems. An efficient Discovery System in essence increases the performance, reliability and decision making capability of distributed systems. With the rapid increase in scale of distributed applications, existing solutions for discovery systems are fast becoming either obsolete or incapable of handling such complexity. They are particularly ineffective when handling service lifetimes and providing up-to-date information, poor at enabling dynamic service access and they can also impose unwanted restrictions on interfaces to widely available information repositories. In this paper we present the essential design characteristics, an implementation and a performance analysis for a discovery system capable of overcoming these deficiencies in large, globally distributed environments.

Submitted 5 July, 2007; originally announced July 2007.

Comments: 2 pages, 4 figures. Presented at the Grid 2006 conference, Barcelona Spain

ACM Class: H.2.4; J.3

arXiv:cs/0608048 [pdf]

doi 10.1109/TNS.2006.886047

Bulk Scheduling with the DIANA Scheduler

Authors: Ashiq Anjum, Richard McClatchey, Arshad Ali, Ian Willers

Abstract: Results from the research and development of a Data Intensive and Network Aware (DIANA) scheduling engine, to be used primarily for data intensive sciences such as physics analysis, are described. In Grid analyses, tasks can involve thousands of computing, data handling, and network resources. The central problem in the scheduling of these resources is the coordinated management of computation and data at multiple locations and not just data replication or movement. However, this can prove to be a rather costly operation and efficient scheduling can be a challenge if compute and data resources are mapped without considering network costs. We have implemented an adaptive algorithm within the so-called DIANA Scheduler which takes into account data location and size, network performance and computation capability in order to enable efficient global scheduling. DIANA is a performance-aware and economy-guided Meta Scheduler. It iteratively allocates each job to the site that is most likely to produce the best performance as well as optimizing the global queue for any remaining jobs. Therefore it is equally suitable whether a single job is being submitted or bulk scheduling is being performed. Results indicate that considerable performance improvements can be gained by adopting the DIANA scheduling approach.

Submitted 8 August, 2006; originally announced August 2006.

Comments: 12 pages, 11 figures. To be published in the IEEE Transactions in Nuclear Science, IEEE Press. 2006

ACM Class: H.2.4; J.3

arXiv:cs/0602026 [pdf]

Bulk Scheduling with DIANA Scheduler

Authors: Ashiq Anjum, Richard McClatchey, Arshad Ali, Ian Willers

Abstract: Results from and progress on the development of a Data Intensive and Network Aware (DIANA) Scheduling engine, primarily for data intensive sciences such as physics analysis, are described. Scientific analysis tasks can involve thousands of computing, data handling, and network resources and the size of the input and output files and the amount of overall storage space allocated to a user necessarily can have significant bearing on the scheduling of data intensive applications. If the input or output files must be retrieved from a remote location, then the time required to transfer the files must also be taken into consideration when scheduling compute resources for the given application. The central problem in this study is the coordinated management of computation and data at multiple locations and not simply data movement. However, this can be a very costly operation and efficient scheduling can be a challenge if compute and data resources are mapped without considering network cost. We have implemented an adaptive algorithm within the DIANA Scheduler which takes into account data location and size, network performance and computation capability to make efficient global scheduling decisions. DIANA is a performance-aware as well as an economy-guided Meta Scheduler. It iteratively allocates each job to the site that is likely to produce the best performance as well as optimizing the global queue for any remaining pending jobs. Therefore it is equally suitable whether a single job is being submitted or bulk scheduling is being performed. Results suggest that considerable performance improvements are to be gained by adopting the DIANA scheduling approach.

Submitted 7 February, 2006; originally announced February 2006.

Comments: 4 pages, 5 figures. Accepted by the Computing for High Energy Physics Conference. Mumbai, India. February 2006

ACM Class: H.2.4; J.3

arXiv:cs/0504034 [pdf]

Heterogeneous Relational Databases for a Grid-enabled Analysis Environment

Authors: Arshad Ali, Ashiq Anjum, Tahir Azim, Julian Bunn, Saima Iqbal, Richard McClatchey, Harvey Newman, S. Yousaf Shah, Tony Solomonides, Conrad Steenberg, Michael Thomas, Frank van Lingen, Ian Willers

Abstract: Grid based systems require a database access mechanism that can provide seamless homogeneous access to the requested data through a virtual data access system, i.e. a system which can take care of tracking the data that is stored in geographically distributed heterogeneous databases. This system should provide an integrated view of the data that is stored in the different repositories by using a virtual data access mechanism, i.e. a mechanism which can hide the heterogeneity of the backend databases from the client applications. This paper focuses on accessing data stored in disparate relational databases through a web service interface, and exploits the features of a Data Warehouse and Data Marts. We present a middleware that enables applications to access data stored in geographically distributed relational databases without being aware of their physical locations and underlying schema. A web service interface is provided to enable applications to access this middleware in a language and platform independent way. A prototype implementation was created based on Clarens [4], Unity [7] and POOL [8]. This ability to access the data stored in the distributed relational databases transparently is likely to be a very powerful one for Grid users, especially the scientific community wishing to collate and analyze data distributed over the Grid.

Submitted 10 April, 2005; originally announced April 2005.

Comments: 8 pages, 6 figures, 1 table. Workshop on Web and Grid Services for Scientific Data Analysis at the Int Conf on Parallel Processing (ICPP05). Norway June 2005

ACM Class: H.2.4; J.3

arXiv:cs/0504033 [pdf]

Resource Management Services for a Grid Analysis Environment

Authors: Arshad Ali, Ashiq Anjum, Tahir Azim, Julian Bunn, Atif Mehmood, Richard McClatchey, Harvey Newman, Waqas ur Rehman, Conrad Steenberg, Michael Thomas, Frank van Lingen, Ian Willers, Muhammad Adeel Zafar

Abstract: Selecting optimal resources for submitting jobs on a computational Grid or accessing data from a data grid is one of the most important tasks of any Grid middleware. Most modern Grid software today satisfies this responsibility and gives a best-effort performance to solve this problem. Almost all decisions regarding scheduling and data access are made by the software automatically, giving users little or no control over the entire process. To solve this problem, a more interactive set of services and middleware is desired that provides users more information about Grid weather, and gives them more control over the decision making process. This paper presents a set of services that have been developed to provide more interactive resource management capabilities within the Grid Analysis Environment (GAE) being developed collaboratively by Caltech, NUST and several other institutes. These include a steering service, a job monitoring service and an estimator service that have been designed and written using a common Grid-enabled Web Services framework named Clarens. The paper also presents a performance analysis of the developed services to show that they have indeed resulted in a more interactive and powerful system for user-centric Grid-enabled physics analysis.

Submitted 10 April, 2005; originally announced April 2005.

Comments: 8 pages, 7 figures. Workshop on Web and Grid Services for Scientific Data Analysis at the Int Conf on Parallel Processing (ICPP05). Norway June 2005

ACM Class: H.2.4; J.3

arXiv:cs/0407014

A Grid-enabled Interface to Condor for Interactive Analysis on Handheld and Resource-limited Devices

Authors: Arshad Ali, Ashiq Anjum, Tahir Azim, Julian Bunn, Ahsan Ikram, Richard McClatchey, Harvey Newman, Conrad Steenberg, Michael Thomas, Ian Willers

Abstract: This paper was withdrawn by the authors.

Submitted 30 September, 2004; v1 submitted 5 July, 2004; originally announced July 2004.

Comments: This paper has been withdrawn

ACM Class: H.2.4; J.3

arXiv:cs/0407013 [pdf]

Distributed Analysis and Load Balancing System for Grid Enabled Analysis on Hand-held devices using Multi-Agents Systems

Authors: Naveed Ahmad, Arshad Ali, Ashiq Anjum, Tahir Azim, Julian Bunn, Ali Hassan, Ahsan Ikram, Frank van Lingen, Richard McClatchey, Harvey Newman, Conrad Steenberg, Michael Thomas, Ian Willers

Abstract: Handheld devices, while growing rapidly, are inherently constrained and lack the capability of executing resource hungry applications. This paper presents the design and implementation of a distributed analysis and load-balancing system for hand-held devices using a multi-agent system. This system enables low resource mobile handheld devices to act as potential clients for Grid enabled applications and analysis environments. We propose a system in which mobile agents will transport, schedule, execute and return results for heavy computational jobs submitted by handheld devices. In this way, our system provides a high-throughput computing environment for hand-held devices.

Submitted 5 July, 2004; originally announced July 2004.

Comments: 4 pages, 3 figures. Proceedings of the 3rd International Conference on Grid and Cooperative Computing (GCC 2004)

ACM Class: H.2.4; J.3

arXiv:cs/0407012

A Taxonomy and Survey of Grid Resource Planning and Reservation Systems for Grid Enabled Analysis Environment

Authors: Arshad Ali, Ashiq Anjum, Atif Mehmood, Richard McClatchey, Ian Willers, Julian Bunn, Harvey Newman, Michael Thomas, Conrad Steenberg

Abstract: The concept of coupling geographically distributed resources for solving large scale problems is becoming increasingly popular, forming what is popularly called grid computing. Management of resources in the Grid environment becomes complex as the resources are geographically distributed, heterogeneous in nature and owned by different individuals and organizations each having their own resource management policies and different access and cost models. There have been many projects that have designed and implemented resource management systems with a variety of architectures and services. In this paper we present the general requirements that a Resource Management system should satisfy. A taxonomy is also defined, on the basis of which a survey of resource management systems in different existing Grid projects has been conducted to identify the key areas where these systems lack the desired functionality.

Submitted 14 January, 2018; v1 submitted 5 July, 2004; originally announced July 2004.

Comments: This article was submitted in error in 2004. The author list is incorrect and the body of the paper should be attributed to another paper. We request withdrawal of the paper forthwith to avoid inconsistency in our records

ACM Class: H.2.4; J.3

arXiv:cs/0402007 [pdf, ps, other]

An Integrated Approach for Extraction of Objects from XML and Transformation to Heterogeneous Object Oriented Databases

Authors: Uzair Ahmad, Mohammad Waseem Hassan, Arshad Ali, Richard McClatchey, Ian Willers

Abstract: CERN's (European Organization for Nuclear Research) WISDOM project uses XML for the replication of data between different data repositories in a heterogeneous operating system environment. For exchanging data from Web-resident databases, the data needs to be transformed into XML and back to the database format. Many different approaches are employed to do this transformation. This paper addresses issues that make this job more efficient and robust than existing approaches. It incorporates the World Wide Web Consortium (W3C) XML Schema specification in the database-XML relationship. Incorporation of the XML Schema exhibits significant improvements in XML content usage and reduces the limitations of DTD-based database XML services. Secondly, the paper explores the possibility of database independent transformation of data between XML and different databases. It proposes a standard XML format that every serialized object should follow. This makes it possible to use objects from heterogeneous databases seamlessly using XML.

Submitted 2 February, 2004; originally announced February 2004.

Comments: 4 pages, 5 figures. Presented at the 5th Int Conf on Enterprise Information Systems, ICEIS'03. Angers France April 2003

ACM Class: H.2.4

Showing 1–12 of 12 results for author: Willers, I