-
Creating A Galactic Plane Atlas With Amazon Web Services
Authors:
G. Bruce Berriman,
Ewa Deelman,
John Good,
Gideon Juve,
Jamie Kinney,
Ann Merrihew,
Mats Rynge
Abstract:
This paper describes by example how astronomers can use cloud-computing resources offered by Amazon Web Services (AWS) to create new datasets at scale. We have created from existing surveys an atlas of the Galactic Plane at 16 wavelengths from 1 μm to 24 μm with pixels co-registered at spatial sampling of 1 arcsec. We explain how open source tools support management and operation of a virtual cluster on AWS platforms to process data at scale, and describe the technical issues that users will need to consider, such as optimization of resources, resource costs, and management of virtual machine instances.
Submitted 23 December, 2013;
originally announced December 2013.
-
A Tale Of 160 Scientists, Three Applications, A Workshop and A Cloud
Authors:
G. Bruce Berriman,
Carolyn Brinkworth,
Dawn Gelino,
Dennis K. Wittman,
Ewa Deelman,
Gideon Juve,
Mats Rynge,
Jamie Kinney
Abstract:
The NASA Exoplanet Science Institute (NExScI) hosts the annual Sagan Workshops, thematic meetings aimed at introducing researchers to the latest tools and methodologies in exoplanet research. The theme of the Summer 2012 workshop, held from July 23 to July 27 at Caltech, was to explore the use of exoplanet light curves to study planetary system architectures and atmospheres. A major part of the workshop was to use hands-on sessions to instruct attendees in the use of three open source tools for the analysis of light curves, especially from the Kepler mission. Each hands-on session involved the 160 attendees using their laptops to follow step-by-step tutorials given by experts. We describe how we used the Amazon Elastic Compute Cloud (EC2) to run these applications.
Submitted 16 November, 2012;
originally announced November 2012.
-
Data Sharing Options for Scientific Workflows on Amazon EC2
Authors:
Gideon Juve,
Ewa Deelman,
Karan Vahi,
Gaurang Mehta,
Bruce Berriman,
Benjamin P. Berman,
Phil Maechling
Abstract:
Efficient data management is a key component in achieving good performance for scientific workflows in distributed environments. Workflow applications typically communicate data between tasks using files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. In grids and clusters, workflow data is often stored on network and parallel file systems. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon's EC2. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.
Submitted 22 October, 2010;
originally announced October 2010.
-
The Application of Cloud Computing to Astronomy: A Study of Cost and Performance
Authors:
G. Bruce Berriman,
Ewa Deelman,
Gideon Juve,
Moira Regelson,
Peter Plavchan
Abstract:
Cloud computing is a powerful new technology that is widely used in the business world. Recently, we have been investigating the benefits it offers to scientific computing. We have used three workflow applications to compare the performance of processing data on the Amazon EC2 cloud with the performance on the Abe high-performance cluster at the National Center for Supercomputing Applications (NCSA). We show that the Amazon EC2 cloud offers better performance and value for processor- and memory-limited applications than for I/O-bound applications. We provide an example of how the cloud is well suited to the generation of a science product: an atlas of periodograms for the 210,000 light curves released by the NASA Kepler Mission. This atlas will support the identification of periodic signals, including those due to transiting exoplanets, in the Kepler data sets.
Submitted 22 October, 2010;
originally announced October 2010.
-
The Application of Cloud Computing to the Creation of Image Mosaics and Management of Their Provenance
Authors:
G. Bruce Berriman,
Ewa Deelman,
Paul Groth,
Gideon Juve
Abstract:
We have used the Montage image mosaic engine to investigate the cost and performance of processing images on the Amazon EC2 cloud, and to inform the requirements that higher-level products impose on provenance management technologies. We will present a detailed comparison of the performance of Montage on the cloud and on the Abe high performance cluster at the National Center for Supercomputing Applications (NCSA). Because Montage generates many intermediate products, we have used it to understand the science requirements that higher-level products impose on provenance management technologies. We describe experiments with provenance management technologies such as the "Provenance Aware Service Oriented Architecture" (PASOA).
Submitted 24 June, 2010;
originally announced June 2010.
-
Pipeline-Centric Provenance Model
Authors:
Paul Groth,
Ewa Deelman,
Gideon Juve,
Gaurang Mehta,
Bruce Berriman
Abstract:
In this paper we propose a new provenance model which is tailored to a class of workflow-based applications. We motivate the approach with use cases from the astronomy community. We generalize the class of applications the approach is relevant to and propose a pipeline-centric provenance model. Finally, we evaluate the benefits in terms of storage needed by the approach when applied to an astronomy application.
Submitted 24 May, 2010;
originally announced May 2010.
-
Scientific Workflow Applications on Amazon EC2
Authors:
Gideon Juve,
Ewa Deelman,
Karan Vahi,
Gaurang Mehta,
Bruce Berriman,
Benjamin P. Berman,
Phil Maechling
Abstract:
The proliferation of commercial cloud computing providers has generated significant interest in the scientific computing community. Much recent research has attempted to determine the benefits and drawbacks of cloud computing for scientific applications. Although clouds have many attractive features, such as virtualization, on-demand provisioning, and "pay as you go" usage-based pricing, it is not clear whether they are able to deliver the performance required for scientific applications at a reasonable price. In this paper we examine the performance and cost of clouds from the perspective of scientific workflow applications. We use three characteristic workflows to compare the performance of a commercial cloud with that of a typical HPC system, and we analyze the various costs associated with running those workflows in the cloud. We find that the performance of clouds is not unreasonable given the hardware resources provided, and that performance comparable to HPC systems can be achieved given similar resources. We also find that the cost of running workflows on a commercial cloud can be reduced by storing data in the cloud rather than transferring it from outside.
Submitted 15 May, 2010;
originally announced May 2010.