-
Processing All-Sky Images At Scale On The Amazon Cloud: A HiPS Example
Authors:
G. Bruce Berriman,
John C. Good
Abstract:
We report here on a project that has developed a practical approach to processing all-sky image collections on cloud platforms, using as an exemplar application the creation of three-color Hierarchical Progressive Survey (HiPS) maps of the 2MASS data set with the Montage Image Mosaic Engine on Amazon Web Services. We will emphasize issues that must be considered by scientists wishing to use cloud platforms to perform such parallel processing, so providing a guide for scientists wishing to exploit cloud platforms for similar large-scale processing. A HiPS map is based on the HEALPix sky-tiling scheme. Progressive zooming of a HiPS map reveals an image sampled at ever smaller or larger spatial scales that are defined by the HEALPix standard. Briefly, the approach used by Montage involves creating a base mosaic at the lowest required HEALPix level, usually chosen to match as closely as possible the spatial sampling of the input images, then cutting out the HiPS cells in PNG format from this mosaic. The process is repeated at successive HEALPix levels to create a nested collection of FITS files, from which PNG files are created that are shown in HiPS viewers. Stretching FITS files to produce PNGs is based on an image histogram. For composite regions (up to and including the whole sky), the histograms for each tile can be combined to create a composite histogram for the region. Using this single histogram for each of the individual FITS files means all the PNGs are on the same brightness scale and displaying them side by side in a HiPS viewer produces a continuous uniform map across the entire sky.
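The composite-histogram idea described in this abstract can be illustrated with a minimal sketch. It is not Montage's implementation: the function names, the equal-width binning, the percentile cut-offs, and the linear stretch are all assumptions chosen for illustration (the abstract does not state which stretch function is used). The sketch only shows why deriving the display limits from one combined histogram puts every tile on the same brightness scale.

```python
def tile_histogram(pixels, lo, hi, nbins):
    """Count pixel values into nbins equal-width bins spanning [lo, hi]."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in pixels:
        if lo <= v <= hi:
            # Values equal to hi fall into the last bin.
            counts[min(int((v - lo) / width), nbins - 1)] += 1
    return counts


def combine_histograms(histograms):
    """Sum per-tile histograms bin by bin; all must share the same bin edges."""
    return [sum(column) for column in zip(*histograms)]


def percentile_value(counts, lo, hi, pct):
    """Upper edge of the first bin where the cumulative count reaches pct percent."""
    width = (hi - lo) / len(counts)
    target = sum(counts) * pct / 100.0
    running = 0
    for i, c in enumerate(counts):
        running += c
        if running >= target:
            return lo + (i + 1) * width
    return hi


def stretch_to_byte(pixels, vmin, vmax):
    """Linear stretch to 0..255, clipped, using limits shared by all tiles."""
    span = vmax - vmin
    if span <= 0:
        raise ValueError("vmax must be greater than vmin")
    out = []
    for v in pixels:
        t = min(max((v - vmin) / span, 0.0), 1.0)
        out.append(round(t * 255))
    return out


# Hypothetical pixel values for two neighbouring tiles.
tile_a = [0.5, 1.5, 2.5]
tile_b = [7.5, 8.5, 9.5]
lo, hi, nbins = 0.0, 10.0, 10

composite = combine_histograms(
    [tile_histogram(t, lo, hi, nbins) for t in (tile_a, tile_b)]
)
# One pair of limits from the composite histogram, applied to every tile.
vmin = percentile_value(composite, lo, hi, 1)
vmax = percentile_value(composite, lo, hi, 99)
png_a = stretch_to_byte(tile_a, vmin, vmax)
png_b = stretch_to_byte(tile_b, vmin, vmax)
```

Because both tiles are stretched with the same `vmin` and `vmax`, a given pixel value maps to the same output level in either tile, which is the property that makes adjacent PNGs join without visible brightness steps.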
Submitted 6 February, 2024;
originally announced February 2024.
-
Enabling real-time multi-messenger astrophysics discoveries with deep learning
Authors:
E. A. Huerta,
Gabrielle Allen,
Igor Andreoni,
Javier M. Antelis,
Etienne Bachelet,
Bruce Berriman,
Federica Bianco,
Rahul Biswas,
Matias Carrasco,
Kyle Chard,
Minsik Cho,
Philip S. Cowperthwaite,
Zachariah B. Etienne,
Maya Fishbach,
Francisco Förster,
Daniel George,
Tom Gibbs,
Matthew Graham,
William Gropp,
Robert Gruendl,
Anushri Gupta,
Roland Haas,
Sarah Habib,
Elise Jennings,
Margaret W. G. Johnson
, et al. (35 additional authors not shown)
Abstract:
Multi-messenger astrophysics is a fast-growing, interdisciplinary field that combines data, which vary in volume and speed of data processing, from many different instruments that probe the Universe using different cosmic messengers: electromagnetic waves, cosmic rays, gravitational waves and neutrinos. In this Expert Recommendation, we review the key challenges of real-time observations of gravitational wave sources and their electromagnetic and astroparticle counterparts, and make a number of recommendations to maximize their potential for scientific discovery. These recommendations refer to the design of scalable and computationally efficient machine learning algorithms; the cyber-infrastructure to numerically simulate astrophysical sources, and to process and interpret multi-messenger astrophysics data; the management of gravitational wave detections to trigger real-time alerts for electromagnetic and astroparticle follow-ups; a vision to harness future developments of machine learning and cyber-infrastructure resources to cope with the big-data requirements; and the need to build a community of experts to realize the goals of multi-messenger astrophysics.
Submitted 26 November, 2019;
originally announced November 2019.
-
Deep Learning for Multi-Messenger Astrophysics: A Gateway for Discovery in the Big Data Era
Authors:
Gabrielle Allen,
Igor Andreoni,
Etienne Bachelet,
G. Bruce Berriman,
Federica B. Bianco,
Rahul Biswas,
Matias Carrasco Kind,
Kyle Chard,
Minsik Cho,
Philip S. Cowperthwaite,
Zachariah B. Etienne,
Daniel George,
Tom Gibbs,
Matthew Graham,
William Gropp,
Anushri Gupta,
Roland Haas,
E. A. Huerta,
Elise Jennings,
Daniel S. Katz,
Asad Khan,
Volodymyr Kindratenko,
William T. C. Kramer,
Xin Liu,
Ashish Mahabal
, et al. (23 additional authors not shown)
Abstract:
This report provides an overview of recent work that harnesses the Big Data Revolution and Large Scale Computing to address grand computational challenges in Multi-Messenger Astrophysics, with a particular emphasis on real-time discovery campaigns. Acknowledging the transdisciplinary nature of Multi-Messenger Astrophysics, this document has been prepared by members of the physics, astronomy, computer science, data science, software and cyberinfrastructure communities who attended the NSF-, DOE- and NVIDIA-funded "Deep Learning for Multi-Messenger Astrophysics: Real-time Discovery at Scale" workshop, hosted at the National Center for Supercomputing Applications, October 17-19, 2018. Highlights of this report include unanimous agreement that it is critical to accelerate the development and deployment of novel, signal-processing algorithms that use the synergy between artificial intelligence (AI) and high performance computing to maximize the potential for scientific discovery with Multi-Messenger Astrophysics. We discuss key aspects to realize this endeavor, namely (i) the design and exploitation of scalable and computationally efficient AI algorithms for Multi-Messenger Astrophysics; (ii) cyberinfrastructure requirements to numerically simulate astrophysical sources, and to process and interpret Multi-Messenger Astrophysics data; (iii) management of gravitational wave detections and triggers to enable electromagnetic and astro-particle follow-ups; (iv) a vision to harness future developments of machine and deep learning and cyberinfrastructure resources to cope with the scale of discovery in the Big Data Era; and (v) the need to build a community that brings domain experts together with data scientists on equal footing to maximize and accelerate discovery in the nascent field of Multi-Messenger Astrophysics.
Submitted 1 February, 2019;
originally announced February 2019.
-
Best Practices for a Future Open Code Policy: Experiences and Vision of the Astrophysics Source Code Library
Authors:
Lior Shamir,
Bruce Berriman,
Peter Teuben,
Robert Nemiroff,
Alice Allen
Abstract:
We are members of the Astrophysics Source Code Library's Advisory Committee and its editor-in-chief. The Astrophysics Source Code Library (ASCL, ascl.net) is a successful initiative that advocates for open research software and provides an infrastructure for registering, discovering, sharing, and citing this software. Started in 1999, the ASCL has been expanding in recent years, with an average of over 200 codes added each year, and now houses over 1,600 code entries.
Submitted 1 February, 2018;
originally announced February 2018.
-
Software metadata: How much is enough?
Authors:
Alice Allen,
Peter Teuben,
G. Bruce Berriman,
Kimberly DuPrie,
Keith Shortridge,
Rein Warmels
Abstract:
Broad efforts are underway to capture metadata about research software and retain it across services; notable in this regard is the CodeMeta project. What metadata are important to have about (research) software? What metadata are useful for searching for codes? What would you like to learn about astronomy software? This BoF sought to gather information on metadata most desired by researchers and users of astro software and others interested in registering, indexing, capturing, and doing research on this software. Information from this BoF could conceivably result in changes to the Astrophysics Source Code Library (ASCL) or other resources for the benefit of the community or provide input into other projects concerned with software metadata.
Submitted 6 December, 2017;
originally announced December 2017.
-
Implementing Ideas for Improving Software Citation and Credit
Authors:
Peter Teuben,
Alice Allen,
G. Bruce Berriman,
Kimberly DuPrie,
Jessica Mink,
Thomas Robitaille,
Keith Shortridge,
Mark Taylor,
Rein Warmels
Abstract:
Improving software citation and credit continues to be a topic of interest across and within many disciplines, with numerous efforts underway. In this Birds of a Feather (BoF) session, we started with a list of actionable ideas from last year's BoF and other similar efforts and worked alone or in small groups to begin implementing them. Work was captured in a common Google document; the session organizers will disseminate or otherwise put this information to use in or for the community in collaboration with those who contributed.
Submitted 18 November, 2016;
originally announced November 2016.
-
Improving Software Citation and Credit
Authors:
Alice Allen,
G. Bruce Berriman,
Kimberly DuPrie,
Jessica Mink,
Robert Nemiroff,
Thomas Robitaille,
Lior Shamir,
Keith Shortridge,
Mark Taylor,
Peter Teuben,
John Wallin
Abstract:
The past year has seen movement on several fronts for improving software citation, including the Center for Open Science's Transparency and Openness Promotion (TOP) Guidelines, the Software Publishing Special Interest Group that was started at January's AAS meeting in Seattle at the request of that organization's Working Group on Astronomical Software, a Sloan-sponsored meeting at GitHub in San Francisco to begin work on a cohesive research software citation-enabling platform, the work of Force11 to "transform and improve" research communication, and WSSSPE's ongoing efforts that include software publication, citation, credit, and sustainability.
Brief reports on these efforts were shared at the BoF, after which participants discussed ideas for improving software citation, generating a list of recommendations to the community of software authors, journal publishers, ADS, and research authors. The discussion, recommendations, and feedback will help form recommendations for software citation to those publishers represented in the Software Publishing Special Interest Group and the broader community.
Submitted 24 December, 2015;
originally announced December 2015.
-
Learning from FITS: Limitations in use in modern astronomical research
Authors:
Brian Thomas,
Tim Jenness,
Frossie Economou,
Perry Greenfield,
Paul Hirst,
David S. Berry,
Erik Bray,
Norman Gray,
Demitri Muna,
James Turner,
Miguel de Val-Borro,
Juande Santander-Vela,
David Shupe,
John Good,
G. Bruce Berriman,
Slava Kitaeff,
Jonathan Fay,
Omar Laurino,
Anastasia Alexov,
Walter Landry,
Joe Masters,
Adam Brazier,
Reinhold Schaaf,
Kevin Edwards,
Russell O. Redman
, et al. (13 additional authors not shown)
Abstract:
The Flexible Image Transport System (FITS) standard has been a great boon to astronomy, allowing observatories, scientists and the public to exchange astronomical information easily. The FITS standard, however, is showing its age. Developed in the late 1970s, the FITS authors made a number of implementation choices that, while common at the time, are now seen to limit its utility with modern data. The authors of the FITS standard could not anticipate the challenges which we are facing today in astronomical computing. Difficulties we now face include, but are not limited to, addressing the need to handle an expanded range of specialized data product types (data models), being more conducive to the networked exchange and storage of data, handling very large datasets, and capturing significantly more complex metadata and data relationships.
There are members of the community today who find some or all of these limitations unworkable, and have decided to move ahead with storing data in other formats. If this fragmentation continues, we risk abandoning the advantages of broad interoperability, and ready archivability, that the FITS format provides for astronomy. In this paper we detail some selected important problems which exist within the FITS standard today. These problems may provide insight into deeper underlying issues which reside in the format and we provide a discussion of some lessons learned. It is not our intention here to prescribe specific remedies to these issues; rather, it is to call the attention of the FITS and greater astronomical computing communities to these problems in the hope that it will spur action to address them.
Submitted 10 February, 2015; v1 submitted 3 February, 2015;
originally announced February 2015.
-
Astrophysics Source Code Library Enhancements
Authors:
Robert J. Hanisch,
Alice Allen,
G. Bruce Berriman,
Kimberly DuPrie,
Jessica Mink,
Robert J. Nemiroff,
Judy Schmidt,
Lior Shamir,
Keith Shortridge,
Mark Taylor,
Peter J. Teuben,
John Wallin
Abstract:
The Astrophysics Source Code Library (ASCL; ascl.net) is a free online registry of codes used in astronomy research; it currently contains over 900 codes and is indexed by ADS. The ASCL has recently moved a new infrastructure into production. The new site provides a true database for the code entries and integrates the WordPress news and information pages and the discussion forum into one site. Previous capabilities are retained and permalinks to ascl.net continue to work. This improvement offers more functionality and flexibility than the previous site, is easier to maintain, and offers new possibilities for collaboration. This presentation covers these recent changes to the ASCL.
Submitted 7 November, 2014;
originally announced November 2014.
-
Summary of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1)
Authors:
Daniel S. Katz,
Sou-Cheng T. Choi,
Hilmar Lapp,
Ketan Maheshwari,
Frank Löffler,
Matthew Turk,
Marcus D. Hanwell,
Nancy Wilkins-Diehr,
James Hetherington,
James Howison,
Shel Swenson,
Gabrielle D. Allen,
Anne C. Elster,
Bruce Berriman,
Colin Venters
Abstract:
Challenges related to development, deployment, and maintenance of reusable software for science are becoming a growing concern. Many scientists' research increasingly depends on the quality and availability of software upon which their works are built. To highlight some of these issues and share experiences, the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1) was held in November 2013 in conjunction with the SC13 Conference. The workshop featured keynote presentations and a large number (54) of solicited extended abstracts that were grouped into three themes and presented via panels. A set of collaborative notes of the presentations and discussion was taken during the workshop.
Unique perspectives were captured about issues such as comprehensive documentation, development and deployment practices, software licenses and career paths for developers. Attribution systems that account for evidence of software contribution and impact were also discussed. These include mechanisms such as Digital Object Identifiers, publication of "software papers", and the use of online systems, for example source code repositories like GitHub.
This paper summarizes the issues and shared experiences that were discussed, including cross-cutting issues and use cases. It joins a nascent literature seeking to understand what drives software work in science, and how it is impacted by the reward systems of science. These incentives can determine the extent to which developers are motivated to build software for the long-term, for the use of others, and whether to work collaboratively or separately. It also explores community building, leadership, and dynamics in relation to successful scientific software.
Submitted 12 June, 2014; v1 submitted 29 April, 2014;
originally announced April 2014.
-
Ideas for Advancing Code Sharing (A Different Kind of Hack Day)
Authors:
Peter Teuben,
Alice Allen,
Bruce Berriman,
Kimberly DuPrie,
Robert J. Hanisch,
Jessica Mink,
Robert Nemiroff,
Lior Shamir,
Keith Shortridge,
Mark Taylor,
John Wallin
Abstract:
How do we as a community encourage the reuse of software for telescope operations, data processing, and calibration? How can we support making codes used in research available for others to examine? Continuing the discussion from last year's "Bring out your codes!" BoF session, participants separated into groups to brainstorm ideas to mitigate factors which inhibit code sharing and nurture those which encourage code sharing. The BoF concluded with the sharing of ideas that arose from the brainstorming sessions and a brief summary by the moderator.
Submitted 27 December, 2013;
originally announced December 2013.
-
Creating A Galactic Plane Atlas With Amazon Web Services
Authors:
G. Bruce Berriman,
Ewa Deelman,
John Good,
Gideon Juve,
Jamie Kinney,
Ann Merrihew,
Mats Rynge
Abstract:
This paper describes by example how astronomers can use cloud-computing resources offered by Amazon Web Services (AWS) to create new datasets at scale. We have created from existing surveys an atlas of the Galactic Plane at 16 wavelengths from 1 μm to 24 μm with pixels co-registered at spatial sampling of 1 arcsec. We explain how open source tools support management and operation of a virtual cluster on AWS platforms to process data at scale, and describe the technical issues that users will need to consider, such as optimization of resources, resource costs, and management of virtual machine instances.
Submitted 23 December, 2013;
originally announced December 2013.
-
Astrophysics Source Code Library: Incite to Cite!
Authors:
Kimberly DuPrie,
Alice Allen,
Bruce Berriman,
Robert J. Hanisch,
Jessica Mink,
Robert J. Nemiroff,
Lior Shamir,
Keith Shortridge,
Mark B. Taylor,
Peter Teuben,
John F. Wallin
Abstract:
The Astrophysics Source Code Library (ASCL, http://ascl.net/) is an online registry of over 700 source codes that are of interest to astrophysicists, with more being added regularly. The ASCL actively seeks out codes as well as accepting submissions from the code authors, and all entries are citable and indexed by ADS. All codes have been used to generate results published in or submitted to a refereed journal and are available either via a download site or from an identified source. In addition to being the largest directory of scientist-written astrophysics programs available, the ASCL is also an active participant in the reproducible research movement with presentations at various conferences, numerous blog posts and a journal article. This poster provides a description of the ASCL and the changes that we are starting to see in the astrophysics community as a result of the work we are doing.
Submitted 23 December, 2013;
originally announced December 2013.
-
Practices in source code sharing in astrophysics
Authors:
Lior Shamir,
John F. Wallin,
Alice Allen,
Bruce Berriman,
Peter Teuben,
Robert J. Nemiroff,
Jessica Mink,
Robert J. Hanisch,
Kimberly DuPrie
Abstract:
While software and algorithms have become increasingly important in astronomy, the majority of authors who publish computational astronomy research do not share the source code they develop, making it difficult to replicate and reuse the work. In this paper we discuss the importance of sharing scientific source code with the entire astrophysics community, and propose that journals require authors to make their code publicly available when a paper is published. That is, we suggest that a paper that involves a computer program not be accepted for publication unless the source code becomes publicly available. The adoption of such a policy by editors, editorial boards, and reviewers will improve the ability to replicate scientific results, and will also make the computational astronomy methods more available to other researchers who wish to apply them to their data.
Submitted 24 April, 2013;
originally announced April 2013.
-
Astrophysics Source Code Library
Authors:
Alice Allen,
Kimberly DuPrie,
Bruce Berriman,
Robert J. Hanisch,
Jessica Mink,
Peter J. Teuben
Abstract:
The Astrophysics Source Code Library (ASCL), founded in 1999, is a free on-line registry for source codes of interest to astronomers and astrophysicists. The library is housed on the discussion forum for Astronomy Picture of the Day (APOD) and can be accessed at http://ascl.net. The ASCL has a comprehensive listing that covers a significant number of the astrophysics source codes used to generate results published in or submitted to refereed journals and continues to grow. The ASCL currently has entries for over 500 codes; its records are citable and are indexed by ADS. The editors of the ASCL and members of its Advisory Committee were on hand at a demonstration table in the ADASS poster room to present the ASCL, accept code submissions, show how the ASCL is starting to be used by the astrophysics community, and take questions on and suggestions for improving the resource.
Submitted 9 December, 2012;
originally announced December 2012.
-
Bring out your codes! Bring out your codes! (Increasing Software Visibility and Re-use)
Authors:
Alice Allen,
Bruce Berriman,
Robert Brunner,
Dan Burger,
Kimberly DuPrie,
Robert J. Hanisch,
Robert Mann,
Jessica Mink,
Christer Sandin,
Keith Shortridge,
Peter Teuben
Abstract:
Progress is being made in code discoverability and preservation, but as discussed at ADASS XXI, many codes still remain hidden from public view. With the Astrophysics Source Code Library (ASCL) now indexed by the SAO/NASA Astrophysics Data System (ADS), the introduction of a new journal, Astronomy & Computing, focused on astrophysics software, and the increasing success of education efforts such as Software Carpentry and SciCoder, the community has the opportunity to set a higher standard for its science by encouraging the release of software for examination and possible reuse. We assembled representatives of the community to present issues inhibiting code release and sought suggestions for tackling these factors.
The session began with brief statements by panelists; the floor was then opened for discussion and ideas. Comments covered a diverse range of related topics and points of view, with apparent support for the propositions that algorithms should be readily available, code used to produce published scientific results should be made available, and there should be discovery mechanisms to allow these to be found easily. With increased use of resources such as GitHub (for code availability), ASCL (for code discovery), and a stated strong preference from the new journal Astronomy & Computing for code release, we expect to see additional progress over the next few years.
Submitted 9 December, 2012;
originally announced December 2012.
-
A Tale Of 160 Scientists, Three Applications, A Workshop and A Cloud
Authors:
G. Bruce Berriman,
Carolyn Brinkworth,
Dawn Gelino,
Dennis K. Wittman,
Ewa Deelman,
Gideon Juve,
Mats Rynge,
Jamie Kinney
Abstract:
The NASA Exoplanet Science Institute (NExScI) hosts the annual Sagan Workshops, thematic meetings aimed at introducing researchers to the latest tools and methodologies in exoplanet research. The theme of the Summer 2012 workshop, held from July 23 to July 27 at Caltech, was to explore the use of exoplanet light curves to study planetary system architectures and atmospheres. A major part of the workshop was to use hands-on sessions to instruct attendees in the use of three open source tools for the analysis of light curves, especially from the Kepler mission. Each hands-on session involved the 160 attendees using their laptops to follow step-by-step tutorials given by experts. We describe how we used the Amazon Elastic Compute Cloud (EC2) to run these applications.
Submitted 16 November, 2012;
originally announced November 2012.
-
Collaborative Astronomical Image Mosaics
Authors:
Daniel S. Katz,
G. Bruce Berriman,
Robert G. Mann
Abstract:
This chapter describes how astronomical imaging survey data have become a vital part of modern astronomy, how these data are archived and then served to the astronomical community through on-line data access portals. The Virtual Observatory, now under development, aims to make all these data accessible through a uniform set of interfaces. This chapter also describes the scientific need for one common image processing task, that of composing individual images into large scale mosaics and introduces Montage as a tool for this task. Montage, as distributed, can be used in four ways: as a single thread/process on a single CPU, in parallel using MPI to distribute similar tasks across a parallel computer, in parallel using grid tools (Pegasus/DAGMan) to distribute tasks across a grid, or in parallel using a script-driven approach (Swift). An on-request web based Montage service is available for users who do not need to build a local version. We also introduce some work on a new scripted version of Montage, which offers ease of customization for users. Then, we discuss various ideas where Web 2.0 technologies can help the Montage community.
Submitted 23 November, 2010;
originally announced November 2010.
-
Data Sharing Options for Scientific Workflows on Amazon EC2
Authors:
Gideon Juve,
Ewa Deelman,
Karan Vahi,
Gaurang Mehta,
Bruce Berriman,
Benjamin P. Berman,
Phil Maechling
Abstract:
Efficient data management is a key component in achieving good performance for scientific workflows in distributed environments. Workflow applications typically communicate data between tasks using files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. In grids and clusters, workflow data is often stored on network and parallel file systems. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon's EC2. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.
Submitted 22 October, 2010;
originally announced October 2010.
-
The Application of Cloud Computing to Astronomy: A Study of Cost and Performance
Authors:
G. Bruce Berriman,
Ewa Deelman,
Gideon Juve,
Moira Regelson,
Peter Plavchan
Abstract:
Cloud computing is a powerful new technology that is widely used in the business world. Recently, we have been investigating the benefits it offers to scientific computing. We have used three workflow applications to compare the performance of processing data on the Amazon EC2 cloud with the performance on the Abe high-performance cluster at the National Center for Supercomputing Applications (NCSA). We show that the Amazon EC2 cloud offers better performance and value for processor- and memory-limited applications than for I/O-bound applications. We provide an example of how the cloud is well suited to the generation of a science product: an atlas of periodograms for the 210,000 light curves released by the NASA Kepler Mission. This atlas will support the identification of periodic signals, including those due to transiting exoplanets, in the Kepler data sets.
Submitted 22 October, 2010;
originally announced October 2010.
-
The Application of Cloud Computing to the Creation of Image Mosaics and Management of Their Provenance
Authors:
G. Bruce Berriman,
Ewa Deelman,
Paul Groth,
Gideon Juve
Abstract:
We have used the Montage image mosaic engine to investigate the cost and performance of processing images on the Amazon EC2 cloud, and to inform the requirements that higher-level products impose on provenance management technologies. We will present a detailed comparison of the performance of Montage on the cloud and on the Abe high performance cluster at the National Center for Supercomputing Applications (NCSA). Because Montage generates many intermediate products, we have used it to understand the science requirements that higher-level products impose on provenance management technologies. We describe experiments with provenance management technologies such as the "Provenance Aware Service Oriented Architecture" (PASOA).
Submitted 24 June, 2010;
originally announced June 2010.
-
Pipeline-Centric Provenance Model
Authors:
Paul Groth,
Ewa Deelman,
Gideon Juve,
Gaurang Mehta,
Bruce Berriman
Abstract:
In this paper we propose a new provenance model which is tailored to a class of workflow-based applications. We motivate the approach with use cases from the astronomy community. We generalize the class of applications the approach is relevant to and propose a pipeline-centric provenance model. Finally, we evaluate the benefits in terms of storage needed by the approach when applied to an astronomy application.
Submitted 24 May, 2010;
originally announced May 2010.
-
Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking
Authors:
Joseph C. Jacob,
Daniel S. Katz,
G. Bruce Berriman,
John Good,
Anastasia C. Laity,
Ewa Deelman,
Carl Kesselman,
Gurmeet Singh,
Mei-Hui Su,
Thomas A. Prince,
Roy Williams
Abstract:
Montage is a portable software toolkit for constructing custom, science-grade mosaics by composing multiple astronomical images. The mosaics constructed by Montage preserve the astrometry (position) and photometry (intensity) of the sources in the input images. The mosaic to be constructed is specified by the user in terms of a set of parameters, including dataset and wavelength to be used, location and size on the sky, coordinate system and projection, and spatial sampling rate. Many astronomical datasets are massive, and are stored in distributed archives that are, in most cases, remote with respect to the available computational resources. Montage can be run on both single- and multi-processor computers, including clusters and grids. Standard grid tools are used to run Montage in the case where the data or computers used to construct a mosaic are located remotely on the Internet. This paper describes the architecture, algorithms, and usage of Montage as both a software toolkit and as a grid portal. Timing results are provided to show how Montage performance scales with number of processors on a cluster computer. In addition, we compare the performance of two methods of running Montage in parallel on a grid.
Submitted 24 May, 2010;
originally announced May 2010.
-
The Role of Provenance Management in Accelerating the Rate of Astronomical Research
Authors:
G. Bruce Berriman,
Ewa Deelman
Abstract:
The availability of vast quantities of data through electronic archives has transformed astronomical research. It has also enabled the creation of new products, models and simulations, often from distributed input data and models, that are themselves made electronically available. These products will only provide maximal long-term value to astronomers when accompanied by records of their provenance; that is, records of the data and processes used in the creation of such products. We use the creation of image mosaics with the Montage grid-enabled mosaic engine to emphasize the necessity of provenance management and to understand the science requirements that higher-level products impose on provenance management technologies. We describe experiments with one technology, the "Provenance Aware Service Oriented Architecture" (PASOA), that stores provenance information at each step in the computation of a mosaic. The results inform the technical specifications of provenance management systems, including the need for extensible systems built on common standards. Finally, we describe examples of provenance management technology emerging from the fields of geophysics and oceanography that have applicability to astronomy applications.
Submitted 19 May, 2010;
originally announced May 2010.
-
Scientific Workflow Applications on Amazon EC2
Authors:
Gideon Juve,
Ewa Deelman,
Karan Vahi,
Gaurang Mehta,
Bruce Berriman,
Benjamin P. Berman,
Phil Maechling
Abstract:
The proliferation of commercial cloud computing providers has generated significant interest in the scientific computing community. Much recent research has attempted to determine the benefits and drawbacks of cloud computing for scientific applications. Although clouds have many attractive features, such as virtualization, on-demand provisioning, and "pay as you go" usage-based pricing, it is not clear whether they are able to deliver the performance required for scientific applications at a reasonable price. In this paper we examine the performance and cost of clouds from the perspective of scientific workflow applications. We use three characteristic workflows to compare the performance of a commercial cloud with that of a typical HPC system, and we analyze the various costs associated with running those workflows in the cloud. We find that the performance of clouds is not unreasonable given the hardware resources provided, and that performance comparable to HPC systems can be achieved given similar resources. We also find that the cost of running workflows on a commercial cloud can be reduced by storing data in the cloud rather than transferring it from outside.
Submitted 15 May, 2010;
originally announced May 2010.
-
Metadata and provenance management
Authors:
Ewa Deelman,
Bruce Berriman,
Ann Chervenak,
Oscar Corcho,
Paul Groth,
Luc Moreau
Abstract:
Scientists today collect, analyze, and generate terabytes and petabytes of data. These data are often shared and further processed and analyzed among collaborators. To facilitate sharing and interpretation, data need to carry with them metadata about how they were collected or generated, and provenance information about how they were processed. This chapter describes metadata and provenance in the context of the data lifecycle. It also gives an overview of the approaches to metadata and provenance management, followed by examples of how applications use metadata and provenance in their scientific processes.
Submitted 14 May, 2010;
originally announced May 2010.