-
The Dark Energy Survey Data Management System
Authors:
Joseph J. Mohr,
Wayne Barkhouse,
Cristina Beldica,
Emmanuel Bertin,
Y. Dora Cai,
Luiz da Costa,
J. Anthony Darnell,
Gregory E. Daues,
Michael Jarvis,
Michelle Gower,
Huan Lin,
Leandro Martelli,
Eric Neilsen,
Chow-Choong Ngeow,
Ricardo Ogando,
Alex Parga,
Erin Sheldon,
Douglas Tucker,
Nikolay Kuropatkin,
Chris Stoughton
Abstract:
The Dark Energy Survey collaboration will study cosmic acceleration with a 5000 deg2 griZY survey in the southern sky over 525 nights from 2011-2016. The DES data management (DESDM) system will be used to process and archive these data and the resulting science ready data products. The DESDM system consists of an integrated archive, a processing framework, an ensemble of astronomy codes and a data access framework. We are developing the DESDM system for operation in the high performance computing (HPC) environments at NCSA and Fermilab. Operating the DESDM system in an HPC environment offers both speed and flexibility. We will employ it for our regular nightly processing needs, and for more compute-intensive tasks such as large scale image coaddition campaigns, extraction of weak lensing shear from the full survey dataset, and massive seasonal reprocessing of the DES data. Data products will be available to the Collaboration and later to the public through a virtual-observatory compatible web portal. Our approach leverages investments in publicly available HPC systems, greatly reducing hardware and maintenance costs to the project, which must deploy and maintain only the storage, database platforms and orchestration and web portal nodes that are specific to DESDM. In Fall 2007, we tested the current DESDM system on both simulated and real survey data. We used Teragrid to process 10 simulated DES nights (3TB of raw data), ingesting and calibrating approximately 250 million objects into the DES Archive database. We also used DESDM to process and calibrate over 50 nights of survey data acquired with the Mosaic2 camera. Comparison to truth tables in the case of the simulated data and internal crosschecks in the case of the real data indicate that astrometric and photometric data quality is excellent.
Submitted 16 July, 2008;
originally announced July 2008.
-
Online Scientific Data Curation, Publication, and Archiving
Authors:
Jim Gray,
Alexander S. Szalay,
Ani R. Thakar,
Christopher Stoughton,
Jan vandenBerg
Abstract:
Science projects are data publishers. The scale and complexity of current and future science data changes the nature of the publication process. Publication is becoming a major project component. At a minimum, a project must preserve the ephemeral data it gathers. Derived data can be reconstructed from metadata, but metadata is ephemeral. Longer term, a project should expect some archive to preserve the data. We observe that published scientific data needs to be available forever; this gives rise to the data pyramid of versions and to data inflation where the derived data volumes explode. As an example, this article describes the Sloan Digital Sky Survey (SDSS) strategies for data publication, data access, curation, and preservation.
Submitted 7 August, 2002;
originally announced August 2002.
-
Data Mining the SDSS SkyServer Database
Authors:
Jim Gray,
Alex S. Szalay,
Ani R. Thakar,
Peter Z. Kunszt,
Christopher Stoughton,
Don Slutz,
Jan vandenBerg
Abstract:
An earlier paper (Szalay et al. "Designing and Mining MultiTerabyte Astronomy Archives: The Sloan Digital Sky Survey," ACM SIGMOD 2000) described the Sloan Digital Sky Survey's (SDSS) data management needs by defining twenty database queries and twelve data visualization tasks that a good data management system should support. We built a database and interfaces to support both the query load and also a website for ad-hoc access. This paper reports on the database design, describes the data loading pipeline, and reports on the query implementation and performance. The queries typically translated to a single SQL statement. Most queries run in less than 20 seconds, allowing scientists to interactively explore the database. This paper is an in-depth tour of those queries. Readers should first have studied the companion overview paper Szalay et al. "The SDSS SkyServer, Public Access to the Sloan Digital Sky Server Data" ACM SIGMOD 2002.
Submitted 12 February, 2002;
originally announced February 2002.
-
The SDSS SkyServer: Public Access to the Sloan Digital Sky Server Data
Authors:
Alexander S. Szalay,
Jim Gray,
Ani R. Thakar,
Peter Z. Kunszt,
Tanu Malik,
Jordan Raddick,
Christopher Stoughton,
Jan vandenBerg
Abstract:
The SkyServer provides Internet access to the public Sloan Digital Sky Survey (SDSS) data for both astronomers and for science education. This paper describes the SkyServer goals and architecture. It also describes our experience operating the SkyServer on the Internet. The SDSS data is public and well-documented so it makes a good test platform for research on database algorithms and performance.
Submitted 12 February, 2002;
originally announced February 2002.
-
The SDSS SkyServer, Public Access to the Sloan Digital Sky Server Data
Authors:
Alexander Szalay,
Jim Gray,
Ani Thakar,
Peter Z. Kunszt,
Tanu Malik,
Jordan Raddick,
Christopher Stoughton,
Jan vandenBerg
Abstract:
The SkyServer provides Internet access to the public Sloan Digital Sky Survey (SDSS) data for both astronomers and for science education. This paper describes the SkyServer goals and architecture. It also describes our experience operating the SkyServer on the Internet. The SDSS data is public and well-documented so it makes a good test platform for research on database algorithms and performance.
Submitted 7 November, 2001;
originally announced November 2001.