Showing 1–6 of 6 results for author: vandenBerg, J

  1. arXiv:cs/0208013

    cs.DB cs.CE

    Petabyte Scale Data Mining: Dream or Reality?

    Authors: Alexander S. Szalay, Jim Gray, Jan vandenBerg

    Abstract: Science is becoming very data intensive. Today's astronomy datasets with tens of millions of galaxies already present substantial challenges for data mining. In less than 10 years the catalogs are expected to grow to billions of objects, and image archives will reach Petabytes. Imagine having a 100GB database in 1996, when disk scanning speeds were 30MB/s, and database tools were immature. Such…

    Submitted 7 August, 2002; originally announced August 2002.

    Comments: originals at http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2002-84

    Report number: MSR-TR-2002-84 ACM Class: H.2.8; J.2

    Journal ref: SPIE Astronomy Telescopes and Instruments, 22-28 August 2002, Waikoloa, Hawaii

  2. Online Scientific Data Curation, Publication, and Archiving

    Authors: Jim Gray, Alexander S. Szalay, Ani R. Thakar, Christopher Stoughton, Jan vandenBerg

    Abstract: Science projects are data publishers. The scale and complexity of current and future science data change the nature of the publication process. Publication is becoming a major project component. At a minimum, a project must preserve the ephemeral data it gathers. Derived data can be reconstructed from metadata, but metadata is ephemeral. Longer term, a project should expect some archive to pres…

    Submitted 7 August, 2002; originally announced August 2002.

    Comments: original at http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2002-74

    Report number: MSR-TR-2002-74 ACM Class: H.3.7; I.7.4; J.2; J.3; J.7

  3. arXiv:cs/0208011

    cs.NI cs.DC

    TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange

    Authors: Jim Gray, Wyman Chong, Tom Barclay, Alex Szalay, Jan vandenBerg

    Abstract: Large datasets are most economically transmitted via parcel post given the current economics of wide-area networking. This article describes how the Sloan Digital Sky Survey ships terabyte scale datasets both within the US and to Europe and Asia. We 3GT storage bricks (GHz processor, GB RAM, Gbps Ethernet, TB disk) for about 2k$ each. These bricks act as database servers on the LAN. They are loade…

    Submitted 7 August, 2002; originally announced August 2002.

    Comments: original at http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2002-54

    Report number: MSR-TR-2002-54 ACM Class: C.2.0; C.2.4; C.4.4; H.3; K.6

  4. arXiv:cs/0202014

    cs.DB cs.DL

    Data Mining the SDSS SkyServer Database

    Authors: Jim Gray, Alex S. Szalay, Ani R. Thakar, Peter Z. Kunszt, Christopher Stoughton, Don Slutz, Jan vandenBerg

    Abstract: An earlier paper (Szalay et al. "Designing and Mining MultiTerabyte Astronomy Archives: The Sloan Digital Sky Survey," ACM SIGMOD 2000) described the Sloan Digital Sky Survey's (SDSS) data management needs by defining twenty database queries and twelve data visualization tasks that a good data management system should support. We built a database and interfaces to support both the query load an…

    Submitted 12 February, 2002; originally announced February 2002.

    Comments: 40 pages, Original source is at http://research.microsoft.com/~gray/Papers/MSR_TR_O2_01_20_queries.doc

    Report number: Microsoft Tech Report MSR TR 02 01 ACM Class: H.2.8; H.3.3; H.3.5; H.3.7; H.4.2

  5. arXiv:cs/0202013

    cs.DL cs.DB

    The SDSS SkyServer: Public Access to the Sloan Digital Sky Server Data

    Authors: Alexander S. Szalay, Jim Gray, Ani R. Thakar, Peter Z. Kunszt, Tanu Malik, Jordan Raddick, Christopher Stoughton, Jan vandenBerg

    Abstract: The SkyServer provides Internet access to the public Sloan Digital Sky Survey (SDSS) data for both astronomers and for science education. This paper describes the SkyServer goals and architecture. It also describes our experience operating the SkyServer on the Internet. The SDSS data is public and well-documented so it makes a good test platform for research on database algorithms and performa…

    Submitted 12 February, 2002; originally announced February 2002.

    Comments: 12 pages, original word document at http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer.doc

    Report number: MSR TR 01 104 ACM Class: H.3.7; H.3.5; H.2; H.3; H.4; H.5

    Journal ref: ACM SIGMOD 2002 proceedings

  6. arXiv:cs/0111015

    cs.DL cs.DB

    The SDSS SkyServer, Public Access to the Sloan Digital Sky Server Data

    Authors: Alexander Szalay, Jim Gray, Ani Thakar, Peter Z. Kunszt, Tanu Malik, Jordan Raddick, Christopher Stoughton, Jan vandenBerg

    Abstract: The SkyServer provides Internet access to the public Sloan Digital Sky Survey (SDSS) data for both astronomers and for science education. This paper describes the SkyServer goals and architecture. It also describes our experience operating the SkyServer on the Internet. The SDSS data is public and well-documented so it makes a good test platform for research on database algorithms and performanc…

    Submitted 7 November, 2001; originally announced November 2001.

    Comments: submitted for publication, original at http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2001-104

    Report number: Microsoft Research TR 2001 104 ACM Class: H.3.5, H.4, J.2, H.2.8