-
Web Infrastructure to Support e-Journal Preservation (and More)
Authors:
Herbert Van de Sompel,
David S. H. Rosenthal,
Michael L. Nelson
Abstract:
E-journal preservation systems have to ingest millions of articles each year. Ingest, especially of the "long tail" of journals from small publishers, is the largest element of their cost. Cost is the major reason that archives contain less than half the content they should. Automation is essential to minimize these costs. This paper examines the potential for automation beyond the status quo based on the API provided by CrossRef, ANSI/NISO Z39.99 ResourceSync, and the provision of typed links in publishers' HTTP response headers. These changes would not merely assist e-journal preservation and other cross-venue scholarly applications, but would help remedy the gap that research has revealed between DOIs' potential and actual benefits.
Submitted 19 May, 2016;
originally announced May 2016.
-
Requirements for Digital Preservation Systems: A Bottom-Up Approach
Authors:
David S. H. Rosenthal,
Thomas S. Robertson,
Tom Lipkis,
Vicky Reich,
Seth Morabito
Abstract:
The field of digital preservation is being defined by a set of standards developed top-down, starting with an abstract reference model (OAIS) and gradually adding more specific detail. Systems claiming conformance to these standards are entering production use. Work is underway to certify that systems conform to requirements derived from OAIS.
We complement these requirements derived top-down by presenting an alternate, bottom-up view of the field. The fundamental goal of these systems is to ensure that the information they contain remains accessible for the long term. We develop a parallel set of requirements based on observations of how existing systems handle this task, and on an analysis of the threats to achieving the goal. On this basis we suggest disclosures that systems should provide as to how they satisfy their goals.
Submitted 6 September, 2005; v1 submitted 6 September, 2005;
originally announced September 2005.
-
A Fresh Look at the Reliability of Long-term Digital Storage
Authors:
Mary Baker,
Mehul Shah,
David S. H. Rosenthal,
Mema Roussopoulos,
Petros Maniatis,
TJ Giuli,
Prashanth Bungale
Abstract:
Many emerging Web services, such as email, photo sharing, and web site archives, need to preserve large amounts of quickly-accessible data indefinitely into the future. In this paper, we make the case that these applications' demands on large scale storage systems over long time horizons require us to re-evaluate traditional storage system designs. We examine threats to long-lived data from an end-to-end perspective, taking into account not just hardware and software faults but also faults due to humans and organizations. We present a simple model of long-term storage failures that helps us reason about the various strategies for addressing these threats in a cost-effective manner. Using this model we show that the most important strategies for increasing the reliability of long-term storage are detecting latent faults quickly, automating fault repair to make it faster and cheaper, and increasing the independence of data replicas.
Submitted 30 August, 2005;
originally announced August 2005.
-
Notes On The Design Of An Internet Adversary
Authors:
David S. H. Rosenthal,
Petros Maniatis,
Mema Roussopoulos,
T. J. Giuli,
Mary Baker
Abstract:
The design of the defenses Internet systems can deploy against attack, especially adaptive and resilient defenses, must start from a realistic model of the threat. This requires an assessment of the capabilities of the adversary. The design typically evolves through a process of simulating both the system and the adversary. This requires the design and implementation of a simulated adversary based on the capability assessment. Consensus on the capabilities of a suitable adversary is not evident. Part of the recent redesign of the protocol used by peers in the LOCKSS digital preservation system included a conservative assessment of the adversary's capabilities. We present our assessment and the implications we drew from it as a step towards a reusable adversary specification.
Submitted 21 November, 2004;
originally announced November 2004.
-
Transparent Format Migration of Preserved Web Content
Authors:
David S. H. Rosenthal,
Thomas Lipkis,
Thomas Robertson,
Seth Morabito
Abstract:
The LOCKSS digital preservation system collects content by crawling the web and preserves it in the format supplied by the publisher. Eventually, browsers will no longer understand that format. A process called format migration converts it to a newer format that the browsers do understand. The LOCKSS program has designed and tested an initial implementation of format migration for Web content that is transparent to readers, building on the content negotiation capabilities of HTTP.
Submitted 21 November, 2004;
originally announced November 2004.
-
Attrition Defenses for a Peer-to-Peer Digital Preservation System
Authors:
T. J. Giuli,
Petros Maniatis,
Mary Baker,
David S. H. Rosenthal,
Mema Roussopoulos
Abstract:
In peer-to-peer systems, attrition attacks include both traditional, network-level denial of service attacks as well as application-level attacks in which malign peers conspire to waste loyal peers' resources. We describe several defenses for LOCKSS, a peer-to-peer digital preservation system, that help ensure that application-level attacks even from powerful adversaries are less effective than simple network-level attacks, and that network-level attacks must be intense, wide-spread, and prolonged to impair the system.
Submitted 27 November, 2004; v1 submitted 28 May, 2004;
originally announced May 2004.
-
2 P2P or Not 2 P2P?
Authors:
Mema Roussopoulos,
Mary Baker,
David S. H. Rosenthal,
TJ Giuli,
Petros Maniatis,
Jeff Mogul
Abstract:
In the hope of stimulating discussion, we present a heuristic decision tree that designers can use to judge the likely suitability of a P2P architecture for their applications. It is based on the characteristics of a wide range of P2P systems from the literature, both proposed and deployed.
Submitted 14 November, 2003;
originally announced November 2003.
-
On The Cost Distribution of a Memory Bound Function
Authors:
David S. H. Rosenthal
Abstract:
Memory Bound Functions have been proposed for fighting spam, resisting Sybil attacks and other purposes. A particular implementation of such functions has been proposed in which the average effort required to generate a proof of effort is set by parameters E and l to E * l. The distribution of effort required to generate an individual proof about this average is fairly broad. When particular uses of these functions are envisaged, the choice of E and l, and the system design surrounding the generation and verification of proofs of effort, need to take the breadth of the distribution into account.
We show the distribution for this implementation, discuss the system design issues in the context of two proposed applications, and suggest an improved implementation.
Submitted 6 November, 2003;
originally announced November 2003.
-
A Digital Preservation Appliance Based on OpenBSD
Authors:
David S. H. Rosenthal
Abstract:
The LOCKSS program has developed and deployed in a world-wide test a system for preserving access to academic journals published on the Web. The fundamental problem for any digital preservation system is that it must be affordable for the long term. To reduce the cost of ownership, the LOCKSS system uses generic PC hardware, open source software, and peer-to-peer technology. It is packaged as a "network appliance", a single-function box that can be connected to the Internet, configured and left alone to do its job with minimal monitoring or administration. The first version of this system was based on a Linux boot floppy. After three years of testing it was replaced by a second version, based on OpenBSD and booting from CD-ROM.
We focus in this paper on the design, implementation and deployment of a network appliance based on an open source operating system. We provide an overview of the LOCKSS application and describe the experience of deploying and supporting its first version. We list the requirements we took from this to drive the design of the second version, describe how we satisfied them in the OpenBSD environment, and report on the initial
Submitted 21 November, 2004; v1 submitted 30 March, 2003;
originally announced March 2003.
-
Preserving Peer Replicas By Rate-Limited Sampled Voting in LOCKSS
Authors:
Petros Maniatis,
Mema Roussopoulos,
TJ Giuli,
David S. H. Rosenthal,
Mary Baker,
Yanto Muliadi
Abstract:
The LOCKSS project has developed and deployed in a world-wide test a peer-to-peer system for preserving access to journals and other archival information published on the Web. It consists of a large number of independent, low-cost, persistent web caches that cooperate to detect and repair damage to their content by voting in "opinion polls." Based on this experience, we present a design for and simulations of a novel protocol for voting in systems of this kind. It incorporates rate limitation and intrusion detection to ensure that even some very powerful adversaries attacking over many years have only a small probability of causing irrecoverable damage before being detected.
Submitted 17 October, 2003; v1 submitted 25 March, 2003;
originally announced March 2003.