Overview
LOCKSS is currently being used to preserve content in two distinct types of environments: a public LOCKSS network holds material of general interest to a wide community, and several private LOCKSS networks hold material for smaller communities. These correspond to two different models for providing enough replicas of the data to ensure a high probability of survival.
The public network is designed for material that is generally available on the internet, including subscription-only material. Anyone may participate in this network by running one or more LOCKSS boxes. Each box may collect and preserve any content to which its host institution has access rights. Sufficient replication is ensured because the materials preserved in the public network are those that the community has agreed it wishes to preserve.
Nodes in the public network are owned by their host institutions. The network is maintained by the LOCKSS group with funding provided by the LOCKSS Alliance. As of Spring 2008, the public network comprises over 200 libraries worldwide and holds a collection of scientific and other journals, as well as electronic theses and dissertations (ETDs). LOCKSS Alliance members have more titles available to preserve than libraries that have not joined the Alliance.
In contrast, material that is of interest to a small community, or that is sensitive, such as that held by government agencies, may be preserved in a Private LOCKSS Network (PLN). Participation in these private networks is controlled by the particular community. Individual entities or communities with too few members for sufficient replication may either run multiple LOCKSS boxes at each member site, or combine with other such entities to preserve each other's data reciprocally.
Collection Evaluation
LOCKSS collects and preserves content in discrete collections called Archival Units (AUs), which are generally between several megabytes and several gigabytes in size. For example, for e-journals, an AU may be one volume (year) of one title. Any convenient grouping may be used, as long as the system can determine (via a plugin, described under Plugins) whether or not any particular item belongs in any particular AU, usually by examining the item's name (URL).
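The idea of deciding AU membership from an item's URL can be illustrated with a minimal Python sketch. This is not the LOCKSS plugin format or API: the `ArchivalUnit` class, the `find_au` helper, and the `journal.example.org` URLs are all hypothetical, and a regular expression is only one possible way to express such a rule.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ArchivalUnit:
    """Hypothetical AU definition: a label plus a URL rule (assumption)."""
    name: str
    url_pattern: str  # regular expression matched against the full URL

    def contains(self, url: str) -> bool:
        # An item belongs to the AU if its whole URL matches the pattern.
        return re.fullmatch(self.url_pattern, url) is not None


def find_au(url: str, aus: Iterable[ArchivalUnit]) -> Optional[ArchivalUnit]:
    """Return the first AU whose rule accepts the URL, or None."""
    for au in aus:
        if au.contains(url):
            return au
    return None


# Hypothetical e-journal with one AU per volume.
volume_12 = ArchivalUnit(
    name="Example Journal, Volume 12",
    url_pattern=r"https?://journal\.example\.org/vol12/.*",
)
volume_13 = ArchivalUnit(
    name="Example Journal, Volume 13",
    url_pattern=r"https?://journal\.example\.org/vol13/.*",
)
```

With these definitions, `find_au("http://journal.example.org/vol13/index.html", [volume_12, volume_13])` returns the Volume 13 AU, and a URL outside both volumes returns `None`.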
The size of the initial collection (number and size of AUs) and the expected rate of growth should be determined. This will influence hardware decisions.
Evaluation criteria:
- How many objects do PLN members have?
- What is the most efficient way to organize these objects into AUs?
- How much space will these AUs take up?
- How much does the collection grow annually?
- What intellectual property restrictions need to be considered?
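The space and growth questions above feed directly into a storage estimate. As a rough sketch, assuming growth is approximately linear (an assumption, not something LOCKSS prescribes), the size of one copy of the collection after a planning horizon can be projected as follows. The function name and the figures are hypothetical.

```python
def projected_size_gb(initial_gb: float, annual_growth_gb: float, years: int) -> float:
    """Project the size of one copy of the collection, assuming linear growth."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return initial_gb + annual_growth_gb * years


# Hypothetical collection: 500 GB today, growing by 100 GB per year.
five_year_size = projected_size_gb(500, 100, 5)
```

Here `five_year_size` is 1000 GB, which is the figure to compare against the disk capacity of a single machine when choosing hardware.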
Hardware
Each AU should be replicated on a minimum of six LOCKSS boxes. We recommend at least seven, if possible. If a single copy of the collection is larger than the capacity of a single machine, each copy must be spread across several machines, and the total number of machines grows in corresponding multiples. (All machines need not preserve the same set of AUs, as long as each AU has sufficient replicas somewhere in the network.)
Minimum machine specs are: 1 GHz CPU, 512 MB of memory, and sufficient disk space. More memory is required for a large number of AUs (more than ~5000).
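The machine count implied by this rule can be worked out with simple arithmetic. The sketch below assumes every box has the same usable capacity and that each full copy of the collection is split across the same number of boxes. Both are simplifying assumptions, and the function name and numbers are hypothetical.

```python
import math


def boxes_needed(collection_gb: float, box_capacity_gb: float, replicas: int = 7) -> int:
    """Estimate the total number of boxes for `replicas` full copies of the collection."""
    if box_capacity_gb <= 0:
        raise ValueError("box capacity must be positive")
    # Boxes required to hold one complete copy of the collection.
    boxes_per_copy = math.ceil(collection_gb / box_capacity_gb)
    # Each additional replica needs another full set of boxes.
    return boxes_per_copy * replicas


small_network = boxes_needed(500, 2000)               # one box per copy, 7 copies
large_network = boxes_needed(3000, 2000, replicas=6)  # two boxes per copy, 6 copies
```

In this example `small_network` is 7 and `large_network` is 12. Because boxes need not hold identical sets of AUs, a real deployment may pack AUs more tightly than this estimate suggests, provided replicas of the same AU always sit on different boxes.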
All the machines should be connected to a network where they will have access to the material they are supposed to collect, and to each other. Ideally they should be in different locations, so that a single physical event is not likely to impact all of them.
Software
The LOCKSS team supplies a bootable CD that turns a standard PC into a highly secure, easily configured, low-maintenance preservation appliance. The CD is a specially configured version of OpenBSD; it runs on most standard PC hardware. We recommend this as the best solution in most situations.
If the LOCKSS CD cannot be used, the LOCKSS preservation software (the LOCKSS daemon) can be installed and run on Linux or other Unix systems that support Java. Installation scripts are provided for some Linux variants, but this method requires more work and knowledge on the part of the machine's administrator.
Plugins
Several aspects of the LOCKSS collection and preservation process must be customized for each application. This is done by building a Publisher Plugin, which captures the knowledge necessary to define any number of related AUs (such as those that share a common structure or publishing platform). More information is available in the plugin documentation.
Network Management
Configuring and running a PLN requires a few network services to be set up and some databases to be built. Specifically, each LOCKSS box loads, from a network server, a set of runtime configuration parameters, plugins, and a title database describing individual AUs. These may be on the same or different servers, packaged in various combinations, depending on the application.
In addition, some monitoring of the network and constituent machines is recommended to ensure the continued health of the data.
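One simple form of monitoring is to check that every AU still has enough replicas across the network. The sketch below is illustrative only: it assumes you have already gathered, by whatever means your network provides, a mapping from each box to the AUs it currently holds. The `under_replicated` function, the box names, and the AU identifiers are hypothetical, and the threshold of six replicas is the minimum recommended for PLNs.

```python
from collections import Counter
from typing import Dict, Iterable


def under_replicated(holdings: Dict[str, Iterable[str]], min_replicas: int = 6) -> Dict[str, int]:
    """Return {au_id: replica_count} for AUs held by fewer than min_replicas boxes."""
    counts: Counter = Counter()
    for aus in holdings.values():
        counts.update(set(aus))  # count each AU at most once per box
    return {au: n for au, n in counts.items() if n < min_replicas}


# Hypothetical network: five boxes hold both AUs, a sixth holds only au-1.
holdings = {f"box-{i}": {"au-1", "au-2"} for i in range(1, 6)}
holdings["box-6"] = {"au-1"}

at_risk = under_replicated(holdings)
```

In this example `at_risk` is `{"au-2": 5}`, signalling that one more replica of `au-2` is needed to reach the minimum.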
Providing Access
The final step in building a PLN is to provide access to the preserved data. The appropriate mechanism depends on the application. The two primary mechanisms are to configure your users or network to treat the LOCKSS boxes as HTTP proxy servers (see Proxy Integration), or to export the data to an external server.
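To illustrate the proxy approach from the client side, the following Python sketch builds an HTTP client that sends its requests through a LOCKSS box acting as a proxy. It uses only the standard `urllib.request` module. The host name `lockss.example.edu` and port `8080` are assumptions for illustration; substitute the address and proxy port configured on your own box.

```python
import urllib.request


def build_proxy_opener(proxy_host: str, proxy_port: int) -> urllib.request.OpenerDirector:
    """Return an opener that routes plain HTTP requests through the given proxy."""
    proxy_url = f"http://{proxy_host}:{proxy_port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical LOCKSS box address (assumption, not a real host).
opener = build_proxy_opener("lockss.example.edu", 8080)

# A request such as the following would then be sent via the box:
# response = opener.open("http://journal.example.org/vol12/index.html")
```

In practice the same effect is usually achieved without code, by setting the proxy in browser or network configuration, so this sketch mainly shows what that configuration amounts to.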
Support for PLN members
All members of a PLN must be LOCKSS Alliance members; group discount rates are available. Support includes, among other things, consulting on hardware and software for LOCKSS boxes; site design; plugin design, implementation, and testing; server hosting of configuration parameters, plugins, and the title database; and assistance with proxy integration or content export.