Preservation Principles
LOCKSS is built from the ground up to address the most significant threat to digital information: ourselves.
News headlines and research corroborate that deliberate and accidental human actions are the greatest threats to the persistence of digital information. A digital preservation system without proven defenses against human action is like a house with no roof: an architecture that is incomplete for its given purpose, providing the appearance of protection while in fact leaving its inhabitants vulnerable to the elements.
Considerations for robust digital preservation systems
The following questions can help in evaluating how another digital preservation system compares with LOCKSS:
Is the system based on a robust and realistic threat model?
LOCKSS openly articulates the foundation of its architecture and the risks it is intended to mitigate. These include familiar digital preservation risks such as media failure and format obsolescence, as well as risks that are considered less often but matter more, such as human error, malicious attack, and organizational failure.
Does the system rely on a canonical and incorruptible fixity store?
A canonical fixity store provides an alarmingly central point of failure or attack for all of the content a digital preservation system manages, regardless of how distributed the copies are. LOCKSS eschews a canonical fixity store in favor of consensus among independent (i.e., mutually distrusting) peers that confer with one another. The long-term integrity of the content can therefore be assured with nothing more than (multiple copies of) the content itself.
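To make the idea of integrity by consensus concrete, the following is a minimal sketch and not the actual LOCKSS polling protocol. It assumes each peer can hash its own copy on demand; the names `vote` and `poll` and the use of SHA-256 with a random nonce are illustrative assumptions. The point it shows is that agreement is computed from the copies themselves, with no stored reference digest to corrupt or attack.

```python
import hashlib
import os
from collections import Counter


def vote(content: bytes, nonce: bytes) -> str:
    """Hash nonce + content fresh from the copy itself.

    Hypothetical helper: nothing is read from a stored digest, and the
    fresh nonce means a previously computed hash cannot be replayed.
    """
    return hashlib.sha256(nonce + content).hexdigest()


def poll(copies: dict[str, bytes]) -> tuple[set[str], set[str]]:
    """Return (agreeing_peers, dissenting_peers) for one poll.

    `copies` maps a peer name to the bytes that peer currently holds.
    This is an illustrative stand-in for peers conferring over a network.
    """
    if not copies:
        raise ValueError("a poll needs at least one copy")
    nonce = os.urandom(16)  # fresh challenge for this poll
    votes = {peer: vote(content, nonce) for peer, content in copies.items()}
    # The most common vote stands in for the consensus of the peers.
    top_vote, _ = Counter(votes.values()).most_common(1)[0]
    agreeing = {peer for peer, v in votes.items() if v == top_vote}
    return agreeing, set(votes) - agreeing


if __name__ == "__main__":
    copies = {
        "peer_a": b"article text",
        "peer_b": b"article text",
        "peer_c": b"article text",
        "peer_d": b"article texT",  # one damaged copy
    }
    agreeing, dissenting = poll(copies)
    print(sorted(agreeing), sorted(dissenting))
```

In this sketch a damaged copy is identified only because it disagrees with the other copies, which is the property the paragraph above describes.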
Is the system decentralized, as well as distributed?
The idea of "lots of copies keep stuff safe" is a digital preservation best practice adopted more broadly than LOCKSS. However, any single software instance, hardware platform, organization, or administrator with control over all of the distributed copies threatens the integrity of every copy simultaneously. LOCKSS is distinguished not just by many copies but by its peer-to-peer architecture, in which no network participant controls all of the copies.
Does the system routinely both validate and repair data?
A dark archive, into which content disappears only to reappear in a future emergency, does not engender confidence in either its availability or its correctness. Routinely accessing and checking data is the only way to ensure that the data remains intact. As part of its normal operation, a LOCKSS system regularly reads, validates, and (as needed) repairs stored data from its original source or trusted peers.
Does the system fail gracefully and gradually?
A system that repairs detected damage quickly and automatically will likewise fail quickly, because it accelerates the rate at which compromised or corrupted data displaces good copies. LOCKSS is conservative and sophisticated about repairs: regular polls on data integrity surface disagreement but do not prompt automatic repairs except in cases of overwhelming consensus, allowing time for investigation and for recovery policies to be implemented. This approach is particularly vital in the case of active attack, as tampering can be detected before all copies are compromised.
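The conservative repair policy described above can be sketched as a simple decision rule. This is an illustration under stated assumptions, not the LOCKSS implementation: the `decide` function, the `Outcome` names, and the 0.8 `landslide` threshold are all hypothetical. A peer repairs its copy automatically only when the other peers overwhelmingly disagree with it; anything short of a landslide is flagged for investigation and left unchanged.

```python
from enum import Enum


class Outcome(Enum):
    INTACT = "intact"            # overwhelming agreement with the local copy
    REPAIR = "repair"            # overwhelming disagreement: safe to repair
    INVESTIGATE = "investigate"  # inconclusive: alert operators, change nothing


def decide(agree: int, disagree: int, landslide: float = 0.8) -> Outcome:
    """Map poll results for the local copy to an action.

    `landslide` is a hypothetical threshold for "overwhelming consensus".
    """
    total = agree + disagree
    if total == 0:
        return Outcome.INVESTIGATE  # no votes received: nothing can be concluded
    if agree / total >= landslide:
        return Outcome.INTACT
    if disagree / total >= landslide:
        return Outcome.REPAIR
    return Outcome.INVESTIGATE


if __name__ == "__main__":
    for agree, disagree in [(9, 1), (1, 9), (5, 5)]:
        print(agree, disagree, decide(agree, disagree).value)
```

The inconclusive middle band is the design choice that matters here: a split vote may indicate an attack in progress, so the sketch leaves every copy untouched rather than letting a possibly bad version overwrite good ones.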
Does the system support local custody and control?
Hosted systems limit an organization's recourse, and its reliable, perpetual access to data, when a system or organization fails, business models evolve, or the larger policy environment changes. As open-source software that organizations can run themselves, LOCKSS supports local content custody, use, and decision-making, without dependence on any service provider.