IIPC Web Archiving Conference

British Library, International Internet Preservation Consortium, Research Infrastructure for the Study of Archived Web Materials, University of London

Event Details:

Friday, April 14, 2017 - Sunday, April 16, 2017

Speaker(s):

Nicholas Taylor

Nicholas Taylor will participate in the IIPC Web Archiving Conference. We look forward to connecting with any LOCKSS partners who may also be attending!

As Program Chair, Nicholas will give the opening plenary.

Nicholas will also present "Lots More LOCKSS for Web Archiving: Boons from the LOCKSS Software Re-Architecture" as part of the session "Leveraging APIs".

Here is the abstract:

The LOCKSS Program is one of the longest-running web archiving initiatives, though often not thought of as such, given its initial concern with the archiving of electronic journals and the comparative prominence of its distributed approach to digital preservation. The early inception of the program and its distinct content focus led over time to a divergence between the technologies, if not the approaches, used by LOCKSS and those of the web archiving mainstream. This has resulted in a monolithic LOCKSS software architecture, missed opportunities for the application of LOCKSS innovations outside of its historical domain, and increasingly duplicative engineering efforts applied to common challenges.

The LOCKSS software is now in the midst of a multi-year re-architecture effort that should redress these longstanding issues, to the benefit of both the LOCKSS Program and the web archiving community. By aligning with evolved best practices in web archiving and leveraging open-source software from the web archiving community and beyond, the LOCKSS Program will be able to do more, and more efficiently, and concomitantly help to bolster community stewardship and advance the state of the art for the common web archiving tool stack. This approach reflects a recognition that the sustainability of the tools that enable both the LOCKSS Program and web archiving more generally depends upon an ongoing, robust community effort.

The fundamental aim of the re-architecture effort is to make the LOCKSS software more maintainable, extensible, and externally reusable. This will be accomplished by using existing open-source software solutions wherever possible and by re-implementing existing LOCKSS components as standalone web services. Examples of applications to be integrated include Heritrix, OpenWayback, Solr, Warcbase, and Warcprox. Among the LOCKSS-specific features to be made externally reusable are the audit and repair protocol, metadata extraction and querying services to support access to archived web resources via DOI and OpenURL, a new component for indexing WARCs into Solr, and an on-access format migration framework.

This session will highlight how the LOCKSS Program and LOCKSS software are evolving, and what opportunities that may present for the web archiving community.

Nicholas will also present "Understanding Legal Use Cases for Web Archives" as part of the session "Changing records for scholarship & legal use cases".

Here is the abstract:

In the broader cataloging of access use cases for web archives, legal use cases are often gestured at but rarely unpacked. How are archived web materials being used as evidence in litigation? What has been the evolving treatment of web archives by courts and litigators? How well understood are the particular affordances and limitations of web archives in legal contexts? What are the relevant rules, precedents, and best practices for authentication of evidence from web archives? How could the web archiving community better support this category of users, and what might be considerations for doing so?

There are by now many examples of the actual application of web archives for legal use cases with which to address these questions. A preliminary literature review suggests a good deal of attention in both cases and law journal articles to issues of authentication but an opportunity for greater education on specific aspects of web archives affecting their reliability. Web archives can (and do) serve as exceptional resources for substantiating claims based on historical, public web content. However, archived websites are far less "self-evident" than many other types of documents that may be used as evidence; their fitness for purpose should ideally entail assessments of:

  • canonicality (i.e., is this version of a webpage the same as would have been served to another user?);
  • completeness (i.e., how to assess the relevance of content missing from the archive?);
  • discreteness (i.e., are efforts being made to ensure that live web content is not leaking into the archival representation?); and
  • temporal coherence (i.e., how to assess the reliability of an archival composite of objects collected at varying points in time?).

Drawing in particular on analyses and findings from the U.S. legal context as well as relevant research from the web archiving field, this session will explore trends and opportunities relating to web archives and legal use cases.
