Below is a portion of an abstract submitted to the Born Digital Collections, Archives, and Memory conference to be held in London in April 2025. As I started to prepare my lightning talk for the conference, I realized a short overview of methods for archiving digital projects might be useful as a reference for those I’m talking with.
Digital humanities (DH) projects have long suffered from limited funding, which often leaves prototype applications orphaned. This frequently happens when teams are redirected from unfunded projects to those with funding or to new project proposals. Even funded projects face challenges, as existing funding models primarily support the early stages of development (planning, designing, and launching), leaving little for maintenance.
Among the core problems that arise from this are: the disappearance of valuable work and ideas, such as a project’s codebase, user interface, or interaction design; a limited ability for scholars to iterate on prior knowledge; and issues of discoverability and citation, because little documentation tends to be produced once project teams are redirected.
Many of these projects are now deemed “endangered” as they sit on aging, out-of-date, and increasingly vulnerable web servers. Among the project types that the Digital Preservation Coalition’s Global Bit List of Endangered Digital Species lists as Critically Endangered are Community-generated Content in Arts and Heritage, Digital Archives from Public Enquiries and Commissions, and Exhibition Content.
These projects can range from relatively simple websites with primarily text-based HTML files to more complex projects using open-source or commercial platforms. This creates a need to understand various ways to archive projects, as no single method seems to work for all platforms and programming languages.
Before you embark on archiving a project, check whether the project or site has already been archived, either at the Internet Archive or in an institutional repository or archive.
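For the Internet Archive part of that check, here is a minimal Python sketch that asks the Wayback Machine's availability endpoint whether it holds a capture of a URL. The function names are my own and purely illustrative, and the response shape (an `archived_snapshots` object with a `closest` entry) is what I understand the endpoint to return, so treat it as an assumption and check it against the Internet Archive's documentation before relying on it.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine availability endpoint (assumed stable; verify before use).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query_url(site_url: str) -> str:
    """Build the lookup URL for a given site (hypothetical helper name)."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': site_url})}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the URL of the closest capture, or None if none is reported."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_existing_capture(site_url: str, timeout: float = 30.0) -> Optional[str]:
    """Query the endpoint over the network and parse the JSON reply."""
    with urlopen(availability_query_url(site_url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling `find_existing_capture("example.org")` should return a web.archive.org link if a capture exists and `None` otherwise. This only covers the Internet Archive; institutional repositories and archives still need to be checked by hand.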
Some Options for Archiving Projects
Internet Archive’s Wayback Machine
This is perhaps the easiest way to save a site or project: follow the Wayback Machine’s instructions to submit a project by providing a link, and let their tools do the work.
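If you have a list of project URLs, the same submission can be scripted. The sketch below assumes the Wayback Machine's "Save Page Now" feature accepts an anonymous request to `https://web.archive.org/save/` followed by the target URL; that endpoint may be rate-limited or change, and the helper names and User-Agent string are hypothetical, so this is a starting point rather than a definitive client.

```python
from urllib.request import Request, urlopen

# Assumed Save Page Now prefix; the target URL is appended to it.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_page_now_url(site_url: str) -> str:
    """Build the capture-request URL for a site (hypothetical helper name)."""
    return SAVE_ENDPOINT + site_url


def request_capture(site_url: str, timeout: float = 120.0) -> int:
    """Ask the Wayback Machine to capture site_url; return the HTTP status."""
    request = Request(
        save_page_now_url(site_url),
        # Illustrative User-Agent so the request is identifiable.
        headers={"User-Agent": "dh-archiving-sketch/0.1"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status
```

A capture requested this way covers the page you submit, not necessarily every page of a larger project, so it is worth spot-checking the result in the Wayback Machine afterwards.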
Wget
For those familiar with the command line, the wget command can be used to crawl a site and generate a .warc file of it.
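As a rough illustration of what such a crawl can look like, this Python sketch assembles a wget command that uses wget's WARC options and runs it. The particular combination of flags is my own suggestion, not a required recipe, and the function names, the one-second wait, and the example output name are all hypothetical. It assumes a wget build with WARC support is installed.

```python
import subprocess
from typing import List


def build_wget_warc_command(
    site_url: str, warc_name: str, wait_seconds: int = 1
) -> List[str]:
    """Assemble a wget invocation that crawls a site and records a WARC."""
    return [
        "wget",
        "--recursive",                # follow links within the site
        "--level=inf",                # no limit on recursion depth
        "--page-requisites",          # also fetch images, CSS, and scripts
        "--no-parent",                # stay at or below the starting path
        f"--wait={wait_seconds}",     # pause between requests to be polite
        f"--warc-file={warc_name}",   # wget adds the .warc.gz extension
        "--warc-cdx",                 # also write a CDX index of the WARC
        site_url,
    ]


def crawl_to_warc(site_url: str, warc_name: str) -> int:
    """Run the crawl and return wget's exit code."""
    command = build_wget_warc_command(site_url, warc_name)
    return subprocess.run(command, check=False).returncode
```

For example, `crawl_to_warc("https://example.org/", "example-project")` would leave a compressed `example-project.warc.gz` alongside wget's usual downloaded copy of the site. Sites that build their pages with JavaScript may not be captured completely this way.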