Below is a portion of an abstract submitted to the Born Digital Collections, Archives, and Memory conference to be held in London in April 2025. As I started to prepare my lightning talk for the conference, I realized a short overview of methods for archiving digital projects might be useful as a reference for those I’m talking with.
Digital humanities (DH) projects have long suffered from limited funding, often leading to prototype applications being orphaned. This frequently happens when teams are redirected from unfunded projects to those with funding or to new project proposals. Even funded projects face challenges, as existing funding models primarily support the early stages of development (planning, designing, and launching), leaving little for maintenance.
Among the core problems that arise from this are: the disappearance of valuable work and ideas such as codebase, user interface, or interaction design; limited ability for scholars to iterate on prior knowledge; and issues of discoverability and citation due to the limited documentation that is often produced as project teams are redirected.
Many of these projects are now deemed “endangered” as they sit on aging, out-of-date, and increasingly vulnerable web servers. Among the project types that the Digital Preservation Coalition’s Bitlist of Global Endangered Digital Species lists as Critically Endangered are Community-generated Content in Arts and Heritage, Digital Archives from Public Enquiries and Commissions, and Exhibition Content.
These projects can range from relatively simple websites with primarily text-based HTML files to more complex projects using open-source or commercial platforms. This creates a need to understand various ways to archive projects, as no single method seems to work for all platforms and programming languages.
Before you start archiving a project, check whether the project or site has already been archived, either at the Internet Archive or in an institutional repository or archive.
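For those comfortable with a little scripting, the Internet Archive part of this check can be automated. The sketch below assumes the Internet Archive's public availability endpoint at https://archive.org/wayback/available, which returns JSON describing the closest snapshot of a URL. The function names are my own, and institutional repositories would still need to be checked by hand.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the Internet Archive's public Wayback availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(site_url):
    """Return the query URL that asks whether site_url has a snapshot."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": site_url})


def closest_snapshot(response):
    """Return the closest snapshot's URL from a decoded response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_wayback(site_url, timeout=30):
    """Query the API over the network and return a snapshot URL or None."""
    query = build_availability_url(site_url)
    with urllib.request.urlopen(query, timeout=timeout) as reply:
        return closest_snapshot(json.load(reply))
```

Calling check_wayback with a project's address should return the address of the closest snapshot if one exists, and None otherwise. A result of None only means the Wayback Machine has no capture of that exact URL, not that the project is unarchived elsewhere.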
Some Options for Archiving Projects
Internet Archive’s Wayback Machine
This is perhaps the easiest way to save a site or project: follow the Wayback Machine's instructions for submitting a link and let its tools do the work.
Wget
For those familiar with the command line, the wget command can be used to crawl a site and generate a .warc file of it. This method works well for simple sites, especially those that were manually coded, though it does also work for some WordPress or similar sites.
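As a rough illustration, here is one way such a crawl might be scripted. This is a sketch rather than a recommended recipe: the choice of options is my own assumption about what suits a small, hand-coded site, and the site address and output name are supplied by the caller. By default wget compresses its output, so the result is a .warc.gz file.

```python
import shutil
import subprocess


def build_wget_command(site_url, warc_name):
    """Return the wget argument list for a polite, WARC-producing crawl."""
    return [
        "wget",
        "--mirror",                  # follow links recursively within the site
        "--page-requisites",         # also fetch images, CSS, and scripts
        "--no-parent",               # stay at or below the starting directory
        "--wait=1",                  # pause one second between requests
        f"--warc-file={warc_name}",  # write warc_name.warc.gz next to the mirror
        "--warc-cdx",                # also write a CDX index of the records
        site_url,
    ]


def crawl(site_url, warc_name):
    """Run the crawl; requires wget to be installed and on the PATH."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget was not found on this system")
    # No check=True: wget exits non-zero when any link is broken, which is
    # common on older sites and does not mean the whole crawl failed.
    return subprocess.run(build_wget_command(site_url, warc_name)).returncode
```

Keeping the command in a list makes it easy to adjust per project, for example by raising the wait time for a fragile server. Pages that are assembled in the browser by JavaScript will generally not be captured this way.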
Viewing and Storing Archived Sites
The .warc and .wacz files that wget and other tools generate can be viewed in players such as the open-source tool ReplayWeb.page (https://replayweb.page/), which lets you browse the archived site. The files can be stored on any web server, including in the Knowledge Commons Works repository.
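Before depositing a file, it can be worth confirming that it contains the pages you expect. The sketch below is a minimal reader of my own, using only Python's standard library, that lists the addresses captured in a .warc or .warc.gz file. It assumes plain WARC records with a Content-Length header, it does not read .wacz files, and it is no substitute for a dedicated WARC library or for browsing the archive in a player.

```python
import gzip


def open_warc(path):
    """Open a WARC file for binary reading, decompressing .gz files."""
    return gzip.open(path, "rb") if str(path).endswith(".gz") else open(path, "rb")


def iter_warc_headers(stream):
    """Yield each record's WARC headers as a dict, skipping the payloads."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"WARC/"):
            continue  # blank separator lines between records
        headers = {}
        for raw in iter(stream.readline, b""):
            raw = raw.strip()
            if not raw:
                break  # a blank line ends the header block
            name, _, value = raw.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Skip the payload so its contents are never mistaken for headers.
        stream.read(int(headers.get("content-length", 0)))
        yield headers


def captured_urls(stream):
    """Return the target URLs of all 'response' records in the stream."""
    return [
        # Some writers wrap the URI in angle brackets, so strip them.
        h["warc-target-uri"].strip("<>")
        for h in iter_warc_headers(stream)
        if h.get("warc-type") == "response" and "warc-target-uri" in h
    ]
```

To use it, pass the result of open_warc for your file to captured_urls and compare the returned list against the pages you know the project should include.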
