Digital Preservation in Action
How Web Archives Save Vanishing Government Information
What happens when a website that you have been relying on as a source is taken down? For scholars who use government data, resources, and information, this is a serious problem for research and academic integrity. Thankfully, information workers have been working to archive the internet.
A screenshot of a page taken down from CDC.gov.
A screenshot of a page taken down from the National Park Service website.
The primary function of archives is to preserve materials of all kinds. Some objects are easier to preserve than others. For instance, books can be archived in temperature-controlled storage facilities. Posters, vinyl records, and administrative documents can be wrapped in polyethylene or acid-free tissue paper to ensure the fibre of the paper doesn’t get damaged or degrade. However, preserving materials that weren’t made to last for a long time brings conservation challenges. Librarians and archivists call materials that are quick to be created and quick to disappear “ephemera”. And while archiving ephemera such as postcards, fliers, pamphlets, programmes, tickets and other things “designed to discard” can be resource intensive, it provides access to alternative stories of the past. Such stories are valuable because they are typically not the formalized narratives of history cemented in textbooks, but history as it was lived every day.
One can imagine that today, the internet provides an enormous amount of digital ephemera: social media posts, webpages, URLs, and emails, to name a few. With the prevalence of born-digital materials (i.e., materials that originate in a digital form, as opposed to digital reformatting, through which analog materials become digital), information workers have been working to archive the internet since its early days. This form of preservation is called “web archiving,” and it is crucial to local and systematic knowledge production. Web archiving is vital because it preserves digital culture, ensures accountability of online information, and protects against link rot and content disappearance, creating a permanent record of the web for future generations to access and study.
Web archiving efforts have evolved significantly since the early days of the internet. One of the pioneering initiatives is the Internet Archive, which has been capturing and preserving web pages since 1996 and makes them available through its Wayback Machine. This digital library now contains billions of web pages, offering public access to historical versions of websites that have changed or disappeared entirely.
Beyond broad archiving efforts, specialized projects focus on preserving specific types of digital content. For instance, academic institutions have developed web archiving programs to capture scholarly resources, digital publications, and research data. Government information, in particular, has become a critical focus area for web preservation.
The importance of archiving government websites became especially apparent during administrative transitions. When new leadership takes office, government websites often undergo substantial changes—content may be removed, reorganized, or replaced to reflect new policies and priorities. These changes can create significant challenges for researchers who rely on consistent access to government-published information and data.
Under the Trump Administration, government agencies once tasked with storing and disseminating information are being defunded. For instance, ERIC, a vital database of education research run by the Department of Education, has experienced funding reductions and operational changes mandated by the Department of Government Efficiency (DOGE). This has led to a decrease in the number of new research reports and documents being added. Public helpdesk services have also been eliminated, and the number of actively cataloged journals is being significantly reduced. While existing records remain, these changes raise concerns about long-term access to and discoverability of crucial education research. Since the new administration took office, the content of government web pages has also changed drastically, and some websites have been taken down altogether.
Screenshot of the Wayback Machine’s capture of a government website (Children’s Bureau Express, or CBX) from December 15th, 2024.

Screenshot of the same CBX web page from April 28th, 2025.
Web Archiving Tools:
Here are some efforts working to keep a running record of government websites and the resources they provided.
End of Term Web Archive: A collaborative project that preserves government websites during presidential transitions.
GovArchive.us: A service provided by Webrecorder that gives access to the preserved websites of specific government organizations (CDC, Department of Education, etc.). This platform offers a 'mirror site', an exact replica of the original government website hosted on separate servers, ensuring the information remains available even if the original site changes or disappears.
DataRefuge: An initiative that focuses on preserving federal climate and environmental data.
Library of Congress Web Archives: Preserves websites of historical importance, including government resources.
Environmental Data & Governance Initiative (EDGI): Monitors changes to federal environmental websites and data.
These projects not only provide access to historical content but also document how government information evolves over time, enabling analysis of policy shifts and ensuring transparency in governance. Learn about more initiatives like these in this resource from the Medical Library Association: https://www.mlanet.org/article/accessing-public-data-removed-from-us-government-websites/.
How to search the End of Term Collection 2024:
You can search the End of Term (EoT) collection by keyword, which brings up a list of captured web pages across government agencies. You can also filter the results by a specific organization or date.
Screenshot of EoT search for the keyword 'contraception' across all government domains.
Screenshot of EoT search: 'trans youth' is the keyword, domain filter is cdc.gov.
If there is a specific URL you would like to see past versions of, search the URL in the Wayback search bar.
Screenshot of Wayback Machine search for the URL: https://www.cdc.gov/howrightnow/get-help/index.html
Screenshot showing the Wayback Machine’s captures of the URL over time. 2024 versions of the web page list “TransLifeline” as a resource for LGBTQ youth, which is no longer included at the current URL.
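For readers who want to script this kind of lookup instead of using the search bar, here is a minimal Python sketch. It assumes the Wayback Machine's commonly used address patterns (a timestamp or an asterisk placed before the original URL) and its public availability endpoint at archive.org/wayback/available. The function names are our own, and the date 20241215 is only an illustrative choice. The code builds addresses and reads a response that has already been downloaded, so it makes no network requests itself.

```python
from urllib.parse import urlencode

# Address patterns assumed from the Wayback Machine's public interface.
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def capture_url(original_url: str, timestamp: str) -> str:
    """Build the address of the capture nearest to `timestamp`.

    `timestamp` is written as YYYYMMDD or YYYYMMDDhhmmss.
    """
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


def calendar_url(original_url: str) -> str:
    """Build the address of the calendar view listing captures over time."""
    return f"{WAYBACK_PREFIX}/*/{original_url}"


def availability_query(original_url: str, timestamp: str) -> str:
    """Build a query asking the availability endpoint for the closest capture."""
    params = urlencode({"url": original_url, "timestamp": timestamp})
    return f"{AVAILABILITY_ENDPOINT}?{params}"


def closest_capture(response: dict):
    """Return the closest capture's address from a decoded JSON response.

    Returns None when the response reports no available capture.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


page = "https://www.cdc.gov/howrightnow/get-help/index.html"
print(capture_url(page, "20241215"))
print(calendar_url(page))
print(availability_query(page, "20241215"))
```

The first printed address can be pasted into a browser to open the capture nearest that date, and the second opens the calendar of all captures. To use closest_capture, download the third address with any HTTP client and pass in the decoded JSON.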
Citing the Wayback Machine:
To cite web-archived content, err on the side of more information. The reference should include the original URL as well as the archive ‘capture’ and the date on which that capture was taken. For more information, see this post:
Elizabeth Shown Mills, "Citing the Wayback Machine," blog post, QuickTips: The Blog @ Evidence Explained (https://www.evidenceexplained.com/quicktips/citing-wayback-machine : posted 5 July 2018).
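As a rough illustration of the advice above, the following Python sketch pulls the original URL and the capture date out of a Wayback Machine capture address and combines them into one reference line. It is only a sketch under stated assumptions: the output layout is our own and does not follow Evidence Explained or any other formal citation style, the capture address in the usage example is hypothetical, and the page title is a placeholder that you would replace with the real one.

```python
import re
from datetime import datetime

# A capture address is assumed to look like
#   https://web.archive.org/web/<14-digit timestamp>/<original URL>
# with an optional two-letter modifier such as "id_" after the timestamp.
CAPTURE_PATTERN = re.compile(
    r"^https?://web\.archive\.org/web/(\d{14})(?:[a-z]{2}_)?/(.+)$"
)


def parse_capture(capture_url: str):
    """Split a capture address into (original URL, capture datetime)."""
    match = CAPTURE_PATTERN.match(capture_url)
    if match is None:
        raise ValueError(f"Not a recognised capture address: {capture_url}")
    timestamp, original_url = match.groups()
    return original_url, datetime.strptime(timestamp, "%Y%m%d%H%M%S")


def cite_capture(title: str, capture_url: str) -> str:
    """Format a reference naming the original URL, the capture, and its date.

    The layout is illustrative only; adapt it to your own style guide.
    """
    original_url, captured = parse_capture(capture_url)
    return (
        f'"{title}," {original_url}, '
        f"archived by the Internet Archive Wayback Machine, "
        f"{capture_url} (captured {captured.date().isoformat()})."
    )


# Hypothetical capture address and placeholder title, for illustration only.
example = (
    "https://web.archive.org/web/20241215000000/"
    "https://www.cdc.gov/howrightnow/get-help/index.html"
)
print(cite_capture("Placeholder page title", example))
```

Taking the date from the capture address itself, instead of typing it by hand, keeps the cited date consistent with the version of the page that a reader will see when following the link.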
If you would like help using any of these tools, please send us a message or set up a time to speak with a librarian.
References:
Barshay, J. (2025, April 28). Education Department restarts online library ERIC. The Hechinger Report. http://hechingerreport.org/proof-points-restart-eric-ed-library/
Mulligan, S. J. (2025, February 7). Inside the race to archive the US government’s websites. MIT Technology Review. https://www.technologyreview.com/2025/02/07/1111328/inside-the-race-to-archive-the-us-governments-websites/
Singer, E. (2025, February 2). Thousands of U.S. Government Web Pages Have Been Taken Down Since Friday. The New York Times. https://www.nytimes.com/2025/02/02/upshot/trump-government-websites-missing-pages.html
Zald, A. (2025, April 28). Research Guides: U.S. Federal Documents: Web Archives. Northwestern Libraries. https://libguides.northwestern.edu/usdocs/webarchives