Correspondence Archives in the Age of Email: Technology, Privacy, and Policy Challenges

We send and receive more emails than many of us know how to conveniently organize. That’s no major obstacle when messages are used for everyday purposes like conducting business with limited senders and recipients. In archival collections, however, email messages take on a longer life, for a much larger audience, and offer a unique research value for future generations. Email will inform our children and grandchildren about who we were and how we lived.

Our ability to access personal correspondence between senior government officials has benefited observers in academia, politics, and the public. Email preservation, for example, has allowed historians to understand the full story of the Iran-Contra Affair, the scandal that rocked the political world in 1986:

This video demonstrates how email preservation allowed historians to study the Iran-Contra Affair, one of history's first scandals involving electronic correspondence. Adapted from the graphic “How to Read Email” in White House E-Mail by Tom Blanton (New York: The New Press, 1995). Courtesy of the National Security Archive.

Despite the trove of details a single email can unearth, rigorous cataloguing of email communication remains the exception, not the rule. Instead, those trying to understand the recent historical record are, too often, left feeling the way many of us do with our personal inboxes: searching in vain for that one elusive message. 

That’s why The Andrew W. Mellon Foundation and the Digital Preservation Coalition in the UK have organized a Task Force on Technical Approaches to Email Archives. The task force, composed of 18 members from national libraries, universities, archives, and industry, is halfway through a year-long process to assess current efforts to preserve email and develop a framework to address the challenges associated with email archives. By the end of 2017, the task force will report on its findings and recommendations for actions that archives could take in the next two to five years to safely acquire and preserve email for future research use.

While most email programs include an 'archive' function, creating email archives for future research requires the user (or the email platform) to follow a consistent plan for storing and preserving the records. Because emails rely on complicated interactions of technical systems for composition, transmission, viewing, and storage, archival strategies have to consider a great deal, including the email’s content, its sender and recipient(s), its date and time, and any file attachments. A range of different email applications, representing both proprietary and open source systems, has been used over time by individuals and organizations.
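To make those components concrete, here is a minimal sketch in Python that pulls the sender, recipients, date, body, and attachment names out of one raw message using only the standard library's email package. The sample message, the addresses, and the describe_message helper are invented for illustration; this is not a description of how any particular archival tool works.

```python
from email import message_from_string, policy

# A made-up message with one plain-text body and one small attachment.
RAW_MESSAGE = """\
From: Alice Example <alice@example.org>
To: Bob Example <bob@example.org>, carol@example.org
Subject: Meeting notes
Date: Mon, 03 Apr 2017 09:30:00 -0400
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset="utf-8"

Notes from today's meeting are attached.

--BOUNDARY
Content-Type: text/plain; charset="utf-8"
Content-Disposition: attachment; filename="notes.txt"

Agenda item one.

--BOUNDARY--
"""


def describe_message(raw):
    """Return the parts of a message an archive would typically record.

    Hypothetical helper: the field names in the result are assumptions.
    """
    msg = message_from_string(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "sender": str(msg["From"]),
        "recipients": [addr.addr_spec for addr in msg["To"].addresses],
        "sent": msg["Date"].datetime,  # timezone-aware datetime
        "subject": str(msg["Subject"]),
        "body": body.get_content().strip() if body else "",
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


if __name__ == "__main__":
    for field, value in describe_message(RAW_MESSAGE).items():
        print(f"{field}: {value}")
```

Even this small example shows why a consistent plan matters: the body, the headers, and the attachment are stored and encoded separately inside a single message, and each has to survive for the record to stay usable.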

Archivists, technologists, and librarians have made progress and there are now more ways than ever to capture, save, and catalog various forms of digital expression. But email has remained resistant to a variety of efforts at preservation, and most archives and libraries still lack the technology and processes to systematically acquire it.

Diagram: An overview of the types of data contained in every email message. Courtesy of Joel Simpson.

Policy, privacy, and even ethical concerns compound the technical difficulties of preserving email and making messages accessible to scholars and the public. Even in 2017, the matter of who, exactly, "owns" email data is still unresolved. Is it your employer? The email service provider? The sender? The recipient(s)?

Organizations also need to make their own archiving decisions, based on needs that are specific to internal considerations. Those considerations are likely different from those of collecting institutions, who must balance the concerns of donors with the needs of their constituents.

Existing tools—such as ePADD, developed by Stanford University’s Special Collections & University Archives, and BitCurator, led by the School of Information and Library Science at the University of North Carolina—can help identify and redact private or sensitive information before messages are added to archival collections. Improving the accuracy of natural language processing techniques would give archivists and donors more confidence that email would be suitably preserved for future research use. The Task Force is also looking at tools outside the cultural heritage domain, such as digital forensics, legal protocols for email as evidence and commercial email services.

ePADD interface: Among the tools around which the task force will develop implementation guidelines is ePADD, which supports screening, browsing, and access for email messages and attachments imported as MBOX or through IMAP. Screenshot courtesy of Stanford Libraries.
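For readers unfamiliar with MBOX, it is a plain-text format that stores many messages in one file, each introduced by a line beginning with "From ". The sketch below uses Python's standard mailbox module to list basic headers from such a file. It only illustrates the format and is not how ePADD itself reads mail; the list_mbox_headers helper and the file name in the usage note are hypothetical.

```python
import mailbox


def list_mbox_headers(path):
    """Return the From, Date, and Subject headers of each message in an MBOX file.

    Hypothetical helper for illustration; a missing header comes back as None.
    """
    # create=False makes a missing file an error instead of creating an empty one.
    box = mailbox.mbox(path, create=False)
    try:
        return [
            {
                "from": message["From"],
                "date": message["Date"],
                "subject": message["Subject"],
            }
            for message in box
        ]
    finally:
        box.close()
```

Calling list_mbox_headers("export.mbox") on a mailbox exported from an email client would return one dictionary per message, which is enough for a first inventory before any screening or redaction takes place.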

At the Coalition for Networked Information and Museums and the Web annual meetings in April, Task Force on Technical Approaches to Email Archives Co-Chairs Christopher Prom (University of Illinois at Urbana-Champaign) and Kate Murray (Library of Congress) presented their efforts and solicited community feedback on a working agenda around three issues: (1) articulating a technical framework, (2) determining how existing tools fit within this framework, and (3) identifying missing elements.

Those wanting to get more involved in the work of the Task Force are welcome to join the Friends of the Task Force or comment on draft versions of the report; simply send an email to Christopher Prom (prom@illinois.edu) or Kate Murray (kmur@loc.gov) to initiate contact. A draft of the report will be available for public comment in autumn, and the final report will be available for download in December.