Yale University Library Partners with Preservica
The Yale University Library, which recently celebrated its 314th anniversary, is collaborating with Preservica to preserve nearly one petabyte* (PB) of digital content.
The Yale Library comprises 15 libraries and houses more than 15 million volumes and information in all media, ranging from ancient papyri to early printed books to electronic databases. Renowned globally for the size and rarity of much of its archive, it is highly regarded as an archival repository for many major 20th-century American leaders. It is also home to the Fortunoff Video Archive for Holocaust Testimonies, which contains approximately 4,400 videotaped interviews. Because the original media are degrading, these interviews have been digitized for preservation and access and will be ingested into Preservica.
“The Digital Preservation Services team deals with a high volume of both digitized and ‘born digital’ content (content created in digital form) and our goal is to create a sustainable infrastructure to ensure long-term access to our digital collections,” says Euan Cochrane, Yale’s Digital Preservation Manager. “We have nearly a petabyte of highly unique and valuable digital content, which we anticipate will grow by 10s of TB next year and at an exponential rate over coming years. Beyond our existing preservation efforts, we knew we needed to get a digital preservation system in place to handle our plans to scale.”
Cochrane joined the library in 2013, when the university identified the need for dedicated expertise to lead a large-scale digital preservation program. He led an appraisal process with in-house and outside consultants to assess both open-source and commercial off-the-shelf digital preservation products, analyzing feature sets, ease of connectivity with existing information systems, and scalability.
“We established that an ‘off-the-shelf’ product would best fit our pressing need to have digital preservation infrastructure in place as soon as possible. After evaluating both commercial and open source options, we decided to move forwards with a commercial option in order to allow us to focus our resources in the short to medium term on curating and preserving digital content rather than developing and maintaining software. None of the open source options met our current needs and extensive development work would have been needed to meet our requirements at a significant but unpredictable cost,” says Cochrane.
Selecting a commercial option also gave the Yale Library a full understanding of the Total Cost of Ownership (TCO), which in turn enabled better predictions of software costs over the short and medium term, a key challenge when planning for long-term stewardship of digital information.
Cochrane continues, “We chose Preservica because of the combination of features along with the extensible nature of its architecture that will allow us to scale and connect with other systems as we grow. The ability to easily migrate between file formats post-ingest, the ease of its storage management and the fact that every item has a complete audit trail were also really key to our decision. Once you’ve put something into the digital archive using Preservica, you will always be able to track its entire provenance and history, which is very important to us.”
To begin using Preservica, Yale University Library is launching a pilot ingest process using its collection of 60 TB of master files produced through its monograph preservation-digitization program. The Digital Preservation Services team is ingesting the 60 TB into the system via an automated workflow. The team will then move on to ingesting the Beinecke Rare Book and Manuscript Library’s collection of born-digital materials, which includes email correspondence, drafts of poetry and prose, drafts for Sesame Street skits, digital photographs, and many other items of significance to researchers.
The inclusion of other collections from throughout the university was a requirement from the beginning. Preservica’s multi-tenant model allows some of the independent Yale libraries and galleries to have their own secure and independent user interface. To provide additional capacity for these tenants, the university can simply scale horizontally by adding more servers.
Preservica will integrate with ArchivesSpace, a web-based archives information management system already in use at the library. The integration will automatically synchronize metadata between the two systems, providing a single coherent view of both physical and digital artefacts. While much of the digital ingestion is already automated, the large size of the collection means that migrating the video in the primary collection alone will take approximately six months.
Cochrane is positive about the project’s future. “Having Preservica in place is really exciting because we are now able to widen our scope to include more complex objects and entire new archives, and we can ensure that our unique digital collections are accessible and useable for future generations.”
Preservica CEO Jon Tilbury is enthusiastic about the collaboration. “This is a great opportunity to work with a world-renowned educational institution and to preserve objects of significant historical importance. Yale’s Digital Preservation Services team has always been at the forefront of technology development and application for digital preservation, and we are delighted to be part of this dedicated preservation program.”
*One petabyte is equal to 1,000,000 gigabytes (GB), or around 1,250,000 CDs at 800 MB each.