Digital Preservation

Introduction

The following policies and standards are to be viewed in conjunction with the HKU Libraries Collection Development Policy. These policies cover the digitization projects undertaken by the HKU Libraries in achieving the following objectives:

  • Facilitating convenient online access to unique, priceless, and comprehensive digital materials for teaching, learning, and research, regardless of time or geographic location.
  • Preserving unique and rare materials in the long term through the creation of digitized copies while minimizing the physical wear and tear caused by handling fragile collection items.
  • Generating new data and content through digitization, which not only enables openness but also adds value to new areas of research.

Selecting and prioritizing items for preservation

Items selected for digital collections should align with the scope, purpose, and intended audience of the collection. Prioritization is also crucial when digitizing and preserving collections. While each collection may set its own policy and scope, these should remain based on the common digitization and preservation policies of the HKU Libraries. Materials that meet the following criteria should be prioritized and considered for collection development and digitization:

  • Uniqueness: Unique items should be given higher priority for digitization, as there is a lower chance of duplicating earlier digitization work. Digitizing these materials widens access to items that are not widely available.
  • Fragility: Fragile items that can be digitized with minimal damage should be prioritized for digitization to reduce physical processing of originals and further damage. Items that are too fragile and cannot be digitized without significant damage should be repaired before digitization.
  • Historical significance: Items with significant historical value should be given a greater priority for digitization.
  • Integrity: Complete, unaltered items that can be digitized with minimal damage should be prioritized for digitization, as they ensure the stability, usability, and feasibility of the digital content.
  • Demand or use for access: Items known to have high usage or high research value should be considered a high priority for digitization as they have a higher potential usage after digitization. For example, items that support the university in teaching and research should be prioritized for digitization.
  • Copyright: All digitized items should comply with copyright law. Items within the public domain may be prioritized in digitization since they can be digitized and shared without permission from the author or rights holder.
  • Size/format: Items whose size and format suit the current digitization infrastructure should be prioritized. Technical limitations may arise where the current infrastructure cannot accommodate certain sizes and formats. In such cases, digitization may still be considered by upgrading equipment or outsourcing.

Criteria for Content Exclusion

We make every effort to preserve and facilitate access to all content for our users. However, there may be instances where we are unable to accommodate certain materials or requests due to the following concerns:

  • For ethical, legal, and contractual reasons, any material containing personally identifiable information, sensitive information, export-controlled material, or primarily administrative information will be rejected as unsuitable for research purposes.
  • The format of the content is not supported by our libraries, and/or the software needed to open the content is permanently obsolete.
  • The content does not comply with copyright, intellectual property rights, and other legal and moral rights related to copying, storage, modification of content, and the use of digital records.

Preservation period of content

All collections aim to maintain their content throughout its useful lifespan. We strive to ensure the long-term preservation and accessibility of digital collections and content by following best practices for data management, and we also conduct regular retention reviews. As a result, items in current collections may be removed if they are no longer relevant or useful, or if there are ethical, legal, or contractual reasons for their removal.


Metadata and technical specification

We are pleased to invite you to contribute to our digital collections by providing clean datasets and digitized items. Collaboration with us will help develop digital collections that support research and teaching for both the HKU community and the public. If any selected items are not yet digitized, we will either scan them in-house or outsource them while helping you format and edit large amounts of data into meaningful structured datasets for Digital Scholarship projects.

Metadata is vital for preserving, exchanging, and maintaining long-term sustainable access to digital collections. It is essential to create descriptive and structural metadata that meet international standards before digitizing items. Currently, all digital collections in the HKU Libraries use Dublin Core (a set of fifteen core elements for describing resources) as the minimum standard for descriptive metadata. Metadata varies between digital collections and is developed from the characteristics of each collection, in line with the collection development and preservation guidelines.
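
As an illustration of what a minimal Dublin Core description can look like, the sketch below builds a record with Python's standard library. The fifteen element names and the namespace URI come from the Dublin Core Metadata Element Set; the `metadata` wrapper element, the `build_dc_record` helper, and every field value are hypothetical and do not represent the HKU Libraries' actual schema or workflow.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_dc_record(fields):
    """Return a wrapper element holding one dc:* child per supplied value.

    `fields` maps element names to a string or a list of strings.
    The <metadata> wrapper is an assumption made for this sketch.
    """
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("metadata")
    for name in DC_ELEMENTS:  # emit elements in a stable order
        values = fields.get(name, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            child.text = value
    return record


# Hypothetical item: all values below are made up for illustration.
record = build_dc_record({
    "title": "Sample street photograph",
    "creator": "Unknown photographer",
    "date": "1950",
    "type": "Image",
    "format": "image/tiff",
    "identifier": "example-0001",
    "rights": "Public domain",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Rejecting unknown element names up front keeps a record within the minimum standard; a collection that needs richer description could extend the allowed set rather than relax the check.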

For preservation purposes, digital objects need a high technical specification so that they can be reproduced in the long term. The HKU Libraries establish a set of technical standards for in-house digitization. These cover, but are not limited to, material types such as books, journals, manuscripts, photos, maps, artifacts, 3D objects, audio, and video. In general, master files should be saved in uncompressed formats to preserve maximum information and data.


Risk Management

Develop a risk management strategy that identifies potential threats to digital collections, such as data loss, cybersecurity breaches, and hardware/software failures. Implement proactive measures, such as regular backups and disaster recovery plans, to mitigate risks. The files should be securely stored on a device and backed up to other devices or servers, preferably not in the same location. This will ensure the files are preserved in case of natural disasters, human error, or equipment failure.


Digital Preservation Level

Level 1: Bit-Stream Level Preservation

At Level 1 preservation, the focus is on ensuring the integrity and authenticity of digital objects at the bit-stream level. This involves implementing robust storage solutions, regular integrity checks, and backup procedures to prevent data loss or corruption. Preservation activities at this level prioritize the maintenance of the original file format and structure to safeguard against data degradation and technological obsolescence.
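
One common way to carry out the regular integrity checks described above is to record a checksum for every stored file and re-verify it later. The following is a minimal sketch under assumptions of our own: SHA-256 as the hash, a JSON manifest, and the helper names `sha256_of`, `write_manifest`, and `verify_manifest` are all illustrative choices, not a description of the tools the HKU Libraries use.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large master files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Record a checksum for every file under root (hypothetical layout)."""
    root = Path(root)
    manifest = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_manifest(root, manifest_path):
    """Return relative paths that are missing or whose bytes have changed."""
    root = Path(root)
    manifest = json.loads(Path(manifest_path).read_text())
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems


# Demonstration on a throwaway directory with made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "store"
    store.mkdir()
    (store / "page_001.tif").write_bytes(b"original scan bytes")
    manifest_file = Path(tmp) / "manifest.json"  # kept outside the store
    write_manifest(store, manifest_file)
    clean_report = verify_manifest(store, manifest_file)
    (store / "page_001.tif").write_bytes(b"corrupted scan bytes")
    damaged_report = verify_manifest(store, manifest_file)

print(clean_report)    # []
print(damaged_report)  # ['page_001.tif']
```

Keeping the manifest outside the directory it describes, and ideally with each backup copy, means a damaged store cannot silently take its own checksums with it.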

Level 2: Proprietary Format Preservation

Level 2 preservation extends the principles of bit-stream level preservation to commercial proprietary formats. Recognizing the vulnerabilities associated with proprietary file formats, this level emphasizes the use of specialized tools and software, such as MS Excel, for format validation, emulation, or migration to open and widely supported standards. Preservation efforts at this level aim to mitigate the risks of format obsolescence, ensuring continued access to digital content over time.

Level 3: Proactive Migration for Long-Term Accessibility

At Level 3 preservation, the focus shifts towards proactive migration strategies to ensure long-term accessibility and usability of digital assets. Content stored in formats with limited compatibility or sustainability is identified and systematically migrated to widely adaptable formats, such as CSV (Comma-Separated Values). This level emphasizes the importance of future-proofing digital collections by prioritizing open, non-proprietary formats that are resilient to technological changes and evolving user needs.
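
A migration to CSV can be sketched with Python's standard `csv` module. Reading a proprietary workbook would need a third-party library, so this example assumes the source has already been exported as semicolon-delimited text; the `migrate_to_csv` helper and the sample rows are hypothetical.

```python
import csv
import io


def migrate_to_csv(source, target, delimiter=";"):
    """Copy rows from a delimited text stream to standard CSV.

    Returns the number of rows written so the caller can compare it
    with the expected row count of the source.
    """
    reader = csv.reader(source, delimiter=delimiter)
    writer = csv.writer(target, lineterminator="\n")
    count = 0
    for row in reader:
        writer.writerow(row)  # fields containing commas are quoted
        count += 1
    return count


# Made-up legacy export; the second field of the data row contains a comma.
legacy = io.StringIO("id;title;year\n1;Harbour view, Hong Kong;1950\n")
migrated = io.StringIO()
rows = migrate_to_csv(legacy, migrated)

# Read the result back to confirm no field was split or lost.
migrated.seek(0)
round_trip = list(csv.reader(migrated))
print(rows)
print(round_trip)
```

With real files rather than in-memory streams, the `csv` documentation recommends opening them with `newline=""`, and stating the encoding explicitly (for example `encoding="utf-8"`) avoids platform-dependent defaults. Reading the migrated file back, as done here, is a cheap check that the migration preserved every field.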


Access

Digital collections accessible to the public online will include clearly defined ownership and copyright notices, enabling users to understand and comply with usage terms. Additionally, the policy will integrate accessibility standards to ensure that all users, including those with disabilities, can access the collections. This will involve providing alternative formats for visually impaired users and adhering to web accessibility guidelines.