Community Discussion: Best practices for digitising documents

An image of the historic archive of Guatemala's National Police. Photo by Tamy Guberek and Ann Harrison. Source: HRDAG website, and used with permission from HRDAG.

Summary and recordings available!

If you have physical documents related to your human rights work that you want to preserve, protect, or share with others, then learning good digitisation practices is vital.

Why digitise? Digitising your documents greatly improves access to your information, whether you are building an online public library to share documents related to corruption, or making documents searchable for your team. Digitisation also helps to preserve and protect important human rights information. Many defenders run the risk that malevolent groups will seek to destroy or confiscate witness testimony, evidence of abuse, and other sensitive information. Others run the risk of documents being damaged by harmful storage conditions, such as humidity, insects, and rodents. These are just a few reasons for digitising your documents. However, figuring out the most efficient, affordable, and responsible way to digitise thousands of documents can be a daunting task.

In May, we hosted two webinars to discuss important considerations, common pitfalls, scanner and software recommendations, and other advice. We recruited practitioners to share their knowledge and experience on digitising documents. These presenters included:

Watch the webinars

The first of two webinars:

The second webinar:

How to join the discussion forum

In addition to the live webinar, we’re offering an opportunity for asynchronous discussion on this topic in our discussion forum. All you need to do is create an account on the HURIDOCS Collaboratory and add your comments to the discussion.

Note: Both the webinar and the forum are open to anyone, so please be careful about what you share – do not share any personally identifiable information of defenders or organisations with whom you work.


Summary

Presenters of these webinars reiterated many of the best practices included in HURIDOCS' existing guide on Digitising Your Human Rights Archive. For this summary, we therefore use the same basic digitisation steps from that guide and include additional points, practices, examples, and resources shared by the presenters.

A typical digitisation process will most likely include these steps:

  1. Define the goal and scope
  2. Assess what is in your archive
  3. Determine how documents will be accessed
  4. Identify and locate the human resources and equipment
  5. Test a sample of your documents
  6. Scan, store and index documents

Presenters shared best practices related to the hyperlinked steps above.

Define the goal and the scope

The first step of your digitising project is to identify exactly what you’re hoping to achieve by digitising your documents. Your reasons for digitising will guide the next steps of the planning process. Organisations may choose digitisation for a number of reasons, including:

  • Internal use, such as for research, knowledge sharing or supporting documentation.
  • Dissemination – digitisation can greatly improve access to human rights information.
  • Longer-term preservation and protection of documents for the purpose of historical memory and future legal proceedings.
  • Other uses, such as outreach, education materials, advocacy, or legal cases.

For example, Alina Tiphagne explained that for some civil society groups in India, digitisation assisted in tracking the overall state of human rights in the country over the long term, as well as increasing the transparency of such work as NGOs face increasing levels of domestic scrutiny.

More information on how to define the goal and scope >>

Identify and locate the human resources and equipment

Your equipment will depend on the types of documentation you have, and may even influence the choice to outsource the work, which is standard practice for large projects.

Tips on choosing the right scanner

  • The best type of scanner to use depends on what you are digitising. Not all information is textual: you could have photographs, posters, or various other forms of documentation.
  • A bulk scanner gets the digitisation done very quickly, so if you have a large set of documents which are in good condition, this is the best way to go.
  • Even with a good set of documents, however, some may be more weathered than others, so it may be good to also have a flatbed scanner, which gives a higher resolution image.
  • In an ideal world, you would want to be able to digitise quickly, so if you have some documents that need to be done on flatbed, you could put these aside to be scanned later.
  • Bert Verstappen explained that, with optical character recognition (OCR) software, which recognises text in scanned files, digitised documents can also be published in a searchable format.

More information on how to identify and locate the human resources and equipment >>

Scan, store and index documents

The structure of your digital archive is important. One presenter recommended that if possible, keep the structure of your physical archive, and make sure to add the archival information as metadata to the digital files.

You should also consider what you intend to do with the physical documents after digitisation. Preservation of paper copies can be valuable, and digitisation doesn’t necessarily replace this.

Using the right filenames

You always need meaningful filenames for your documents, in order to make them as easy as possible to find again in future. If you use generic or unclear names, finding them will become difficult. Getting this right is essential, says Alina Tiphagne: “classification is a must – you need to know where to go to find your documents.”

Tips:

  • For large-scale projects, filenames need to include key information, particularly the date (in YYYY or YYYYMMDD format) as well as necessary references to context, creator or original medium. A good example of this could be something as simple as 2016_08_peru_receipt_001.tiff – while a filename such as Spending 2016 1.tiff would be much less useful.
  • Be consistent: don’t use special characters or spaces, and take care not to include personally identifying information, especially where the files pertain to persons at risk.
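The naming tips above can be sketched in a few lines of Python. This is only an illustration modelled on the example filename 2016_08_peru_receipt_001.tiff: the function names (slug, build_filename, is_consistent), the field order, and the three-digit sequence number are assumptions made for this sketch, not a convention prescribed by the presenters.

```python
import re

# Assumed convention, modelled on 2016_08_peru_receipt_001.tiff:
# lowercase letters, digits and underscores only, then a lowercase extension.
SAFE_NAME = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*\.[a-z0-9]+$")


def slug(text: str) -> str:
    """Lowercase the text and collapse spaces and special characters into underscores.

    Note: accented and non-Latin characters are dropped, so check the result by eye.
    """
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def build_filename(year: int, month: int, context: str, doc_type: str,
                   sequence: int, extension: str = "tiff") -> str:
    """Assemble a date-first filename with a zero-padded sequence number."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    parts = [f"{year:04d}", f"{month:02d}", slug(context), slug(doc_type), f"{sequence:03d}"]
    # Skip empty parts so a blank context does not leave a double underscore.
    return "_".join(part for part in parts if part) + "." + extension.lower()


def is_consistent(filename: str) -> bool:
    """Return True if the filename follows the assumed convention."""
    return SAFE_NAME.fullmatch(filename) is not None


if __name__ == "__main__":
    print(build_filename(2016, 8, "Peru", "receipt", 1))
    print(is_consistent("Spending 2016 1.tiff"))
```

A check like is_consistent can be run over a whole folder before files are archived, to catch names containing spaces or special characters. It cannot detect personally identifying information, so filenames still need a human review where the files concern persons at risk.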

Adding metadata to your files

Metadata refers to the descriptive information stored with your digitised documents. In thinking about metadata, consider what information you would need to retain for future use. Having documents in good condition, grouped meaningfully, with appropriate filenames and useful metadata, makes the process much easier and helps you find the documents in future. Furthermore, metadata and standardised terms are essential when working with libraries in order to allow documents to be searchable, as explained by Ann Marie Clark.

Digitising photographs? Elisabeth Baumgartner referred to the example of the South African History Archive’s Zenzo Nkobi Research Project that digitised the photographer’s extensive archive, including detailed metadata attached to each of the photographs.

Tips:

  • It’s good to add some creation information – date of creation, creator, person or institution, location if necessary, and potentially subjects – but you won’t need to use exact descriptions.
  • If you’re in a situation in which it isn’t possible to add metadata, a rough solution would be to leave a written or typed cover page with the physical files, to be added as metadata later on.
  • Both Ann Marie Clark and Alina Tiphagne noted that while quality will always be a concern, it is important to try to keep the process simple, especially where resources may be limited.
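One simple way to keep creation information with each scan is a small "sidecar" file saved next to it. The Python sketch below assumes a JSON sidecar; the field names (created, creator, location, subjects, archive_reference) and the ".json" suffix are illustrative choices based on the tips above, not a metadata standard recommended by the presenters.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class ScanMetadata:
    """Basic creation information for one digitised document (hypothetical fields)."""
    created: str                             # date of creation, e.g. "2016-08"
    creator: str                             # person or institution
    location: Optional[str] = None           # only if necessary
    subjects: List[str] = field(default_factory=list)
    archive_reference: Optional[str] = None  # place in the physical archive


def to_record(metadata: ScanMetadata) -> dict:
    """Convert the metadata to a plain dict, leaving out empty fields."""
    return {key: value for key, value in asdict(metadata).items()
            if value not in (None, [], "")}


def sidecar_path(scan_path: Path) -> Path:
    """Name the sidecar after the scan, e.g. scan.tiff -> scan.tiff.json."""
    return scan_path.with_name(scan_path.name + ".json")


def write_sidecar(scan_path: Path, metadata: ScanMetadata) -> Path:
    """Write the metadata as UTF-8 JSON next to the scanned file."""
    target = sidecar_path(scan_path)
    target.write_text(json.dumps(to_record(metadata), indent=2, ensure_ascii=False),
                      encoding="utf-8")
    return target
```

Keeping the record this small follows the advice to keep the process simple. The same fields could also be typed up from a cover page left with the physical files and added later.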

More information on how to scan, store and index documents >>

Digitisation challenges

Some of the challenges and considerations shared by presenters that organisations should be aware of before embarking on a digitisation process include:

  • The digitisation process can be complex, especially if you have large amounts of information – it may not always be possible to digitise all of your materials.
  • As with traditional archives, the personnel and funding available for digitisation can be limited, and it may not always be the priority for human rights organisations.
  • Where the information is very sensitive, redaction may be required – this is manual, intensive work, and it may be difficult to involve external assistants in it.
  • In the long term, although it brings many benefits, digitisation may not necessarily be cheaper or easier than the storage of paper records.
  • Digitisation is not a long-term guarantee of preservation, so you should also consider retention of paper copies if this is possible.
  • Digitising documents can be extremely time-consuming: Bert Verstappen explained that those digitising the archives of the International Commission of Jurists spent around 1,100 hours on the process.
  • It can be hard to integrate search engine optimization (SEO) into the digitisation process, as explained by Alina Tiphagne. There are techniques for making pages more easily accessible to search engines, including considering how documents could be classified in an SEO-friendly way.

Case study: digitisation of legal documents in Egypt

In the context of the recent crackdown on civil society in Egypt, many organisations were required to digitise documents in bulk at short notice to ensure preservation. Yasmin Shash’s presentation detailed her experiences digitising legal documentation relating to human rights cases. The vast majority of the files being digitised for these projects were prosecution documents from legal cases, usually written by hand and photocopied, and often in an almost unreadable state.

Yasmin explained a number of the challenges of digitising documents in this context, including:

  • Being unable to find OCR software that could accurately read handwritten Arabic
  • Handling the sheer scale of the documentation that many organisations were dealing with
  • Working with low quality documentation acquired from court photocopies
  • Lack of redaction: this places a moral responsibility on the person scanning the documents to consider the sensitivity of the information included
  • It can be difficult to dedicate personnel and time to digitisation, and sometimes it is necessary to compromise quality for the sake of speed
  • The ownership of the documentation is sometimes unclear, so publication can be legally risky

Join the discussion forum

We invite you to continue this discussion in our HURIDOCS Collaboratory online forum. Share your own experiences, knowledge and challenges, or ask questions. Your participation is a valuable contribution to this conversation, so please join us!
