The most important advantage of working with digital representations is that they can be manipulated computationally. Nothing that happens on the internet would be possible if this were not the case. Traditional analog sources, on the other hand, can really only be interpreted and used by people. The upshot is that you will get the most out of your research if you make a habit of immediately digitizing any source that is not already in digital form.
For my “super-secret” monograph project, I am working entirely with digital sources. Since I am a historian, only a relatively small proportion of these were “born digital,” that is, created by or with digital devices. Digital photographs and videos, e-mail messages, SMS texts, tweets, computer games, code and server logs are all examples of born-digital sources, and they make up an increasing proportion of all the information in the world. Most of my sources actually began life as analog objects like books, letters, charts and film photographs, and were subsequently digitized by someone else. In the 1980s and 90s, it seemed as if most traditional sources might never be digitized. In the last seven years, however, Google alone has digitized more than a tenth of all the books in the world. At this point, my money is on the eventual digitization of absolutely everything. So why not pitch in?
I still read physical papers and books, but instead of underlining them or writing notes on paper, I use an IRISPen to scan the quotes that I will want to access later. The pen scanner is faster than typing notes, even if you are a fast typist. If I think that the whole source will be useful (more on this in future posts), I use a standard desktop scanner to scan whole pages at a time. If I’m in an archive, I use whatever combination of handheld computer, laptop, digital camera and flatbed scanner I can. When I have research assistants, I ask them to scan sources for me rather than photocopy them.
For documents that are not born digital, the initial phase of digitization only creates digital pictures of text. Optical character recognition (OCR) is the next crucial step. I use the full version of Adobe Acrobat (Pro, not the free Reader) to add a text layer to any digital document that does not already have one. This is useful not only for documents that you digitize yourself, but also for documents that you download from the internet. Many sites run OCR on their documents to create a text layer for searching, then strip that layer out of the PDFs that they make available for download. You can use Acrobat Pro to run OCR on such a document again and restore the text layer, and it is worth making a habit of doing so.
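If you download a lot of PDFs, opening each one to check whether its text layer survived gets tedious. The Python sketch below is one rough way to triage a folder before running OCR (in Acrobat Pro or any other OCR tool). It assumes that a PDF with a usable text layer declares at least one font, and it looks for the `/Font` key both in the raw file and inside any zlib-compressed (FlateDecode) streams. The function names are hypothetical, and this is a heuristic, not a real PDF parser: a scanned document carrying a single typed stamp or header will look "searchable," and unusual encodings may be missed.

```python
import re
import sys
import zlib
from pathlib import Path

# Roughly matches the bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def iter_pdf_chunks(data: bytes):
    """Yield the raw file bytes, then every stream that inflates with zlib."""
    yield data
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates trailing bytes after the compressed data.
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            # Not Flate-compressed (e.g. a JPEG page image) or damaged: skip it.
            continue
        yield inflated


def probably_has_text_layer(data: bytes) -> bool:
    """Heuristic: a PDF that never declares a font has no text to search."""
    return any(b"/Font" in chunk for chunk in iter_pdf_chunks(data))


def main(folder: str) -> None:
    # Print every PDF under `folder` that looks like images only.
    for path in sorted(Path(folder).rglob("*.pdf")):
        if not probably_has_text_layer(path.read_bytes()):
            print(f"needs OCR: {path}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Saved under a name of your choosing (say, the hypothetical `find_unsearchable.py`) and run as `python find_unsearchable.py ~/sources`, it prints each PDF that is a candidate for OCR.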