Archives for category: Method

Research 24/7

2011/03/02

Although I do spend a lot of time thinking about my research, I also do other things like sleep, eat, go for walks, read books, teach, and so on. Fortunately, my research continues even when my attention is directed elsewhere. If I were working with traditional analog sources, the only way to accomplish this would be to hire human research assistants. They are expensive, need to eat and sleep etc., and might have better things to do than work on my project. Since I am working with digital sources, however, I can employ as many computational processes as I’d like to help me. It’s even possible to create computational processes that can make additional copies of themselves to help with a heavy workload–but that is an advanced technique that I won’t get into here.

This is one area where knowing how to program can make a really big difference in your research, because you can create programs to help you with any task that you can specify clearly. In my own research I am doing a lot of custom programming, but my goal here is to lay out a method that doesn’t require any programming.

Usually when you do a search, you skim through the results and read or bookmark whatever interests you. That is fine if you are looking for something in particular, but what do you do if you want to keep up with the news on a topic, or create a steady stream of results on a regular basis? Tara Calishain calls this process Information Trapping. The basic idea is to use focused searching to create RSS feeds, and then to monitor these feeds with a feed reader or feed aggregator. You can use a desktop application to read your feeds (like NetNewsWire on the Mac) or an online service like Google Reader. I have both kinds of reader set up, but for my ‘super-secret’ monograph I am actually reading the feeds with a different program, as I will describe in a future post.

The major search engines like Google and Yahoo! provide mechanisms for creating RSS feeds from searches. Once you have refined a particular search for something that you want to monitor (hint: always use the advanced search page), you can subscribe to a feed for that search, and your reader will let you know whenever it finds anything new.
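For readers who do want to script this step, here is a minimal Python sketch of what a feed reader does each time it checks a search feed: parse the RSS and report only the items it has not seen before. The sample feed, its example.org links, and the function name new_items are illustrative assumptions, not part of any search engine's API.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document standing in for a real search feed (hypothetical data).
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Example search feed</title>
    <item><title>First result</title><link>http://example.org/a</link></item>
    <item><title>Second result</title><link>http://example.org/b</link></item>
  </channel>
</rss>
"""

def new_items(rss_xml, seen_links):
    """Return (title, link) pairs for items whose link is not yet in seen_links.

    seen_links is a set that is updated in place, so calling the function
    again with the same feed text reports nothing new.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link and link not in seen_links:
            fresh.append((title, link))
            seen_links.add(link)
    return fresh
```

In practice the feed text would be downloaded on a schedule and the set of seen links saved between runs; a desktop or online feed reader does both of these things for you.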

Two additional techniques can make this strategy even more powerful. First of all, it is possible to use a service like Feed43 to create RSS feeds for any webpage. This has a bit more of a learning curve, but it allows you to monitor anything on the web. Second, Yahoo! Pipes provides a mechanism for combining RSS feeds with other kinds of computational processing. Again, there is a learning curve, but it’s well worth it. See Marshall Kirkpatrick’s “5 Minute Intro to Yahoo Pipes” to get started.
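To give a sense of the kind of processing meant here, the following Python sketch merges several feeds, drops duplicate links, and keeps only items whose titles mention a keyword. It is not how Yahoo! Pipes itself works; the function name combine_feeds and the (title, link) representation of feed items are assumptions made for illustration.

```python
def combine_feeds(feeds, keyword):
    """Merge lists of (title, link) items into one filtered, de-duplicated list.

    feeds   -- a list of feeds, each a list of (title, link) tuples
    keyword -- only items whose title contains this word (case-insensitive) are kept
    """
    seen = set()
    merged = []
    for items in feeds:
        for title, link in items:
            if link in seen:
                continue  # the same link already appeared in an earlier feed
            seen.add(link)
            if keyword.lower() in title.lower():
                merged.append((title, link))
    return merged
```

Items keep the order in which they first appear, so listing the most trusted feed first gives its entries priority in the combined stream.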

Categories Method

Make Everything Digital

2011/02/27

The most important advantage of working with digital representations is that they can be computationally manipulated. Nothing that happens on the internet would be possible if this were not the case. Traditional analog sources, on the other hand, can really only be interpreted and used by people. The upshot of this is that you will get the most benefit from your research if you make it a habit to digitize any source immediately if it is not already in digital form.

For my “super-secret” monograph project, I am working entirely with digital sources. Since I am a historian, only a relatively small proportion of these were “born digital”. These are sources that were created by or with digital devices. Digital photographs and videos, e-mail messages, SMS texts, tweets, computer games, code and server logs are all examples of born digital sources. They make up an increasing proportion of all of the information in the world. The bulk of my sources actually began life as analog objects like books, letters, charts and film photographs, and were subsequently digitized by someone else. In the 1980s and 90s, it seemed as if most traditional sources might never be digitized. In the last seven years, however, Google alone has digitized more than a tenth of all of the books in the world. At this point, my money is on the eventual digitization of absolutely everything. So why not pitch in?

I still read physical papers and books, but instead of underlining them or writing notes on paper, I use an IRISPen to scan the quotes that I am going to want to access later. The pen scanner is faster than typing notes, even if you are a fast typist. If I think that the whole source will be useful (more on this in future posts) then I will use a standard desktop scanner to scan whole pages at a time. If I’m in an archive, I use whatever combination I can of handheld computer, laptop computer, digital camera and flatbed scanner. When I have research assistants, I ask them to scan sources for me rather than photocopy them.

For documents that are not born digital, the initial phase of digitization only creates digital pictures of text. Optical character recognition (OCR) is the next crucial step. I use the full version of Adobe Acrobat (Pro, not the free Reader) to add a text layer to any digital document that does not already have one. This is useful not only for documents that you digitize yourself, but also for documents that you download from the internet. Many sites use OCR on their documents to create a text layer for searching purposes, then strip out that layer from the PDFs that they make available for download. You can use Acrobat Pro to re-scan the document and add the text layer back in, and you will want to make it a habit to do that.


Start with a Backup and Versioning Strategy

2011/02/22

One advantage of working with digital resources is that they can be duplicated and stored more-or-less for free. It is tempting to think of backup as a chore that can be deferred, but experience shows that if you try that you may end up putting it off until it is too late. If you set up your backup system before doing anything else, you have more peace of mind, and you have more freedom to experiment with your process, because you can undo anything that isn’t working out.

For my ‘super-secret’ monograph, I am using a number of different kinds of backup and version control, and my system is intentionally redundant. First of all, since I am working on a Mac, I have Time Machine set up to copy all of my files to an external hard drive at regular intervals. If I decide that I need an earlier copy of something, it is usually easiest to check there first. I also have a script that automatically makes a bootable copy of my machine every night on a different external hard drive using SuperDuper. If my system dies, I want to be able to plug that drive into another Mac and boot up my working environment without any interruptions or any need to reinstall and customize software. Occasionally I do want to reinstall software, so I also keep copies of all of the packages that I install in a separate directory (these are the .dmg files on the Mac).
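If you would rather not rely on a packaged tool, the basic idea of a dated backup can be sketched in a few lines of Python. This is only an illustration, not the Time Machine or SuperDuper setup described above; the function name snapshot and the folder layout (one timestamped folder per copy) are assumptions.

```python
import shutil
from datetime import datetime
from pathlib import Path

def snapshot(source, backup_root, when=None):
    """Copy the whole source folder into backup_root/YYYYMMDD-HHMMSS and return that path.

    Each call creates a new timestamped folder, so earlier copies are never
    overwritten. copytree raises FileExistsError if the target already exists.
    """
    when = when or datetime.now()
    target = Path(backup_root) / when.strftime("%Y%m%d-%H%M%S")
    shutil.copytree(source, target)
    return target
```

Because every snapshot is a full copy, this approach is simple but uses a lot of disk space; dedicated backup tools avoid that by storing only what has changed.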

So, if I lose a file, I’m covered. If a software update breaks something that was working, I’m covered. If my machine dies, I’m covered. What happens if my house gets stepped on by Godzilla while I’m out for my daily walk? (He goes for daily walks, too.) That’s where offsite backups come in. I’m using Jungle Disk to make daily copies of essential files in the cloud. That system is automated so that I don’t have to think about it. I am a pretty big fan of Jungle Disk… the entire NiCHE server got nuked once, and I was able to put a new copy of our site online in a few hours with no hassle. I also use Dropbox regularly, to explicitly back up files, to synchronize things between my various computers, and to share document drafts and large data sets with collaborators in the UK.

Free and easy backup is just one advantage of an all-digital workflow; version control is another. (If this is unfamiliar, see Julie Meloni’s “Gentle Introduction”.) I’m using Git to version my source code and GitHub to share it with other programmers. I tend not to revert to earlier versions of my own prose, so I don’t use version control for writing, but it is certainly an option. If you are not programming and are mostly happy with what you write, you can use your backup system as a kind of version control.


On Operating in Stealth Mode

2011/02/21

I didn’t have any plans to write a ‘super-secret’ monograph when I started my sabbatical. Instead, I assumed that I would mostly be working on desktop fabrication and electronics. One day I was looking through some old notes that I had taken and I had a crazy idea. In itself that isn’t unusual, since one of my working principles is to take a note whenever I find a primary source that is weird, funny or salacious. What was unusual was that I didn’t feel like sharing my crazy idea. In part, I was afraid that if I told other people about it, they might pooh-pooh it before I had a chance to develop it. My idea needed buttressing so it could “withstand the assaults of a hostile environment.” (I think that I read Science in Action too carefully as a student.)

Since I’m working entirely with digital sources, I decided to put together a set of interacting programs–an ‘ecology’, if you will–to speed up the process of finding, harvesting, clustering, excerpting, and keeping track of my sources and my attempts to make sense of them. The original collection of programs consisted of mostly off-the-shelf software tied together with a bit of Automator and Python scripting as necessary: no use in reinventing the wheel. And what I discovered really surprised me. Since the mid-1990s I’ve been going around telling people that the digitization of primary sources would change the way that we write history. I took all of my notes during my PhD on a succession of handheld computers. I even blogged about the future for three years in a resolutely optimistic mode. What I hadn’t quite realized is that at some point I should have started using the past tense: that future has already happened. If you want to work in a new way, everything that you need is ready to hand. With all-digital sources and a set of tools for working with them, my research process was about an order of magnitude faster than it was on my last monograph. Instead of asking myself, “Do I really want to spend seven years developing this idea into a book?” the question had become, “Is this idea worth ten months of my time?” Changing the information and transaction costs meant that doing a crazy project was a lot less of a commitment. It also meant that if I wanted to, I might write ten times as many books over the course of my lifetime as I had been planning to write. I still find the process of writing to be difficult, however, so that may not happen.

So before I knew it I was about two years into writing another monograph, at least by the old way of measuring investment, and I still didn’t feel like telling anyone what I was doing. There are pros and cons to talking about your topic while you’re writing. The pro is that you get the benefit of other people’s good ideas. That’s a con, too, because the community tends to shape your work toward a consensus that everyone can agree on. After reading Infotopia I’m more wary about the wisdom of crowds, including (or especially) the wisdom of peers. Plenty of time for peer review when I send the manuscript to a press and it goes out to readers. By that time, I will have already answered most of the objections that I could think of, and the reviewers will help me to distill any excess crazy out of the final product.

I was kind of dreading encountering well-meaning colleagues while on sabbatical, because the first question that everyone asks is “What are you working on?” Is it rude not to tell them? So I started telling people the short version of the above story, and have found that colleagues are actually quite supportive and encouraging. Some think that it is a great marketing gimmick or that I should include a decoder ring with the book. Some tell me about the drawbacks they’ve encountered in telling other people about their work in progress. Some like the idea of having complete freedom to change their minds while they’re working. And instead of sharing the topic, I’ve made it a policy to share my method with anyone who is curious. We end up having an interesting discussion about research methodology instead of a less interesting discussion about my current idée fixe. Let’s face it: very few people are as interested in your topic as you are, but many of them are deeply engaged in research practice.

If you’re dying to know about the method you can e-mail me for details, but I will be posting more about it here soon.

