Many digital humanists are probably aware that they could make their research activities faster and more efficient by working at the command line. Many are probably also sympathetic to arguments for open source, open content and open access. Nevertheless, switching to Linux full-time is a big commitment. Virtualization software, like Oracle’s free VirtualBox, allows one to create Linux machines that run inside a window on a Mac or PC. Since these virtual machines can be created from scratch whenever you need one, they make an ideal platform for learning command line techniques. They can also be customized for particular research tasks, as I will show in later posts.

In this post I show how to create a stripped-down Debian Linux virtual machine inside VirtualBox. It does not have a GUI desktop installed, so you have to interact with it through commands entered in a shell (you can add your own GUI later, if you’d like). The screenshots come from a Mac, but the install process should be basically the same for a Windows PC.

To get started, you need to download two things.  The first of these is a disk image file (ISO) for the version of Linux you want to install.  These files are different depending on the processor in your computer.  For a recent Windows or Mac desktop (i.e., a 64-bit Intel machine), the file that you probably want is debian-testing-amd64-CD-1.iso.  For older machines, you may need a different disk image.  Check the Debian distribution page for more details. The other thing that you need to download is the Oracle VirtualBox software for your operating system. Once you have downloaded VirtualBox, install it and then start it.

The image below shows the VirtualBox Manager running on my Mac. I have already created three other Linux virtual machines, but we can ignore these.

To create a new virtual machine, click the “New” button in the upper left hand corner of the Manager. Debian Linux comes in three standard flavours: “stable,” which is very solid but not very up-to-date; “testing,” which is pretty solid and reasonably up-to-date; and “unstable,” which is just that. The current code name for the testing version is “Wheezy”. I like to name each of my virtual machines so I know what version of the operating system I am using. I’m going to call this one “VBDebianWheezy64.” You can call yours whatever you’d like.

Once you click “Continue,” the VirtualBox software will ask you a number of questions. For this installation we can use the default recommendations: a memory size of 384 megabytes of RAM, a virtual hard drive formatted as a VDI (VirtualBox Disk Image), dynamically allocated disk storage, and 8 gigabytes for the virtual machine.






Once we have set all of the options for the virtual machine, we are returned to the VirtualBox Manager.

Now we choose the virtual machine we just created and click the “Start” button in the Manager. The new machine starts with a message about how the mouse is handled when the cursor is over the virtual machine window.


Once you’ve read and accepted the message, the virtual machine will ask you for a start-up disk.

Click the file icon with the green up arrow on it, and you will be given a dialog that lets you choose the Debian ISO file you downloaded earlier.

The ISO file is now selected.


When you click “start” the Debian Install process will begin in the virtual machine window.


You can move around the installer options with the Up and Down arrows and Tab key. Use the Enter key to select an item. If there are options, you can usually turn them on or off with the Space bar. Here, press Enter to choose the “Install” option.

Next you want to select your language, location and preferred keyboard layout.



The installer will ask you for a hostname and a domain name. You can set the former to whatever you’d like; leave the latter blank unless you have a reason to set it.


Next, the installer will ask you for a root password. In Linux and Unix systems, the root account typically has the power to do everything, good and bad. Rather than setting a root password, we are going to leave the root password entry blank. The installer will respond by not creating a root account, but rather by giving the user account (i.e., you) sudo privileges.


Now that the root account is disabled, you can enter your own name, username and password, and set the time zone.





The next set of screens asks you to specify how you would like the file system to be set up. As before, we will use the defaults. Later, when you are more familiar with creating virtual machines for specific tasks, you can tweak these as desired. We want guided partitioning, and we are going to use the entire virtual disk (this is the 8 GB dedicated to this particular virtual machine).

We only have one disk to partition, so we choose it.

We want all of our files in one partition for now. Later, if you decide to do a lot of experimentation with Linux, you may prefer to put your stuff in separate partitions when you create new virtual machines.

We can finish the partitioning…


and write the changes to disk.

Now the install process will ask us if we want to use other disk image files. We do not.

We are going to grab install files from the Internet instead of from install disk images. (If you are working in a setting where downloads are expensive, you may not wish to do this.) We set up a network mirror to provide the install files.


Tell the installer what country you are in.

Then choose a Debian archive mirror. The default mirror is a good choice.

Now the installer will ask if we want to use a proxy server. Leave this blank unless you have a reason to change it.

I opt out of the popularity contest.

Debian gives you a lot of options for pre-installed bundles of software. On a desktop, I choose only the “Standard system utilities.” If I am on a laptop, I also include the “Laptop” bundle. I leave all of the other ones unchecked. (You can always install more software later.) The “Debian desktop environment” is the GUI, which is mouse-and-icon based, like Windows and OS X. I have found it is much easier to get in the habit of using command line tools if you don’t bother with the GUI, at least at first.

The final step is to install the Grub bootloader.


Now the virtual machine will reboot when you click “Continue”.

This is the login prompt for your new Debian virtual machine.

You can use Linux commands to shut down the virtual machine if you would like. You can also save it in such a way that it will resume where you left off when you reload it in VirtualBox. In the VirtualBox Manager, right click on the virtual machine and choose “Close” -> “Save State”. That is shown in the next screenshot.

You can save backups of your virtual machine whenever you reach a crucial point in your work, store VMs in the cloud, and share them with colleagues or students. You can also create different virtual machines for different tasks and use them to try out other Linux distributions. On my Macs, I also have Win XP and Win 7 VMs so I can run Windows-only software.



For a couple of years I have been working on outfitting the History Department at Western University with a new digital lab and classroom, funded by a very generous grant from our provost. The spaces are now open and mostly set up, and our graduate students and faculty have started to form working groups to teach themselves how to use the hardware and software and to share what they know with others. There is tremendous excitement about the potential of our lab, which is understandable. I believe that it is the best-equipped such space in the world: historians at Western now have their own complete Fab Lab.

In provisioning the lab and classroom, I wanted to strike a balance between supporting the kinds of activities that are typically undertaken in digital history and digital humanities projects right now and enabling our students and faculty to engage in the kind of “making in public” that many people argue will characterize the humanities and social sciences in the next decade.

Here is a high-level sketch of our facilities, organized by activity. The lab inventory actually runs to thousands of items, so this is just an overview.


To date, the facilities have been used most fully by Devon Elliott, a PhD student who is working with Rob MacDougall and me. Devon’s dissertation is on the technology and culture of stage magic. In his work, he designs electronics, programs computers, does 3D scanning, modeling and printing, builds illusions and installations and leads workshops all over the place. You can learn more about his practice in a recent edition of the Canadian Journal of Communication and in the forthcoming #pastplay book edited by Kevin Kee. (Neither of these publications is open access yet, but you can email me for preprints.) Devon and I are also teaching a course on fabrication and physical computing at DHSI this summer with Jentery Sayers and Shaun Macpherson.

Past students in my interactive exhibit design course have also used the lab equipment to build dozens of projects, including a robot that plays a tabletop hockey game, a suitcase that tells the stories of immigrants, a batting helmet to immerse the user in baseball history, a device that lets the Prime Ministers on Canadian money tell you about themselves, a stuffed penguin in search of the South Pole, and many others. This year, students in the same class have begun to imagine drumming robots, print 3D replicas of museum artifacts, and make the things around them responsive to people.

In the long run, of course, the real measure of the space will be what kind of work comes out of it. While I don’t really subscribe to the motto “if you build it, they will come”, I do believe that historians who want to work with their hands as well as their heads have very few opportunities to do so. We welcome you! We’re very interested in taking student and post-doc makers and in collaborating with colleagues who are dying to build something tangible. Get excited and make things!

(If you have Mathematica you can download this as a notebook from my GitHub account. It is also available as a CDF document which can be read with Wolfram’s free CDF Player.)


For a couple of years now I have been using Mathematica as my programming language of choice for my digital history work. For one thing, I love working with notebooks, which allow me to mix prose, citations, live data, executable code, manipulable simulations and other elements in a single document. I also love the generality of Mathematica. For any kind of technical work, there is usually a well-developed body of theory that is expressed in objects drawn from some branch of mathematics. Chances are, Mathematica already has a large number of high-level functions for working with those mathematical objects. The Mathematica documentation is excellent, if necessarily sprawling, since there are literally thousands of commands. The challenge is usually to find the commands that you need to solve a given problem. Since few Mathematica programmers seem to be working historians or humanists dealing with textual sources, it can be difficult to figure out where to begin.

Using a built-in text

As a sample text, we will use Darwin’s Origin of Species from Mathematica’s built-in example database. The Short command shows a small piece of something large. Here we’re asking to see the two lines at the beginning and end of this text.

sample = ExampleData[{"Text", "OriginOfSpecies"}];
Short[sample, 2]

Mathematica responds with

INTRODUCTION. When on board H.M.S. ... have been, and are being, evolved.

The Head command tells us what something is. Our text is currently a string, an ordered sequence of characters.
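A minimal check of that claim might look like the following. The text is loaded again so the cell runs on its own; the two extra calls are only there for comparison.

```mathematica
(* Reload the sample text so this cell is self-contained. *)
sample = ExampleData[{"Text", "OriginOfSpecies"}];

(* Head reports what kind of expression something is. *)
Head[sample]     (* String *)
Head[{1, 2, 3}]  (* List *)
Head[42]         (* Integer *)
```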


Extracting part of a string

Suppose we want to work with part of the text. We can extract the Introduction of Origin by pulling out everything between “INTRODUCTION” and “CHAPTER 1”. The command that we use to extract part of a string is called StringCases. Once we have extracted the Introduction, we want to check to make sure that the command worked the way we expected. Rather than look at the whole text right now, we can use the Short command to show us about five lines of the text. It returns a couple of phrases at the beginning and end, using ellipses to indicate the much larger portion which we are not seeing.
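The extraction step could be written roughly as follows. This is a sketch, not necessarily the original cell: the variable name intro is an assumption, and the pattern stops at the first occurrence of "CHAPTER", following the Shortest idiom that this post uses later for chapter 9.

```mathematica
sample = ExampleData[{"Text", "OriginOfSpecies"}];

(* StringCases returns a list of matches; we keep the first one. *)
intro = First[
   StringCases[sample, Shortest["INTRODUCTION" ~~ __ ~~ "CHAPTER"]]];

(* Show roughly five lines' worth of the result. *)
Short[intro, 5]
```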

INTRODUCTION. When on board H.M.S. 'Beagle,' as naturalist, I was much struck with certain fac... nced that Natural Selection has been the main but not exclusive means of modification. CHAPTER

Note the use of the Shortest command in the string matching expression above. Since there are probably multiple copies of the word “CHAPTER” in the text, we have to tell Mathematica how much of the text we want to match… do we want the portion between “INTRODUCTION” and the first instance of the word, the second, the last? Here are two examples to consider:
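The difference is easiest to see on a toy string. The string below is invented for illustration and is not taken from the text.

```mathematica
(* A toy string with two occurrences of "CHAPTER". *)
toy = "INTRODUCTION aaa CHAPTER bbb CHAPTER ccc";

(* Shortest stops at the first "CHAPTER" it can reach. *)
StringCases[toy, Shortest["INTRODUCTION" ~~ __ ~~ "CHAPTER"]]
(* {"INTRODUCTION aaa CHAPTER"} *)

(* Longest runs on to the last "CHAPTER". *)
StringCases[toy, Longest["INTRODUCTION" ~~ __ ~~ "CHAPTER"]]
(* {"INTRODUCTION aaa CHAPTER bbb CHAPTER"} *)
```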


From a string to a list of words

It will be easier for us to analyze the text if we turn it into a list of words. In order to eliminate punctuation, I am going to get rid of everything that is not a word character. Note that doing things this way turns the abbreviation “H.M.S.” into three separate words.
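One way to do the split is sketched below, using the same StringSplit pattern that appears later in this post for the whole document. The name introList matches the variable used in the following sections; the extraction of intro is repeated (as an assumed reconstruction) so the cell runs on its own.

```mathematica
sample = ExampleData[{"Text", "OriginOfSpecies"}];
intro = First[
   StringCases[sample, Shortest["INTRODUCTION" ~~ __ ~~ "CHAPTER"]]];

(* Split on runs of one or more non-word characters. *)
introList = StringSplit[intro, Except[WordCharacter] ..];
Short[introList, 4]

(* The abbreviation breaks apart into single letters. *)
StringSplit["on board H.M.S. 'Beagle,'", Except[WordCharacter] ..]
(* {"on", "board", "H", "M", "S", "Beagle"} *)
```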


Mathematica has a number of commands for selecting elements from lists. The Take command allows us to extract a given number of items from the beginning of a list.


The First command returns the first item in a list, and the Rest command returns everything but the first element. The Last command returns the last item.


We can also use an index to pull out list elements.
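The selection commands described in the last three paragraphs can be tried on a short list. The list here is made up for illustration; the same calls work on introList.

```mathematica
words = {"when", "on", "board", "h", "m", "s", "beagle"};

Take[words, 3]   (* {"when", "on", "board"} *)
First[words]     (* "when" *)
Rest[words]      (* {"on", "board", "h", "m", "s", "beagle"} *)
Last[words]      (* "beagle" *)

(* Part, written with double brackets, indexes from 1;
   negative indices count back from the end. *)
words[[2]]       (* "on" *)
words[[-1]]      (* "beagle" *)
words[[2 ;; 4]]  (* {"on", "board", "h"} *)
```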


We can test whether or not a given item is a member of a list with the MemberQ command.

MemberQ[introList, "naturalist"]
MemberQ[introList, "naturist"]

Processing each element in a list

If we want to apply some kind of function to every element in a list, the most natural way to accomplish this in Mathematica is with the Map command. Here we show three examples using the first 40 words of the Introduction. Note that Map returns a new list rather than altering the original one.

Map[ToUpperCase, Take[introList, 40]]
Map[ToLowerCase, Take[introList, 40]]
Map[StringLength, Take[introList, 40]]

Computing word frequencies

In order to compute word frequencies, we first convert all words to lowercase, then sort them and count how often each appears using the Tally command. This gives us a list of lists, where each of the smaller lists contains a single word and its frequency.
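A sketch of those three steps follows. The names lowerIntroList and wordFreq are the ones used in the rest of this post; the lines that rebuild introList are an assumed reconstruction, repeated so the cell runs on its own.

```mathematica
sample = ExampleData[{"Text", "OriginOfSpecies"}];
intro = First[
   StringCases[sample, Shortest["INTRODUCTION" ~~ __ ~~ "CHAPTER"]]];
introList = StringSplit[intro, Except[WordCharacter] ..];

(* Lowercase every word, sort alphabetically, then count duplicates. *)
lowerIntroList = Map[ToLowerCase, introList];
wordFreq = Tally[Sort[lowerIntroList]];
Short[wordFreq, 8]

(* The same idea on a three-word list. *)
Tally[Sort[{"the", "of", "the"}]]
(* {{"of", 1}, {"the", 2}} *)
```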


Finally we can sort our tally list by the frequency of each item. This is traditionally done in descending order. In Mathematica we can change the sort order by passing the Sort command an anonymous function. (It isn’t crucial for this example to understand exactly how this works, but it is explained in the next section if you are curious. If not, just skip ahead.)

sortedFrequencyList = Sort[wordFreq, #1[[2]] > #2[[2]] &];
Short[sortedFrequencyList, 8]

Here are the twenty most frequent words:

Take[sortedFrequencyList, 20]
{{"the", 100}, {"of", 91}, {"to", 54}, {"and", 52}, {"i", 44},
{"in", 37}, {"that", 27}, {"a", 24}, {"this", 20}, {"it", 20},
{"be", 20}, {"which", 18}, {"have", 18}, {"species", 17},
{"on", 17}, {"is", 17}, {"as", 17}, {"my", 13}, {"been", 13},
{"for", 11}}

The Cases statement pulls every item from a list that matches a pattern. Here we are looking to see how often the word “modification” appears.

Cases[wordFreq, {"modification", _}]
{{"modification", 4}}

Aside: Anonymous Functions

Most programming languages let you define new functions, and Mathematica is no exception. You can use these new functions with built-in commands like Map.
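The definition of plus2 is not shown in this post; a plausible version, given how it is used, is a one-line function that adds two to its argument.

```mathematica
(* x_ is a pattern that matches any single argument and names it x. *)
plus2[x_] := x + 2

plus2[40]  (* 42 *)
```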


Map[plus2, {1, 2, 3}]
{3, 4, 5}

Being able to define functions allows you to

  • hide details: as long as you can use a function like plus2 you may not care how it works
  • reuse and share code: so you don’t have to keep reinventing the wheel.

In Mathematica, you can also create anonymous functions. One way of writing an anonymous function in Mathematica is to use a Slot in place of a variable.

# + 2 &

So we don’t have to define our function in advance, we can just write it where we need it.

Map[# + 2 &, {1, 2, 3}]
{3, 4, 5}

We can apply an anonymous function to an argument like this, and Mathematica will return the number 42.

(# + 2 &)[40]

A named function like plus2 is still sitting there when we’re done with it. An anonymous function disappears immediately after use.


N-grams

The Partition command can be used to create n-grams. This tells Mathematica to give us all of the partitions of a list that are two elements long and that are offset by one.

bigrams = Partition[lowerIntroList, 2, 1];
Short[bigrams, 8]
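To see what the length and offset arguments do, it can help to run Partition on a toy list first. The list below is invented for illustration.

```mathematica
tokens = {"a", "b", "c", "d"};

(* Length 2, offset 1: overlapping bigrams. *)
Partition[tokens, 2, 1]
(* {{"a", "b"}, {"b", "c"}, {"c", "d"}} *)

(* Length 3, offset 1: overlapping trigrams. *)
Partition[tokens, 3, 1]
(* {{"a", "b", "c"}, {"b", "c", "d"}} *)

(* Without an offset the pieces do not overlap. *)
Partition[tokens, 2]
(* {{"a", "b"}, {"c", "d"}} *)
```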

We can tally and sort bigrams, too.

sortedBigrams = Sort[Tally[bigrams], #1[[2]] > #2[[2]] &];
Short[sortedBigrams, 8]

Concordance (Keyword in Context)

A concordance shows keywords in the context of surrounding words. We can make one of these quite easily if we start by generating n-grams. Then we use Cases to pull out all of the 5-grams in the Introduction that have “organic” as the middle word (for example), and format the output with the TableForm command.
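A sketch of that recipe is given below. The name fiveGrams is an assumption, and the lines that rebuild lowerIntroList are an assumed reconstruction, repeated so the cell runs on its own.

```mathematica
sample = ExampleData[{"Text", "OriginOfSpecies"}];
intro = First[
   StringCases[sample, Shortest["INTRODUCTION" ~~ __ ~~ "CHAPTER"]]];
lowerIntroList = 
  Map[ToLowerCase, StringSplit[intro, Except[WordCharacter] ..]];

(* Every run of five consecutive words. *)
fiveGrams = Partition[lowerIntroList, 5, 1];

(* Keep the 5-grams whose third (middle) word is the keyword. *)
TableForm[Cases[fiveGrams, {_, _, "organic", _, _}]]
```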

affinities of    organic beings on
several distinct organic beings by
coadaptations of organic beings to
amongst all      organic beings throughout
succession of    organic beings throughout

Removing stop words

Mathematica has access to a lot of built-in, curated data. Here we grab a list of English stopwords.

stopWords = WordData[All, "Stopwords"];
Short[stopWords, 4]

The Select command allows us to use a function to pull items from a list. We want everything that is not a member of the list of stop words.

Short[lowerIntroList, 8]
lowerIntroNoStopwords = 
  Select[lowerIntroList, Not[MemberQ[stopWords, #]] &];
Short[lowerIntroNoStopwords, 8]

Bigrams containing the most frequent words

Here is a more complicated example built mostly from functions we’ve already seen. We start by finding the ten most frequently occurring words once we have gotten rid of stop words.

(* the opening line was lost; restored from the bracket structure, with the inner Take an assumption *)
freqWordCounts = 
  Take[Sort[
    Tally[Take[
      lowerIntroNoStopwords, {1, -120}]], #1[[2]] > #2[[2]] &], 10]

We remove a few of the words we are not interested in, then we rewrite the bigrams as a list of graph edges. This will be useful for visualizing the results as a network.

freqWords = 
  Complement[Map[First, freqWordCounts], {"shall", "subject"}];
edgeList = 
  Map[#[[1]] -> #[[2]] &, Partition[lowerIntroNoStopwords, 2, 1]];
Short[edgeList, 4]

We grab the most frequent ones.

freqBigrams = Union[Select[edgeList, MemberQ[freqWords, #[[1]]] &],
   Select[edgeList, MemberQ[freqWords, #[[2]]] &]];
Short[freqBigrams, 4]

Finally we can visualize the results as a network. When you are exploring a text this way, you often want to keep tweaking your parameters and see if anything interesting comes up.

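(* opening lines restored from the bracket structure; the outer Framed is a guess *)
Framed[Pane[
  GraphPlot[freqBigrams, 
   Method -> {"SpringElectricalEmbedding", 
     "InferentialDistance" -> .1, "RepulsiveForcePower" -> -4}, 
   VertexLabeling -> True, DirectedEdges -> True, 
   ImageSize -> {1100, 800}], {400, 400}, Scrollbars -> True, 
  ScrollPosition -> {400, 200}]]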

Document frequencies

We have been looking at the Introduction to Origin. We can also calculate word frequencies for the whole document. When we list the fifty most common words (not including stop words) we can get a better sense of what the whole book is about.

sampleList = 
  Map[ToLowerCase, StringSplit[sample, Except[WordCharacter] ..]];
docFreq = Sort[Tally[Sort[sampleList]], #1[[2]] > #2[[2]] &];
Take[Select[Take[docFreq, 200], 
  Not[MemberQ[stopWords, First[#]]] &], 50]

TF-IDF: Term frequency-Inverse document frequency

The basic intuition behind tf-idf is as follows…

  • A word that occurs frequently on every page doesn’t tell you anything special about that page. It is a stop word.
  • A word that occurs only a few times in the whole document or corpus can be ignored.
  • A word that occurs a number of times on one page but is relatively rare in the document or corpus overall can give you some idea what the page is about.

Here is one way to calculate tf-idf (there are lots of different versions)

tfidf[termfreq_, docfreq_, numdocs_] := 
     Log[termfreq + 1.0] Log[numdocs/docfreq]
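A few hypothetical numbers show how this version of the score behaves. The counts are invented, and the example treats docfreq as the number of chapters (out of 15) that contain the term, which is one common reading of the formula.

```mathematica
(* Same formula as above, repeated so this cell runs on its own. *)
tfidf[termfreq_, docfreq_, numdocs_] := 
  Log[termfreq + 1.0] Log[numdocs/docfreq]

(* A term found in all 15 chapters scores zero, however often it occurs. *)
tfidf[10, 15, 15.0]  (* 0. *)

(* A term used 10 times here but found in only 3 chapters scores well. *)
tfidf[10, 3, 15.0]   (* about 3.86 *)

(* The same rare term used only once scores lower. *)
tfidf[1, 3, 15.0]    (* about 1.12 *)
```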

Using document frequencies and TF-IDF we can get a sense of what different parts of a text are about. Here is how we would analyze chapter 9 (there are 15 chapters in all).

ch9 = StringCases[sample, Shortest["CHAPTER 9" ~~ __ ~~ "CHAPTER"]][[1]];
ch9List = Map[ToLowerCase, StringSplit[ch9, Except[WordCharacter] ..]];
ch9Terms = Union[ch9List];
ch9TermFreq = Sort[Tally[ch9List], #1[[2]] > #2[[2]] &];
ch9DocFreq = Select[docFreq, MemberQ[ch9Terms, #[[1]]] &];
computeTFIDF[termlist_, tflist_, dflist_] :=
 Module[{outlist, tf, df},
  outlist = {};
  (* for each term, look up its two frequencies and append a scored row *)
  Do[
   tf = Cases[tflist, {t, x_} -> x][[1]];
   df = Cases[dflist, {t, x_} -> x][[1]];
   outlist = Append[outlist, {t, tf, df, tfidf[tf, df, 15.0]}],
   {t, termlist}];
  outlist]
ch9TFIDF = 
  Sort[computeTFIDF[ch9Terms, ch9TermFreq, 
    ch9DocFreq], #1[[4]] > #2[[4]] &];
Take[ch9TFIDF, 50][[All, 1]]

Whether or not you are familiar with nineteenth-century science, it should be clear that the chapter has something to do with geology. Darwin also provided chapter summaries of his own:

StringTake[ch9, 548]
of intermediate varieties at the present day. On the nature of extinct 
intermediate varieties; on their number. On the vast lapse of time, as 
inferred from the rate of deposition and of denudation. On the poorness 
of our palaeontological collections. On the intermittence of geological 
formations. On the absence of intermediate varieties in any one formation. 
On the sudden appearance of groups of species. On their sudden appearance 
in the lowest known fossiliferous strata.

In September, Tim Hitchcock and I had a chance to meet with Adam Farquhar at the British Library to talk about potential collaborative research projects. Adam suggested that we might do something with a collection of about 25,000 E-books. Although I haven’t had much time yet to work with the sources, one of the things that I am interested in is using techniques from image processing and computer vision to supplement text mining. As an initial project, I decided to see if I could find a way to automatically extract images from the collection.

My first thought was that I might be able to identify text based on its horizontal and vertical correlation. Parts of the image that were not text would then be whitespace, illustration or marginalia. (One way to do this in Mathematica is to use the ImageCooccurrence function.) As I was trying to figure out the details, however, I realized that a much simpler approach might work. Since the method seems promising I decided to share it so that other people might get some use out of it (or suggest improvements).

In a British Library E-book, each page has a JPEG page image and an associated ALTO (XML) file which contains the OCRed text. The basic idea is to compare the JPEG image file size with the ALTO file size for the same page. Pages that have a lot of text (and no images) should have large ALTO files relative to the size of the JPEG. Pages with an image but little or no text should have a large JPEG relative to the size of the ALTO file. Blank pages should have relatively small JPEG and ALTO files.
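The comparison could be sketched in Mathematica along the following lines. The directory layout (a matching NNN.jpg and NNN.xml for each page in one folder) and the function names pageSizes and plotPageSizes are assumptions made for illustration, not a description of how the British Library files are actually organized.

```mathematica
(* Hypothetical layout: every page has NNN.jpg and NNN.xml in one directory. *)
(* Returns one {page name, JPEG bytes, ALTO bytes} triple per page. *)
pageSizes[dir_String] := 
 Map[
  {FileBaseName[#], FileByteCount[#], 
    FileByteCount[FileNameJoin[{dir, FileBaseName[#] <> ".xml"}]]} &,
  FileNames["*.jpg", dir]]

(* Scatter ALTO size against JPEG size, one point per page. *)
plotPageSizes[dir_String] := 
 ListPlot[pageSizes[dir][[All, {3, 2}]], 
  AxesLabel -> {"ALTO bytes", "JPEG bytes"}]
```

Calling plotPageSizes on a directory of page files should then give a scatter plot in which text pages, image pages and blank pages can be compared by eye.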

The graph below shows the results for an E-book chosen at random from the sample. Textual pages cluster, pages with images tend to cluster, and blank pages (and covers) fall out along one axis because they have no text at all. We can use more sophisticated image processing and machine learning to further subdivide images and extract them once they are located, but this seems pretty good for a first pass.

In my previous post, I showed how to connect an Arduino microcontroller to Mathematica on Mac OS X using the SerialIO package.  It is also quite straightforward to interact with Phidgets.  In this case we can take advantage of Mathematica’s J/Link Java interface to call the Phidgets API.  This is basically a ‘hello world’ demonstration.  For a real application you would include error handling, event driven routines, and so on.  For more details, read the Java getting started tutorial and Phidgets programming manual, then look at the sample code and javadocs on this page.

Start by installing the Mac OS X Phidgets driver on your system. Once you have run Phidgets.mpkg you can open System Preferences and there will be a pane for Phidgets.  For my test, I used a PhidgetInterfaceKit 8/8/8 with an LED on Output 2 and a 60mm slider (potentiometer) attached to Sensor 0. Once you have the hardware configuration you like, plug the InterfaceKit into the USB. It should show up in the General tab of system preferences. If you double click on the entry, it will start a demonstration program that allows you to make sure you can toggle the LED and get values back from the slider. When everything is working correctly, you can close the program and open Mathematica.

In a Mathematica notebook, you are going to load the J/Link package, install Java, and put the phidget21.jar file on your class path by editing the AddToClassPath[] command in the snippet below.
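The setup lines described here would look something like this. The jar path is a placeholder, not a real location; point it at wherever phidget21.jar lives on your own machine.

```mathematica
(* Load J/Link and start the Java runtime. *)
Needs["JLink`"]
InstallJava[]

(* Hypothetical path: edit this to match your installation. *)
AddToClassPath["/path/to/phidget21.jar"]
```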

phidgetsClass = LoadJavaClass["com.phidgets.InterfaceKitPhidget"]

Next, create a new instance of the InterfaceKit object, open it and wait for attachment. You can include a timeout value if you’d like. Once the InterfaceKit is attached, you can query it for basic information like device name, serial number and sensor and IO options.

ik = JavaNew[phidgetsClass]
ik@openAny[]
ik@waitForAttachment[]
{ik@getOutputCount[], ik@getInputCount[], ik@getSensorCount[]}

Finally you can use Mathematica’s Dynamic[] functionality to create a virtual slider in the notebook that will waggle back and forth as you move the physical slider attached to the InterfaceKit. You can also turn the LED on and off by clicking a dynamic checkbox in the notebook.

Slider[Dynamic[
  Refresh[ik@getSensorValue[0], UpdateInterval -> 0.1]], {0, 1000}]

bool = False; Checkbox[Dynamic[bool]]
Dynamic[ik@setOutputState[2, bool]]

When you are finished experimenting, close the InterfaceKit object.

ik@close[]


I’ve been programming regularly in Mathematica for more than a year, using the language mostly for spidering, text mining and machine learning applications. But now that I am teaching my interactive exhibit design course again, I’ve started thinking about using Mathematica for physical computing and desktop fabrication tasks. First on my to do list was to find a way to send and receive data from the Arduino. A quick web search turned up the work of Keshav Saharia, who is close to releasing a package called ArduinoLink that will make this easy. In the meantime, Keshav helped me to debug a simple demonstration that uses the SerialIO package created by Rob Raguet-Schofield. There were a few hidden gotchas involved in getting this working on Mac OS X, so I thought I would share the process with others who may be interested in doing something similar.

On the Arduino side, I attached a potentiometer to Analog 1, and then wrote a simple program that waits for a signal from the computer, reads the sensor and then sends the value back on the serial port.  It is based on the Serial Call and Response tutorial on the Arduino website.


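/*
 This code is adapted from the Serial Call and Response tutorial
 on the Arduino website.

 When started, the Arduino sends an ASCII A on the serial port until
 it receives a signal from the computer. It then reads Analog 1,
 sends a single byte on the serial port and waits for another signal
 from the computer.

 Test it with a potentiometer on A1.
 */

int sensor = 0;   // scaled potentiometer reading, 0-255
int inByte = 0;   // most recent byte received from the computer

void setup() {
  Serial.begin(9600);    // must match the baud rate set in Mathematica
  establishContact();    // send 'A' until the computer responds
}

void loop() {
  // only reply when the computer has sent a request byte
  if (Serial.available() > 0) {
    inByte = Serial.read();
    // divide sensor value by 4 to return a single byte 0-255
    sensor = analogRead(A1)/4;
    Serial.write(sensor);
  }
}

void establishContact() {
  while (Serial.available() <= 0) {
    Serial.print('A');   // announce that the Arduino is ready
    delay(300);
  }
}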

Once the sketch is installed on the Arduino, close the Arduino IDE (otherwise the device will look busy when you try to interact with it from Mathematica). On the computer side, you have to install the SerialIO package in your Mathematica Applications directory and make sure that directory is in your path. If the following command does not evaluate to True

MemberQ[$Path, "/Users/username/Library/Mathematica/Applications"]

then you need to run this command

AppendTo[$Path, "/Users/username/Library/Mathematica/Applications"]

Next, edit the file


so the line

$Link = Install["SerialIO"]

reads

$Link =
LinkProtocol -> "Pipes"]

If you need to find the port name for your Arduino, you can open a terminal and type

ls /dev/tty.*

The demonstration program is shown below.  You can download both the Arduino / Wiring sketch and the Mathematica notebook from my GitHub repository.  You need to change the name of the serial device to whatever it is on your own machine.


<< SerialIO`

myArduino = SerialOpen["/dev/tty.usbmodem3a21"]

SerialSetOptions[myArduino, "BaudRate" -> 9600]

(* ask for a reading ten times per second and show it on a slider *)
Slider[
 Dynamic[Refresh[SerialWrite[myArduino, "B"];
  First[SerialRead[myArduino] // ToCharacterCode],
  UpdateInterval -> 0.1]], {0, 255}]

The Mathematica code loads the SerialIO package, sets the rate of the serial connection to 9600 baud to match the Arduino, and then polls the Arduino ten times per second to get the state of the potentiometer.  It doesn’t matter what character we send the Arduino (here we use an ASCII B).  We need to use ToCharacterCode[] to convert the response to an integer between 0 and 255.  If everything worked correctly, you should see the slider wiggle back and forth in Mathematica as you turn the potentiometer.  When you are finished experimenting, you need to close the serial link to the Arduino with

SerialClose[myArduino]


For a number of years I’ve taught a studio course in our public history graduate program on designing interactive exhibits. Most academic historians present their work in monographs and journal articles unless they are way out there on the fringe, in which case they may be experimenting with trade publications, documentary film, graphic novels, photography, websites, blogs, games or even more outré genres. Typically the emphasis remains on creating representations that are intended to be read in some sense, ideally very carefully. Public historians, however, need to be able to communicate to larger and more disparate audiences, in a wider variety of venues, and in settings where they may not have all, or even much, of the attention of their publics. Exhibits that are designed merely to be read closely are liable to be mostly ignored. When that happens, of course, it doesn’t matter how interesting your interpretation is.

Students in the course learn how to embed their interpretations in interactive, ambient and tangible forms that can be recreated in many different settings. To give some idea of the potential, consider the difference between writing with a word processor and stepping on the brake of a moving car. While using a word processor you are focused on the task and aware that you are interacting with a computer. The interface is intricate, sensorimotor involvement is mostly limited to looking and typing, and your surrounding environment recedes into the background of awareness. On the other hand, when braking you are focused on your involvement with the environment. Sensorimotor experiences are immersive, the interface to the car is as simple as possible, and you are not aware that you are interacting with computers (although recent-model cars in fact have dozens of continuously operating and networked microcontrollers).

Academic historians have tended to emphasize opportunities for knowledge dissemination that require our audience to be passive, focused and isolated from one another and from their surroundings. When we engage with a broader public, we need to supplement that model by building some of our research findings into communicative devices that are transparently easy to use, provide ambient feedback, and are closely coupled with the surrounding environment. The skills required to do this come from a number of research fields that ultimately depend on electronics and computers. Thanks to the efforts of community-minded makers, hackers, and researchers, these techniques are relatively easy to learn and apply.

Physical computing. In order to make objects or environments aware of people, to make them responsive and interactive, we need to give them a better sense of what human beings are like and what they’re capable of (Igoe & O’Sullivan 2004; Igoe 2011). Suppose your desktop computer had to guess what you look like based on your use of a word processor. It could assume that you have an eye and an ear, because you respond to things presented on the screen and to beeps, and it could assume you have a finger, because you push keys on the keyboard. To dramatize this, I usually use the image above, which is based on a drawing in Igoe and O’Sullivan (2004). It looks horrible: people are nothing like that. By giving our devices a better sense of what we’re actually like, we make it possible for them to better fit into our ongoing lifeworlds.

Pervasive computing. We are at the point where computational devices are becoming ubiquitous, invisible, part of the surroundings (McCullough 2004). The design theorist Adam Greenfield refers to this condition as “everyware” (2006). A number of technologies work together to make this possible. Embedded microprocessors put the power of full computers into tiny packages. Micro-electro-mechanical systems (MEMS) include sensors and actuators to sense and control the environment. Radio transceivers allow these miniature devices to communicate with one another and get online. Passive radio frequency ID circuits (RFIDs) are powered by radio waves to transmit identifying information. All of these systems are mass-produced so that unit costs are very low, and it becomes possible to imagine practically everything being manufactured with its own unique identifier and web address. This scenario is sometimes called the “internet of things.” Someday you may be able to Google for your keys instead of searching for them. As Bruce Sterling notes, practically everything in the world could become the “protagonist of a documented process” (2005). Provenance has typically had to be reconstructed painstakingly for a tiny handful of objects. Most historians are not ready to conduct research in a world where every object can tell us about its own history of manufacture, ownership, use, repair, and so on. Dealing with pervasive computation will require the ability to quickly focus on essential information, to relegate non-essential information to peripheral awareness, and to access information in the places and settings where it can make a difference.

Interaction Design. The insinuation of computation and interactivity into every conceivable setting has forced designers to abandon the traditional idea of “human-computer interaction,” and to take a much more expansive perspective instead (Moggridge 2006; Saffer 2006). Not only is everything becoming a potential interface, but many smart devices are better conceptualized as mediating between people, rather than between person and machine. Services like ordering a cup of coffee at Starbucks are now designed using the same techniques as those used to create interactive software (e.g., Google calendar) and hardware (e.g., the iPod). In order to benefit from the lessons of interaction design, historians will have to take into account the wide range of new settings where we can design experiences and shape historical consciousness. The technology of tangible computing provides a link between pervasive devices, social interaction, and the material environment (Dourish 2004).

Desktop Fabrication. Most radical of all, everything that is in digital form can be materialized, via machines that add or subtract matter. The former include a range of 3D printing technologies that deposit tiny amounts of glue, plastic or other materials, or that use lasers to selectively fuse small particles of metal, ceramic or plastic. The latter include computer-controlled milling machines, lathes, drills, grinders, laser cutters and other tools. The cost of these devices has been dropping rapidly, while their ease of use increases. The physicist Neil Gershenfeld has assembled a number of “fab labs” (universal fabrication laboratories) from collections of these devices. At present, a complete fab lab costs around $30,000 to $40,000, and a few key machines are considerably cheaper (Gershenfeld 2000, 2007). Enthusiasts talk about the possibility of downloading open source plans and “printing out” a bicycle, an electric guitar, anything really. An open source hardware community is blossoming, aided in part by O’Reilly Media’s popular MAKE magazine and by websites like Instructables and Thingiverse. Desktop fabrication makes it possible to build and share custom interactive devices that communicate our knowledge in novel, material forms.


  • Dourish, Paul. Where the Action Is: The Foundations of Embodied Interaction. Cambridge, MA: MIT, 2004.
  • Gershenfeld, Neil. When Things Start to Think. New York: Holt, 2000.
  • Gershenfeld, Neil. Fab: The Coming Revolution on Your Desktop—From Personal Computers to Personal Fabrication. New York: Basic, 2007.
  • Greenfield, Adam. Everyware: The Dawning Age of Ubiquitous Computing. Berkeley, CA: New Riders, 2006.
  • Igoe, Tom. Making Things Talk, 2nd ed. Sebastopol, CA: O’Reilly, 2011.
  • Igoe, Tom and Dan O’Sullivan. Physical Computing: Sensing and Controlling the Physical World with Computers. Thomson Course Technology, 2004.
  • McCullough, Malcolm. Digital Ground: Architecture, Pervasive Computing and Environmental Knowing. Cambridge, MA: MIT, 2004.
  • Moggridge, Bill. Designing Interactions. Cambridge, MA: MIT, 2006.
  • Norretranders, Tor. The User Illusion: Cutting Consciousness Down to Size. New York: Penguin, 1999.
  • Saffer, Dan. Designing for Interaction: Creating Smart Applications and Clever Devices. Berkeley, CA: New Riders, 2006.
  • Sterling, Bruce. Shaping Things. Cambridge, MA: MIT, 2005.
  • Torrone, Phillip. “Open Source Hardware, What Is It? Here’s a Start…” MAKE: Blog (23 Apr 2007).

When Alan MacEachern and I started working on the first edition of The Programming Historian in late 2007, our goal was to create an online resource that could be used by historians and other humanists to teach themselves a little bit of programming. Many introductory texts and websites approach programming languages in a systematic (if dull) way, starting with basics such as data types and gradually introducing various language constructs. This is fine if you already know how to program. Most beginners, however, are more concerned with addressing a practical need than they are with learning technical details that don’t seem to be immediately relevant. We wanted to approach programming as a means of expression. Plenty of time to begin learning grammar after you’ve had a few conversations, as it were.

Neither of us expected PH to become nearly as popular as it did. We’re still young (OK, technically we’re middle aged) but we’ll have to work pretty hard to ever gain as many readers for anything else we write. While gratifying, the success of the first edition raised new problems. Some people wanted to pitch in. Some wanted help with particular problems. Some wanted to translate the material into other natural languages or other programming languages. In the meantime, websites changed, operating systems changed, software libraries changed, programming languages changed. Change is good! But dealing with change is difficult if one or two people try to do everything by themselves. Fortunately, there is a better way.

For a couple of years now, Adam Crymble has been working with us on creating a new edition of The Programming Historian that will be open to user contributions. There will be a number of ways for people to get involved: as writers, programmers, editors, technical reviewers, testers, website hackers, graphic designers, discussants, translators, and so on.  All contributions will be peer-reviewed, and everyone who participates will get credit for his or her work.  One of our aims with the first edition was to maintain a narrative thread that led the user through a series of useful projects. The new edition is organized in terms of short lessons that build on knowledge acquired in previous ones. Informally, you might think of this as “choose your own adventure.” Technically the new site will be structured like a directed acyclic graph, with tools that make it easy to keep track of what you’ve learned so far and provide you with a number of choices going forward. All source code will be under version control, making it easy to maintain and fork.

Over the next few months we will be inviting beta contributors to help us design and develop the website, write and program new lessons, do editing and peer reviewing, and generally turn the goodness up to 11.  There will be new lessons that lead into subjects like visualization, geospatial data, image search, integration with external tools, and the use of APIs. We will also be working with new institutional partners and exploring the connections to be made with other, similar projects. If you would like to get involved, please don’t hesitate to e-mail me and let me know. We will do a public launch when everything is working smoothly and we are ready to accept general contributions, hopefully sometime in 2012.

In April 2008, I posted an article in my blog Digital History Hacks about visualizing the social network of NiCHE: Network in Canadian History & Environment as it was forming. We now use custom programs written in Mathematica to explore and visualize the activities of NiCHE members, and to assess our online communication strategies. Some of the data comes from our online directory, where members can contribute information about their research interests and activities. Some of it comes from our website server logs, and some of it is scraped from social networking sites like Twitter. A handful of examples are presented here, but the possibilities for this kind of analysis are nearly unbounded.

Clusters of Research Interest

When NiCHE members add their information to our online directory, they are encouraged to select one or more of a set of research interests. This heat map shows the degree of overlap for each pair of topics, with brighter colours indicating a greater number of people who indicated an interest in both areas. Some of the pairings are not surprising: people who are interested in landscape are often interested in conservation, environmentalism and parks, and vice versa. The absence of overlap is also meaningful. People who are interested in fisheries seem not to be interested in landscape, and vice versa. Why not? A workshop that tried to bring both groups together to search for common ground might lead to new insights. Studying visualizations like this one also allows us to assess the extent to which our original thematic projects (focusing on Landscapes, Forest History, Water, etc.) actually cover the interests of members. Some of the members of the NiCHE executive do research in the history and philosophy of science, and this is apparently something that many NiCHE members are also interested in. A future workshop to address this interest might be co-hosted by NiCHE and the Situating Science knowledge cluster.
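The counts behind a heat map of this kind can be computed from nothing more than each member's list of interests. The following is only a sketch, not the code used for the NiCHE figure: the member lists are made up, and `memberInterests`, `overlapCount` and `overlapMatrix` are names invented for the illustration.

```mathematica
(* Hypothetical data: each inner list holds one member's chosen interests *)
memberInterests = {
   {"Landscape", "Conservation", "Parks"},
   {"Landscape", "Parks"},
   {"Fisheries", "Water"},
   {"Conservation", "Water"}};

(* Every topic that appears at least once, in sorted order *)
topics = Union @@ memberInterests;

(* Number of members whose list contains both topic a and topic b *)
overlapCount[a_, b_] :=
   Count[memberInterests, interests_ /; SubsetQ[interests, {a, b}]]

(* One row and one column per topic; the diagonal counts members per topic *)
overlapMatrix = Outer[overlapCount, topics, topics];

(* A labelled table of the counts, then the same counts as a heat map *)
TableForm[overlapMatrix, TableHeadings -> {topics, topics}]
ArrayPlot[overlapMatrix, ColorFunction -> "SunsetColors"]
```

In this toy data, Landscape and Parks share two members while Landscape and Fisheries share none, which is the kind of gap the heat map makes visible.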

Research Interest Overlaps

Looking at the same information in a different way brings new things to light. This figure shows the degree of overlap between pairs of research interests as a graph rather than a heat map. Research topics are represented as vertices, and the size of the edge connecting each pair indicates the degree of overlap. This graph suggests that NiCHE members who are interested in subjects that focus on material evidence over very long temporal durations are relatively marginal in the knowledge cluster, and may not be well connected even with one another. Again, being able to visualize the data gives us the possibility of addressing the situation. Perhaps we should do more outreach to geologists and archaeologists?

Bridging Capital

Studies of social networks suggest that their “small world” properties are typically due to people who provide bridges between interest groups or make other kinds of long-distance connections. Here we use a graph to visualize every pair of research topics that are of interest to a single NiCHE member. From this figure it is easy to see that Darin Kinsey is the only person who has claimed to be interested in both landscapes and fisheries. If we did decide to hold a workshop on the intersection of those two topics, he might be the ideal person to help organize it. If we want to try to get scholars talking to one another across methodological or thematic boundaries, then we should enlist the help of people like Ravi Ranganathan, Norm Catto and Liza Piper to get the conversation started.

Centrality of NiCHE Activities

Our online directory also allows NiCHE members to indicate which activities they have participated in. The vertices in this graph represent NiCHE members and activities that we have sponsored. If a member participated in a particular activity, there is an edge connecting the two vertices. The color of each vertex represents a measure of network centrality. Here we have labeled the most central of our activities. Note that the conferences (especially Confluences 2007 and Climate History 2008) drew relatively large numbers of participants who did not attend other NiCHE activities. Participants in the summer field schools (CHESS), on the other hand, were much more likely to attend more than one of our activities. This suggests that our field schools do a better job of helping to constitute NiCHE as an ongoing entity than regular conferences would. This is consistent with reports that we have received from regular CHESS participants, especially new scholars.
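A graph like this one can be sketched in a few lines. The example below uses placeholder names rather than NiCHE data, and it assumes degree centrality (the number of edges at each vertex) as the measure, since the text does not say which one the figure used.

```mathematica
(* Hypothetical participation records; all names are placeholders *)
edges = {
   UndirectedEdge["Member A", "Conference"],
   UndirectedEdge["Member B", "Conference"],
   UndirectedEdge["Member C", "Field School 1"],
   UndirectedEdge["Member C", "Field School 2"],
   UndirectedEdge["Member D", "Field School 1"],
   UndirectedEdge["Member D", "Field School 2"],
   UndirectedEdge["Member D", "Conference"]};

g = Graph[edges];

(* Assumption: degree centrality stands in for the measure in the figure *)
scores = DegreeCentrality[g];

(* Rescale the scores to the range 0..1 and turn each into a colour *)
colours = Map[ColorData["TemperatureMap"], N[Rescale[scores]]];

(* Redraw the graph with each vertex coloured by its score *)
Graph[g,
   VertexStyle -> Thread[VertexList[g] -> colours],
   VertexLabels -> "Name"]
```

Other built-in measures, such as BetweennessCentrality or ClosenessCentrality, can be swapped in for DegreeCentrality without changing the rest of the sketch.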

NiCHE Twitter Followers

In April 2011, NiCHE had about 340 followers on the social networking site Twitter. Each vertex in this graph represents one Twitter user. The size of the icon is scaled according to the log of the number of followers that each user has. (In this case, the number of followers ranges from a handful to tens of thousands, depending on the user.) The edges of the graph represent some of the connections between Twitter users who follow one another. This figure shows that the NiCHE Twitter audience includes a relatively dense network of scholars who identify themselves either as digital humanists or as Canadian / environmental historians or geographers. There is also a relatively large collection of followers who do not appear to have many connections with one another. Knowing something about who is reading our tweets enables us to gauge the degree to which our online knowledge mobilization activities are effective, and helps us think about targeting our messages to particular communities.

For physical computing or amateur science projects, you often need to be able to get the output of a sensor or transducer into your computer.  There are a lot of ways to do this with specialized data acquisition hardware and software.  This method is pretty old school, and requires only a standard laptop and a handful of inexpensive electronic components.  It may be appropriate if you are working in a relatively quiet environment and you don’t need to take a lot of samples per second.

Just about any physical signal or measurement in the world can be converted into a fluctuating voltage, an analog signal.  Most laptops have a built-in microphone, so if you can convert your voltage into an audible signal, you can use the microphone to digitize it.  Once it is in digital form, you can then process the signal with any programming language.  Here we’ll use Mathematica.

The electronic side of the project is a very simple metronome (or tone generator) based on the 555 timer.  It is a mashup of two circuits from Forrest Mims’ work, the V/F Converter on page 51 of his Science and Communication Circuits and Projects and the Audio Oscillator / Metronome on p. 22 of his Timer, Op Amp and Optoelectronic Circuits and Projects.

The changing voltage from a sensor or transducer is fed into pin 5 on the timer.  For voltages between about 1.25v and 4.70v, the timer will respond with chirps or clicks that are more-or-less frequent: the frequency depends on the voltage.  With lower voltages, the frequency is higher, and with higher voltages it is lower.  The variable resistor can be used to tune the center of the frequency range.  Increasing the value of the capacitor decreases the frequency and makes each individual chirp longer.  Decreasing the value of the capacitor increases the frequency and makes each chirp shorter.  For the test here, the variable resistor was set to 100K measured with a digital multimeter.  The piezo element was rated for 3-20v, 10mA.  You can make the circuit portable by powering it with a 9v battery, but you might have to adjust the values of the resistors and capacitor for best results.  I used a bench power supply and proto-board for my experiments.

To make a sample signal, I connected the signal line to a power supply set at 4.7v, decreased it by hand to 1.25v, then increased it back to 4.7v.  The circuit responded by producing slower, faster and slower clicks.  If you would like to listen to the recording, there are MP3 and WAV files in the GitHub repository.

My original plan was to use the SystemDialogInput[ "RecordSound" ] command to record the audio directly into Mathematica, but that feature is unfortunately not supported on Macs yet.  So I recorded the sample using Audacity instead, and then used the Import command to load the sound file into Mathematica.

My complete Mathematica notebook is available for download on GitHub.  Here are the highlights.

We load the signal in as data, and plot every thousandth point to get an idea of what it looks like.  Since we sampled the sound at 44100 Hz, there will be 44100 data points for each second of audio, or 44.1 data points per millisecond.  The duration of each click is on the order of 10 milliseconds or less, so we won’t be able to make out individual clicks this way.

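(* path is assumed to hold the folder that contains the recording *)
signalData = Flatten@Import[path <> "sample-data.wav", "Data"];
(* plot every thousandth sample to get an overview of the whole signal *)
ListLinePlot[
   signalData[[1 ;; Length[signalData] ;; 1000]],
   ImageSize -> Full, AspectRatio -> 0.25, PlotRange -> All,
   Joined -> False, Filling -> Axis]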

Near the beginning of the sample, the clicks are relatively sparse.  If we start at the 3 second mark and plot every point for 50 milliseconds of our data, we can see one click.

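(* At 44100 Hz there are 44100 samples per second and 44.1 per millisecond *)
second = 44100; msec = 44.1;
sparseRegion=signalData[[(3*second) ;; (3*second) + Round[50*msec]]];
ListLinePlot[sparseRegion,
    ImageSize -> Full, AspectRatio -> 0.25, PlotRange -> All,
    Joined -> False, Filling -> Axis]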

We are interested in pulling out the clicks, so we take the absolute value of our data, then use the MovingAverage command to smooth it a bit.  Averaging over ten data points seems to give us a sharp peak.

ListLinePlot[MovingAverage[Abs[sparseRegion], 10],
    ImageSize -> Full, AspectRatio -> 0.25, PlotRange -> All,
    Joined -> False, Filling -> Axis]
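To see what the rectify-and-smooth step does on a small scale, here is a toy example. The eight numbers are invented to stand in for one click surrounded by silence, and the window of 3 is chosen only to keep the arithmetic easy to follow.

```mathematica
(* A made-up burst standing in for one click, with silence on both sides *)
toySignal = {0., 0., 0.8, -0.9, 0.7, -0.6, 0., 0.};

(* Abs rectifies the signal so positive and negative swings both count *)
rectified = Abs[toySignal];

(* Each output value is the mean of 3 neighbouring input values *)
smoothed = MovingAverage[rectified, 3]
```

Note that MovingAverage over r points returns r - 1 fewer values than it was given, so the smoothed signal above has six values rather than eight. With a window of 10 the real signal loses nine samples, which is negligible at 44100 samples per second.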

Now we can use a thresholding function that returns 1 when the signal is above a given level and 0 otherwise. We use Map to apply the function to each data point.

(* returns 1 when value exceeds the threshold, 0 otherwise *)
thresholdFunction[value_, threshold_] :=
    If[value > threshold, 1, 0]
threshold = 0.25;
ListLinePlot[
    Map[
        thresholdFunction[#, threshold] &, 
        MovingAverage[Abs[sparseRegion], 10]], 
    ImageSize -> Full, AspectRatio -> 0.25, PlotRange -> All, 
    Joined -> False, Filling -> Axis]
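The effect of thresholding is easiest to see on a handful of numbers. This sketch uses invented values and its own variable names (`toySmoothed`, `toyLevel`, `toyBinary`) so that it does not disturb anything defined above.

```mathematica
(* Invented smoothed values: quiet, then a click, then quiet again *)
toySmoothed = {0.05, 0.10, 0.40, 0.60, 0.30, 0.20, 0.05};
toyLevel = 0.25;

(* Values above the level become 1 and everything else becomes 0 *)
toyBinary = Map[If[# > toyLevel, 1, 0] &, toySmoothed]

(* The same result without Map: Boole turns True/False into 1/0 *)
toyBinaryAlt = Boole[Thread[toySmoothed > toyLevel]]
```

Both expressions give {0, 0, 1, 1, 1, 0, 0}: the click survives as a run of 1s and the background disappears.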

This looks pretty good when the clicks are infrequent. We need to make sure that it will work in a region of our signal where clicks are more frequent, too. We look at a 50-millisecond-long region 10 seconds into our signal, doing the same processing and plotting the results.

denseRegion=signalData[[(10*second) ;; (10*second) + Round[50*msec]]];
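The processing and plotting for the dense region follow the same steps as for the sparse one. One way to avoid retyping them is a small helper; `plotThresholded` is a hypothetical name, and the synthetic `fakeRegion` below only stands in for real data so that the sketch runs on its own.

```mathematica
(* Hypothetical helper: rectify, smooth, threshold and plot one region *)
plotThresholded[region_, level_: 0.25, width_: 10] :=
    ListLinePlot[
        Map[If[# > level, 1, 0] &, MovingAverage[Abs[region], width]],
        ImageSize -> Full, AspectRatio -> 0.25, PlotRange -> All,
        Joined -> False, Filling -> Axis]

(* Synthetic stand-in for 50 msec of audio: a few short bursts in silence *)
fakeRegion = Table[If[Mod[n, 700] < 100, 0.8 Sin[n], 0.], {n, 1, 2205}];

plotThresholded[fakeRegion]
```

With the recording loaded, the equivalent call would be plotThresholded[denseRegion].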

Since our thresholding looks like it will work nicely for our data, we can apply it to the whole dataset.

signalThresholded = 
    Map[thresholdFunction[#, threshold] &, 
        MovingAverage[Abs[signalData], 10]];

Now we need to count the number of clicks that occur in a given interval of time. We will do this by using the Partition command to partition our data into short, overlapping windows. Each window will be 200 milliseconds long, and will overlap those on its left and right by 100 milliseconds each. To count clicks, we will use the Split command to split each window into runs of identical values (i.e., all 1s or all 0s), then Map the Total function over each of the resulting lists. A run of 0s totals 0 and a run of 1s totals its own length, so if there are more than, say, five 1s in a row, surrounded on either side by a run of 0s, we can assume that a click has occurred.

signalThresholdedPartition = 
    Partition[signalThresholded, Round[200*msec], Round[100*msec]];
(* count the runs of more than five 1s in a window *)
countClicks[window_] :=
    Total[
        Map[If[# > 5, 1, 0] &, 
            Map[Total, Split[window]]]]
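The windowing and run-counting steps can be checked by hand on tiny inputs. The lists below are invented for the purpose, and the variable names are only for this illustration.

```mathematica
(* Partition with a window of 4 and an offset of 2: neighbours share 2 items *)
toyWindows = Partition[Range[10], 4, 2]

(* An invented window: a run of six 1s, a run of two 1s, a run of seven 1s *)
toyWindow = {0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0};

(* Split groups consecutive identical values into sublists *)
runs = Split[toyWindow];

(* A run of 0s totals 0; a run of 1s totals its own length *)
runTotals = Map[Total, runs]

(* Only runs of more than five 1s are counted as clicks *)
toyClicks = Total[Map[If[# > 5, 1, 0] &, runTotals]]
```

Here runTotals is {0, 6, 0, 2, 0, 7, 0}, so the short run of two 1s is ignored and toyClicks is 2.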

We can now plot the number of clicks per window. This shows us that the voltage starts high, drops, then increases again. Note the spike in the middle, which represents a dense cluster of clicks. Looking at the raw signal, this appears to have been a burst of noise. At this point, we have a workable way to convert a varying voltage into a digital signal that we can analyze with Mathematica. For most applications, the next step would be to calibrate the system by recording audio samples for specific voltages and determining how many clicks per window were associated with each.

ListLinePlot[Map[countClicks, signalThresholdedPartition]]
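The calibration step mentioned above might look something like the following. This is only a sketch under stated assumptions: the calibration numbers are invented rather than measured, `estimateVoltage` is a hypothetical helper, and linear interpolation between calibration points is a guess about what would be adequate for this circuit.

```mathematica
(* Hypothetical calibration table: {clicks per window, voltage in volts}.
   These numbers are invented; real ones would come from recordings
   made at known voltages. *)
calibration = {{4, 4.70}, {8, 3.50}, {14, 2.50}, {22, 1.25}};

(* Piecewise-linear lookup between the calibration points *)
voltageFromClicks = Interpolation[calibration, InterpolationOrder -> 1];

(* Clip keeps counts inside the calibrated range to avoid extrapolation *)
estimateVoltage[clicks_] := voltageFromClicks[Clip[clicks, {4, 22}]]

(* Hypothetical click counts for five consecutive windows *)
Map[estimateVoltage, {5, 9, 20, 9, 5}]
```

With real data, the list of counts would be Map[countClicks, signalThresholdedPartition]. A burst of noise like the one noted above would simply be clipped to the lowest calibrated voltage, so it would still need to be filtered out separately.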

