In an earlier hack we discovered that the majority of people in the Dictionary of Canadian Biography were categorized as businessmen, office holders, politicians, lawyers and soldiers. The biographies cover a 930-year span, however, which raises the question of which occupations were prevalent at any particular time. One might suspect, for example, that there were relatively fewer explorers and fur traders in more recent years. We can call this the “Adams Effect,” after a passage from a letter that John Adams wrote to Abigail Adams in 1780. (Not to be confused with Pepper Adams’s final album of the same name.) Anyway, John Adams said:
I must study politics and war that my sons may have liberty to study mathematics and philosophy, geography, natural history, naval architecture, navigation, commerce, and agriculture, in order to give their children a right to study painting, poetry, music, architecture, statuary, tapestry, and porcelain.
Does this capture the Canadian experience? Time for another hack, available on GitHub.
```perl
# dcbo-ids-chg.pl
# 21 jan 2006
#
# wj turkel
# http://digitalhistoryhacks.blogspot.com
#
# Uses entries in the online Dictionary of Canadian Biography
# to determine how biography categories ('Aboriginal', 'Accountant', etc.)
# have changed over time. Output into CSV file for analysis with
# spreadsheet: volume numbers are columns, category ids are rows.
#
# LWP code adapted from
# Burke, Perl & LWP (O'Reilly 2002), pp. 27-28, 96-97.
# Max subroutine from
# Schwartz & Phoenix, Learning Perl, 3rd ed (O'Reilly 2001), p. 65.

use strict;
use warnings;
use LWP;

my $browser;
sub do_POST {
    $browser = LWP::UserAgent->new() unless $browser;
    my $resp = $browser->post(@_);
    return ($resp->content, $resp->status_line, $resp->is_success, $resp)
        if wantarray;
    return unless $resp->is_success;
    return $resp->content;
}

my $doc_url = 'http://www.biographi.ca/EN/Search.asp';

# To speed up processing, the codes expected by the DCB search FORM
# have already been scraped.

# This goes into Data6 of the search FORM
my %volumes = (
    '01' => '1000-1700',
    '02' => '1701-1740',
    '03' => '1741-1770',
    '04' => '1771-1800',
    '05' => '1801-1820',
    '06' => '1821-1835',
    '07' => '1836-1850',
    '08' => '1851-1860',
    '09' => '1861-1870',
    '10' => '1871-1880',
    '11' => '1881-1890',
    '12' => '1891-1900',
);

# This goes into Data3 of the search FORM
my %categories = (
    '35716' => 'Philanthropists and Soc...',
    '35681' => 'Business',
    '35712' => 'Legal Professions',
    '35719' => 'Accountants',
    '35720' => 'Archivists',
    '35723' => 'Curators',
    '35713' => 'Fur Trade',
    '35704' => 'Slaves',
    '35726' => 'Librarians',
    '35698' => 'Surveyors',
    '35718' => 'Sports and recreation',
    '35674' => 'Agriculture',
    '35697' => 'Scientists',
    '35696' => 'Religion',
    '35691' => 'Miscellaneous',
    '35676' => 'Armed Forces',
    '35683' => 'Engineers',
    '35709' => 'Interpreters and Transl...',
    '35728' => 'Police',
    '35689' => 'Mariners',
    '35694' => 'Office Holders',
    '35711' => 'Labourers and Labour Or...',
    '35680' => 'Blacks',
    '35695' => 'Politicians',
    '35679' => 'Authors',
    '35677' => 'Artisans',
    '35721' => 'Arts and Entertainment',
    '35722' => 'Communications',
    '35710' => 'Inventors',
    '35730' => 'Social Scientists',
    '35707' => 'Criminals',
    '35708' => 'Aboriginal people',
    '35682' => 'Education',
    '35684' => 'Explorers',
    '35690' => 'Medicine',
    '35675' => 'Architects',
);

# We need to keep the category keys in the right order
# (sorted alphabetically by category name).
my @catarray = sort { $categories{$a} cmp $categories{$b} } (keys %categories);

# This will be a hash of arrays: category id => one count per volume.
my %categorycount;

# For each volume (Volumes 1, 3, 9, 10 aren't coded for categories)
foreach my $vol ('02', '04', '05', '06', '07', '08', '11', '12') {

    # For each different category, return count of matching biographies.
    foreach my $id (@catarray) {

        # Give the user some feedback about progress.
        print "Vol: " . $vol . "\tCat: " . $id . "\n";

        # Get the page.
        my ($content, $message, $is_success) = do_POST(
            $doc_url,
            [ 'Data3' => $id, 'Data4' => '1', 'Data6' => $vol ],
        );
        die "Error $message\n" unless $is_success;

        # Sometimes there are no matching biographies.
        if ($content =~ m{Your search criteria has no matching biographies}) {
            push @{ $categorycount{$id} }, "0";
        }
        elsif ($content =~ m{<strong>([0-9,]+)</strong> biography\(ies\) are available using your current search criteria}is) {
            my $tmp = $1;
            # Remove commas from numbers in the thousands
            # (/g so that every comma goes, not just the first).
            $tmp =~ s/,//g;
            push @{ $categorycount{$id} }, $tmp;
        }
        else {
            # Unexpected page: keep the columns aligned with an empty cell.
            warn "No count found for volume $vol, category $id\n";
            push @{ $categorycount{$id} }, "";
        }

        # Be considerate to their server.
        sleep 2;
    }
}

# Now output the .CSV file
open(my $output, '>', 'ids-chg.csv')
    or die "Cannot write ids-chg.csv: $!\n";

# Output count: category id, category name, then one count per volume.
foreach my $key (@catarray) {
    print $output $key . "," . $categories{$key} . ","
        . join(",", @{ $categorycount{$key} }) . "\n";
}

# Close the file.
close($output);
```
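The script hard-codes the volume and category codes because they were scraped from the search form beforehand. A minimal sketch of that scraping step might look like the following. The HTML fragment and the `extract_options` helper are hypothetical, invented here for illustration; the real search page's markup may differ, and a proper HTML parser would be more robust than regular expressions.

```perl
use strict;
use warnings;

# Hypothetical fragment of the search form; the real markup may differ.
my $html = <<'HTML';
<select name="Data6">
  <option value="">All volumes</option>
  <option value="01">1000-1700</option>
  <option value="02">1701-1740</option>
</select>
HTML

# Return a hash of option value => label for the named <select> element.
# (Hypothetical helper, not part of the original script.)
sub extract_options {
    my ($page, $field) = @_;
    my %options;

    # Isolate the body of the named <select> first.
    if ($page =~ m{<select[^>]*\bname="\Q$field\E"[^>]*>(.*?)</select>}is) {
        my $body = $1;

        # Collect every option that has a non-empty value attribute.
        while ($body =~ m{<option[^>]*\bvalue="([^"]+)"[^>]*>\s*([^<]*?)\s*</option>}gis) {
            $options{$1} = $2;
        }
    }
    return %options;
}

my %volumes = extract_options($html, 'Data6');
foreach my $code (sort keys %volumes) {
    print "'$code' => '$volumes{$code}',\n";
}
```

Printing the pairs in this form makes it easy to paste them into a hash like `%volumes` or `%categories`, so the main script never has to fetch the form itself.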
To save time, the codes that the search form expects for the different volumes and categories were scraped from the search page ahead of time and hard-coded into the script. Biographies in four of the twelve volumes (1, 3, 9 and 10) haven’t been categorized, so there will be some gaps in our data, but we should have enough to see trends. The script builds a table of counts that we can analyze in a spreadsheet.
```
Category,Name,Vol2,Vol4,Vol5,Vol6,Vol7,Vol8,Vol11,Vol12
35708,Aboriginal people,34,32,17,15,22,18,22,22
35719,Accountants,0,0,0,0,1,1,0,1
35674,Agriculture,84,45,78,76,73,82,55,69
35675,Architects,0,0,7,4,3,13,10,11
35720,Archivists,0,0,0,0,0,0,0,0
35676,Armed Forces,195,189,195,205,196,155,55,81
35677,Artisans,21,22,26,24,41,40,31,34
35721,Arts and Entertainment,16,15,31,19,32,22,27,40
35679,Authors,12,29,66,44,79,85,85,146
35680,Blacks,1,3,3,1,2,3,3,3
35681,Business,82,120,152,183,214,211,235,290
35722,Communications,0,6,8,20,33,41,64,75
35707,Criminals,0,0,0,12,1,0,0,3
35723,Curators,0,0,0,0,0,0,0,0
35682,Education,9,5,13,37,58,79,108,116
35683,Engineers,10,0,6,7,13,6,15,22
35684,Explorers,15,13,9,6,7,10,10,10
35713,Fur Trade,58,44,44,38,33,46,21,17
35709,Interpreters and Transl...,0,0,0,9,5,0,0,7
35710,Inventors,0,0,0,1,3,6,0,9
35711,Labourers and Labour Or...,0,0,0,6,1,3,6,5
35712,Legal Professions,79,74,156,187,181,160,90,126
35726,Librarians,0,0,0,0,1,0,0,0
35689,Mariners,28,20,19,17,7,12,6,9
35690,Medicine,16,13,19,23,26,29,37,37
35691,Miscellaneous,3,0,2,3,6,0,0,16
35694,Office Holders,123,154,190,195,193,178,162,170
35716,Philanthropists and Soc...,0,0,0,0,6,12,0,30
35728,Police,0,0,0,0,0,0,0,0
35695,Politicians,0,29,135,154,160,171,193,210
35696,Religion,129,82,76,68,87,102,100,90
35697,Scientists,2,0,7,3,6,7,20,21
35704,Slaves,2,1,0,0,0,0,0,0
35730,Social Scientists,0,0,0,0,0,0,0,0
35718,Sports and recreation,0,0,0,0,0,3,0,11
35698,Surveyors,10,9,20,21,20,15,9,19
```
We can then plot the relative proportion of each occupation over time. (Relative because there are more biographies overall in some volumes.) Also, since some occupations are very common (e.g., members of the armed forces) and others are not (e.g., architects), we will want to plot the numbers on a log scale. The following three graphs show which occupations lose, maintain or gain ground over time.
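The proportion calculation can also be sketched in Perl rather than a spreadsheet. The version below assumes that each volume's denominator is the sum of all its category counts; that is an assumption, not something the post states. The counts are copied from the table above, and the `share` helper is a name introduced here for illustration.

```perl
use strict;
use warnings;

# Counts copied from the table above: category id, name, then one column
# per categorized volume.
my $csv = <<'CSV';
Category,Name,Vol2,Vol4,Vol5,Vol6,Vol7,Vol8,Vol11,Vol12
35708,Aboriginal people,34,32,17,15,22,18,22,22
35719,Accountants,0,0,0,0,1,1,0,1
35674,Agriculture,84,45,78,76,73,82,55,69
35675,Architects,0,0,7,4,3,13,10,11
35720,Archivists,0,0,0,0,0,0,0,0
35676,Armed Forces,195,189,195,205,196,155,55,81
35677,Artisans,21,22,26,24,41,40,31,34
35721,Arts and Entertainment,16,15,31,19,32,22,27,40
35679,Authors,12,29,66,44,79,85,85,146
35680,Blacks,1,3,3,1,2,3,3,3
35681,Business,82,120,152,183,214,211,235,290
35722,Communications,0,6,8,20,33,41,64,75
35707,Criminals,0,0,0,12,1,0,0,3
35723,Curators,0,0,0,0,0,0,0,0
35682,Education,9,5,13,37,58,79,108,116
35683,Engineers,10,0,6,7,13,6,15,22
35684,Explorers,15,13,9,6,7,10,10,10
35713,Fur Trade,58,44,44,38,33,46,21,17
35709,Interpreters and Transl...,0,0,0,9,5,0,0,7
35710,Inventors,0,0,0,1,3,6,0,9
35711,Labourers and Labour Or...,0,0,0,6,1,3,6,5
35712,Legal Professions,79,74,156,187,181,160,90,126
35726,Librarians,0,0,0,0,1,0,0,0
35689,Mariners,28,20,19,17,7,12,6,9
35690,Medicine,16,13,19,23,26,29,37,37
35691,Miscellaneous,3,0,2,3,6,0,0,16
35694,Office Holders,123,154,190,195,193,178,162,170
35716,Philanthropists and Soc...,0,0,0,0,6,12,0,30
35728,Police,0,0,0,0,0,0,0,0
35695,Politicians,0,29,135,154,160,171,193,210
35696,Religion,129,82,76,68,87,102,100,90
35697,Scientists,2,0,7,3,6,7,20,21
35704,Slaves,2,1,0,0,0,0,0,0
35730,Social Scientists,0,0,0,0,0,0,0,0
35718,Sports and recreation,0,0,0,0,0,3,0,11
35698,Surveyors,10,9,20,21,20,15,9,19
CSV

my ($header, @rows) = grep { /\S/ } split /\n/, $csv;
my (undef, undef, @vols) = split /,/, $header;   # keep only the volume labels

# %count maps a category name to its per-volume counts;
# @total holds the sum of all category counts in each volume.
my (%count, @total);
foreach my $row (@rows) {
    my (undef, $name, @n) = split /,/, $row;
    $count{$name} = \@n;
    $total[$_] += $n[$_] for 0 .. $#n;
}

# Percentage of a volume's category assignments that fall in one category.
# (Assumed denominator: the column sum, not the number of biographies.)
sub share {
    my ($name, $col) = @_;
    return 100 * $count{$name}[$col] / $total[$col];
}

foreach my $name ('Armed Forces', 'Politicians') {
    print "$name\n";
    foreach my $col (0 .. $#vols) {
        my $pct = share($name, $col);
        # log10(0) is undefined, so zero counts get no log value.
        my $log10 = $pct > 0 ? sprintf('%.2f', log($pct) / log(10)) : 'n/a';
        printf "  %-5s %5.1f%%  log10 = %s\n", $vols[$col], $pct, $log10;
    }
}
```

Under this assumption the armed forces come out at roughly 21% of Volume 2. Categories with a count of zero have no logarithm, so they would have to be dropped or handled separately on a log-scale plot.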
So, is there an “Adams Effect” in Canadian biography? His “war” prediction is spot on: almost 21% of the biographies in Volumes 2 (1701-40) and 4 (1771-1800) are for members of the armed forces. This drops steadily to about 4 or 5% by the late nineteenth century. “Politics,” alas, becomes more, rather than less, popular, accounting for about 10% of the biographies in the first half of the nineteenth century, and peaking at about 14% in Volume 11 (1881-90). It is worth emphasizing that the volumes of the DCB are organized by death dates, and that politicians who died in the 1880s may well have been active in the 1860s and 70s (i.e., during the Canadian Confederation and its aftermath).
How about the next generation? We see a decline in explorers and mariners, in the fur trade and agriculture. Surveying and engineering hold relatively steady. In the final generation we do see an increase in architects, authors and educators. Some predictions (like “statuary,” “tapestry” and “porcelain”) are harder to check. Overall, not too bad for a statement which was never intended as a prediction.
Tags: dictionary of canadian biography | digital history | hacking | perl
Revision History
- 2006-01-22. Published at http://digitalhistoryhacks.blogspot.ca/2006/01/adams-effect.html
- 2008-09-26. Links to code and CSV file updated to point to wiki archive
- 2012-04-17. Revised edition published on http://williamjturkel.net. Internal links changed to point to this site. Graph images uploaded to this site and links changed. Code and CSV links changed to point to GitHub repository. Code and CSV file also displayed inline.