In Data Mining, Witten and Frank define the subject as “the extraction of implicit, previously unknown, and potentially useful information from data,” and as the process of “finding and describing patterns in data.” Machine learning, a sub-discipline of computer science, goes one step further by using these patterns to classify previously unseen data. Historians are now beginning to use both kinds of techniques in their research.
Readings
- Bobley, “Is Computational Linguistics the New Computer Science for the Humanities?” NEH Office of Digital Humanities (14 April 2011)
- Cohen, “Do APIs Have a Place in the Digital Humanities?” dancohen.org (21 November 2005)
- Cohen, “When Machines Are the Audience,” dancohen.org (2 March 2006)
- Cohen, “Mapping What Americans Did on September 11,” dancohen.org (8 August 2006)
- Cohen, “History and the Second Decade of the Web,” Rethinking History (June 2004)
- Cohen, “Initial Thoughts on the Google Books Ngram Viewer and Datasets,” dancohen.org (19 December 2010)
- Enriquez, “The Glory of Big Data,” Popsci (31 October 2011)
- Gibbs and Owens, “The Hermeneutics of Data and Historical Writing,” in Dougherty and Nawrotzki, eds., Writing History in the Digital Age (2011)
- Grafton, “Loneliness and Freedom,” Perspectives on History (March 2011)
- Hand, “Culturomics: Word Play,” Nature 474 (2011)
- Harris, “Word Clouds Considered Harmful,” Nieman Journalism Lab (13 October 2011)
- Hitchcock, “Culturomics, Big Data, Code Breakers and the Casaubon Delusion,” Historyonics (19 June 2011)
- Johnston, “Data is the New Black,” The Signal (14 October 2011)
- Kelly, “Analyzing Traffic,” edwired (29 September 2006)
- Manovich, “Style Space,” Software Studies Initiative (4-6 August 2011)
- Rogers, “Data Journalism Broken Down,” Guardian DataBlog (7 April 2011)
- Schmidt, “Theory First,” Sapping Attention (3 November 2011)
- Sherratt, “Every Story Has a Beginning,” Discontents (14 September 2011)
- Stray, “How The Guardian is Pioneering Data Journalism with Free Tools,” Nieman Journalism Lab (5 August 2010)
- Turkel, “Searching for History,” Digital History Hacks (12 October 2006)
- Udell, “Seven Ways to Think Like the Web” (24 January 2011)
- Whitelaw, “The Visible Archive” (2008-10)
- Wilson, “Principles of Computational Thinking,” Software Carpentry (14 February 2011)
- Zax, “Visualizing Historical Data,” Fast Company (9 June 2011)
Further Reading
- Machlis, “22 Free Tools for Data Visualization and Analysis,” Computerworld (20 April 2011)
- Sadun, “Yahoo Pipes: Getting Started with Custom RSS Feeds,” Ars Technica (March 2009)
- Stray, “A Computational Journalism Reading List” (18 April 2011)
See Also