This is an open access, open content and open source textbook in the form of a Mathematica notebook. If you do not have Mathematica, you can open the notebook with Wolfram’s free CDF Player software or view it in the Wolfram Cloud. You will not be able to do everything that you can with the notebook version, but it should be good enough for you to get an idea of what is included.
Second Revised Edition (Summer 2020)
GitHub Repository
Download Mathematica notebook
The second revised edition (v2.01, September 2020) contains 23 complete chapters which cover the following topics. There are also a number of screencasts for each lesson.
- Lesson 00. Introduction to Mathematica. Interacting with notebooks.
  - Screencast 00 1 Working with Mathematica Notebooks
  - Screencast 00 2 Evaluating Expressions
- Lesson 01. Reading Code. Word frequency, word clouds and stopwords.
  - Screencast 01 0 Introduction
  - Screencast 01 1 Word Frequency
  - Screencast 01 2 Sorted Word List
  - Screencast 01 3 Word Clouds
  - Screencast 01 4 Removing Stopwords
- Lesson 02. Computable Knowledge. Entities, tables, timelines and maps.
  - Screencast 02 0 Introduction
  - Screencast 02 1 Entity Lists
  - Screencast 02 2 Tables
  - Screencast 02 3 Timelines
  - Screencast 02 4 Maps
- Lesson 03. Text Content. Mathematica notebooks and expressions, strings and natural language processing.
  - Screencast 03 0 Introduction
  - Screencast 03 1 Manipulating Strings
  - Screencast 03 2 Natural Language Processing
  - Screencast 03 3 Natural Language Processing Continued
- Lesson 04. Data Structures. Lists, associations and datasets.
  - Screencast 04 0 Introduction
  - Screencast 04 1 Lists
  - Screencast 04 2 Associations
  - Screencast 04 3 Datasets
  - Screencast 04 4 Building a Dataset
- Lesson 05. Reusing Code. Defining and developing functions, keyword in context (KWIC).
  - Screencast 05 0 Introduction
  - Screencast 05 1 Ways to Input Expressions
  - Screencast 05 2 Defining Functions
  - Screencast 05 3 Pure Functions
  - Screencast 05 4 Keyword in Context (KWIC)
  - Screencast 05 5 Developing a Function
- Lesson 06. Networks. Metadata, matrices and social network analysis.
  - Screencast 06 0 Introduction
  - Screencast 06 1 Dataset
  - Screencast 06 2 Matrices
  - Screencast 06 3 Visualizing Networks
  - Screencast 06 4 Related Entities
- Lesson 07. Indexing and Searching. Pattern matching, topic classification and term distribution.
  - Screencast 07 0 Introduction
  - Screencast 07 1 Position
  - Screencast 07 2 Map and Relatives
  - Screencast 07 3 Word Context and String Position
  - Screencast 07 4 Pattern Matching
  - Screencast 07 5 High-Level Indexing
  - Screencast 07 6 Plotting Term Distribution
- Lesson 08. Geospatial Analysis. Geographic information: raster, vector and attribute data.
  - Screencast 08 0 Introduction
  - Screencast 08 1 Maps and Texts
  - Screencast 08 2 Exploring Place
  - Screencast 08 3 Geographic Information
  - Screencast 08 4 The Past is a Foreign Country
- Lesson 09. Images. Computer vision, face detection, feature extraction and image mining.
  - Screencast 09 0 Introduction
  - Screencast 09 1 Faces and Fakes
  - Screencast 09 2 Image Mining
  - Screencast 09 3 Latent Information
  - Screencast 09 4 Feature Extraction
- Lesson 10. Page Images. Optical character recognition (OCR), figure extraction and classification.
  - Screencast 10 0 Introduction
  - Screencast 10 1 Optical Character Recognition (OCR)
  - Screencast 10 2 Extracting Figures
  - Screencast 10 3 Figure Classification
  - Screencast 10 4 Figure Mining
- Lesson 11. Crawling. Browser automation, batch downloading, web archives and WARC files.
  - Screencast 11 0 Introduction
  - Screencast 11 1 Browser Automation
  - Screencast 11 2 Batch Downloading
  - Screencast 11 3 Larger Batch Text Search
  - Screencast 11 4 Crawling
  - Screencast 11 5 Web Archives and WARC Files
- Lesson 12. Linked Open Data. Resource description framework (RDF), SPARQL queries and endpoints, JSON-LD.
  - Screencast 12 0 Introduction
  - Screencast 12 1 Linked Open Data and RDF
  - Screencast 12 2 SPARQL Queries and Endpoints
  - Screencast 12 3 Aggregating
- Lesson 13. Markup Languages. Scraping and parsing, XML, really simple syndication (RSS) and text encoding initiative (TEI).
  - Screencast 13 0 Introduction
  - Screencast 13 1 RSS and XML
  - Screencast 13 2 Text Encoding Initiative (TEI)
  - Screencast 13 3 File Differences
- Lesson 14. Studying Societies. Computational social science, search data, social media and social networks.
  - Screencast 14 0 Introduction
  - Screencast 14 1 The Database of Intentions
  - Screencast 14 2 Social Media
  - Screencast 14 3 Social Network Analysis
  - Screencast 14 4 Management Science
- Lesson 15. Extracting Keywords. Information retrieval, term frequency-inverse document frequency (TF-IDF) and rapid automatic keyword extraction (RAKE).
  - Screencast 15 0 Introduction
  - Screencast 15 1 Term Frequency Inverse Document Frequency (TFIDF)
  - Screencast 15 2 Rapid Automatic Keyword Extraction (RAKE)
  - Screencast 15 3 Keyword Entities
- Lesson 16. Word and Document Vectors. Feature extraction, dimension reduction, word embeddings and global vectors.
  - Screencast 16 0 Introduction
  - Screencast 16 1 Document Vector Model
  - Screencast 16 2 Feature Extraction and Dimension Reduction
  - Screencast 16 3 Word Embeddings
  - Screencast 16 4 Global Vectors for Word Representation (GloVe)
- Lesson 17. Citations. References, web services, bibliographic linked open data and citation networks.
  - Screencast 17 0 Introduction
  - Screencast 17 1 References and Citation
  - Screencast 17 2 Web Services
  - Screencast 17 3 Bibliographic Linked Open Data
  - Screencast 17 4 Citation Networks
- Lesson 18. Natural Language. Multilingual analysis, computational linguistics and sentiment analysis.
  - Screencast 18 0 Introduction
  - Screencast 18 1 Multilingual Analysis
  - Screencast 18 2 Computational Linguistics
  - Screencast 18 3 Entropy, Compression, Syntax and Semantics
  - Screencast 18 4 Sentiment
- Lesson 19. Web Services. Entity networks, publication search, dashboards, manipulating JSON.
  - Screencast 19 0 Introduction
  - Screencast 19 1 Entity Network Crawling
  - Screencast 19 2 Publication Search
  - Screencast 19 3 Dashboards
- Lesson 20. Databases. Parts, selections and transformations, computations and querying, relations.
  - Screencast 20 0 Introduction
  - Screencast 20 1 Parts and Structural Operations
  - Screencast 20 2 Selections and Transformations
  - Screencast 20 3 Computations and Querying
  - Screencast 20 4 Relations
- Lesson 21. Measuring Images. Photogrammetry, georectification, handwriting and facial 3D reconstruction.
  - Screencast 21 0 Introduction
  - Screencast 21 1 Photogrammetry
  - Screencast 21 2 Georectification
  - Screencast 21 3 Handwriting
  - Screencast 21 4 Facial 3D Reconstruction
- Lesson 22. Machine Learning. Unsupervised clustering, classify, predict and transfer learning.
  - Screencast 22 0 Introduction
  - Screencast 22 1 Unsupervised Clustering
  - Screencast 22 2 Classify
  - Screencast 22 3 Predict
  - Screencast 22 4 Transfer Learning
Second Edition (Summer 2019)
GitHub Repository
Download Mathematica notebook
Download PDF version
The second edition (v2.0, August 2019) contains 22 complete chapters which cover the following topics.
- Lesson 01. Reading Code. Word frequency, word clouds and stopwords.
- Lesson 02. Computable Knowledge. Entities, tables, timelines and maps.
- Lesson 03. Text Content. Mathematica notebooks and expressions, strings and natural language processing.
- Lesson 04. Data Structures. Lists, associations and datasets.
- Lesson 05. Reusing Code. Defining and developing functions, keyword in context (KWIC).
- Lesson 06. Networks. Metadata, matrices and social network analysis.
- Lesson 07. Indexing and Searching. Pattern matching, topic classification and term distribution.
- Lesson 08. Geospatial Analysis. Geographic information: raster, vector and attribute data.
- Lesson 09. Images. Computer vision, face detection, feature extraction and image mining.
- Lesson 10. Page Images. Optical character recognition (OCR), figure extraction and classification.
- Lesson 11. Crawling. Browser automation, batch downloading, web archives and WARC files.
- Lesson 12. Linked Open Data. Resource description framework (RDF), SPARQL queries and endpoints, JSON-LD.
- Lesson 13. Markup Languages. Scraping and parsing, XML, really simple syndication (RSS) and text encoding initiative (TEI).
- Lesson 14. Studying Societies. Computational social science, search data, social media and social networks.
- Lesson 15. Extracting Keywords. Information retrieval, term frequency-inverse document frequency (TF-IDF) and rapid automatic keyword extraction (RAKE).
- Lesson 16. Word and Document Vectors. Feature extraction, dimension reduction, word embeddings and global vectors.
- Lesson 17. Citations. References, web services, bibliographic linked open data and citation networks.
- Lesson 18. Natural Language. Multilingual analysis, computational linguistics and sentiment analysis.
- Lesson 19. Web Services. Entity networks, publication search, dashboards, manipulating JSON.
- Lesson 20. Databases. Parts, selections and transformations, computations and querying, relations.
- Lesson 21. Measuring Images. Photogrammetry, georectification, handwriting and facial 3D reconstruction.
- Lesson 22. Machine Learning. Unsupervised clustering, classify, predict and transfer learning.
First Edition (Summer 2015)
Download Mathematica Notebook (.nb, 4.4MB)
Download CDF (.cdf, 14MB)
GitHub Repository
The first edition (v1.0, August 2015) contains six complete chapters which cover the following topics.
- Analyzing Text: word frequencies, word clouds, characterizing sentences, text search, bag of words representation, keyword in context.
- Pattern Matching: string patterns, computable word data, stemming, concordance, capitalized words and phrases, n-gram analysis, stop words.
- Who and What: computable data about people, associations, named entities.
- When and Where: computable data about events and geospatial entities, timelines, maps, collocations, visualizing cooccurrence, vector distance and similarity.
- Information Retrieval: document vector model, related records, TF-IDF, document frequencies, summarization, computable subject data.
- Internet Sources: batch downloading, comparing texts, indexing for search, markup languages, scraping, interactive pattern matching, RSS feeds.
In addition, there are code samples (but no explanatory text) for the following tasks. I hope to expand these to full chapters in future editions.
- Image Processing: PDFs, optical character recognition, visualizing page images, automatic image extraction, detecting faces, photogrammetry, georectification, image classification and identification.
- Spidering and APIs: Wikipedia, network graphing, clustering, Internet Archive, OCLC WorldCat Identities, Open Library API, JSTOR Data for Research.
A set of accompanying slides will be posted weekly from September through early December 2015 at https://williamjturkel.net/teaching/history-2816a-introduction-to-digital-history-fall-2015/
These cover many of the techniques from the textbook in slightly simplified form, focussing more on the use of the techniques than the underlying code.