Digitize a Text

Hafner and Cerf ask us to think about the processes of digital preservation. Now that you’ve investigated how books are made physically, we can begin to think about how books are represented digitally. First, you’ll examine a representation of a book as a series of digital images, and then you’ll remediate those images into XML, a markup language that gives structure to machine-readable text.

As usual, follow the steps below to complete the workshop exercise, and then we’ll discuss.

1) Go to one of the Internet Archive collections of Early English Books:

  • 1475–1640: https://archive.org/details/pub_early-english-books-1475-1640?tab=collection&sort=-downloads
  • 1641–1700: https://archive.org/details/pub_early-english-books-1641-1700?tab=collection

2) Select a topic that interests you and search for a book on that topic.

  • You can choose anything you want, but here are some good general ideas:
    • Cooking
    • Witchcraft
    • Civil War (n.b. in these collections, this will get you results on the English Civil War, 1642–51)
    • Sonnets
    • Games

3) Select a book and choose an interesting page (just one page, or even less!). For a fun challenge, you might choose a page that has more than just a single block of prose text. Maybe your page has a poem or a song. Maybe it has multiple columns, or even a table or diagram.

4) Use the guidance below to create an XML representation of that page, or of part of it.

  • Read over parts of the TEI's "Gentle Introduction to XML" (you only need to read sections v.1–3 and v.6): https://tei-c.org/release/doc/tei-p5-doc/en/html/SG.html
  • Once you have a sense of XML, you can mark up your text in whatever way you see fit. Feel free to make up whatever elements and attributes you like. For a good example, look over the poem in section v.3, highlighted in pink.
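  • You could make your XML document in a word processor like Microsoft Word or Google Docs. If you do, save it as plain text and watch out for "smart" curly quotation marks, because XML only accepts straight quotes around attribute values. You could also use a text editor such as Brackets: https://brackets.io/.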
  • Feel free to ask lots of questions. Again, the goal is to get a feel for how this works rather than create a perfect object. If your page is very long, you can choose just one part to represent. You needn’t spend more than fifteen minutes or so creating your XML.
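Below is a minimal sketch of what your markup might look like. It assumes a page that carries the opening of a sonnet: the first four lines of Shakespeare's Sonnet 18, with the spelling modernized for this illustration. The element and attribute names (page, poem, quatrain, line, type, number, n) are hypothetical ones invented for this example, in the spirit of "make up whatever elements and attributes you like." They are not taken from the TEI or any other standard.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only: every element and attribute name below is invented -->
<page>
  <poem type="sonnet" number="18">
    <!-- Each printed line becomes its own element, numbered with an attribute -->
    <quatrain>
      <line n="1">Shall I compare thee to a summer's day?</line>
      <line n="2">Thou art more lovely and more temperate:</line>
      <line n="3">Rough winds do shake the darling buds of May,</line>
      <line n="4">And summer's lease hath all too short a date:</line>
    </quatrain>
  </poem>
</page>
```

Notice that every opening tag has a matching closing tag, and the elements nest inside one another without overlapping. The markup also makes the poem's line structure explicit for a machine. Anything you do not mark up, such as typeface, ornaments, or where the text sits on the page, is simply not recorded.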

5) Think about what is gained and what is lost in your XML representation compared with the digital image, and compared with the physical book itself (which you've never seen!).