1. Managing Data
As I mentioned in class, the purpose of these workshops is to provide short introductions to popular techniques or tools. Each workshop is only the first few steps in a larger area of study, so be sure to reach out to me if there’s one that especially interests you.
Instead of creating custom standalone workshops for our course, I thought it would be helpful to take advantage of the larger ecosystem of DH tutorials. To that end, many of the workshops for this course are borrowed from collaborative DH resources like The Programming Historian and The DH Literacy Guidebook. This will provide a roadmap for you if you want to build skills in other areas.
To accompany next week’s readings that define and explore data, follow this Programming Historian tutorial on OpenRefine.
OpenRefine is a popular tool for organizing and “cleaning” tabular data sets. It spent part of its early life under Google’s name, as Google Refine, but it’s been a community-maintained open source project for quite a while. Follow the tutorial and use OpenRefine to make the Powerhouse Museum data set more usable.
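If you’re curious what this kind of “cleaning” looks like under the hood, here is a minimal Python sketch of one common idea: grouping cell values that differ only in case, punctuation, spacing, or word order. It is loosely modeled on the “fingerprint” style of key-collision clustering and is not OpenRefine’s actual code. The function names and the sample values are made up for illustration and are not taken from the Powerhouse Museum data.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a cell value to a normalized key (an approximation, not OpenRefine's code)."""
    # Trim surrounding whitespace and ignore case.
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation so "photograph." and "photograph" match.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split into words, remove duplicates, and sort so word order no longer matters.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; keep only groups with real variation."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical sample values, invented for this sketch.
sample = [
    "Glass plate negative",
    "glass plate negative ",
    "Negative, glass plate",
    "Photograph",
    "photograph.",
]

clusters = cluster(sample)
for group in clusters:
    print(group)
```

Each printed group is a set of spellings that a person would then review and merge into one preferred form. The sketch only suggests candidates; deciding which values really mean the same thing is still an interpretive choice.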
This task is part of the larger field of data management and “cleaning,” an essential part of many digital humanities projects with its own history and theory. If you’re interested in the topic, you might check out Trevor Muñoz and Katie Rawson’s well-known piece, “Against Cleaning.”
Once you’ve worked your way through the tutorial, spend some time thinking about what happens before and after a tool like OpenRefine. Do you already have a data set that needs to be “cleaned”? If not, where might you find one? Could data that’s been “refined” be fed into one of our Explore tools for this week, like Palladio or RAWGraphs?
Remember, these workshops are designed to take up about an hour of your time (making up the third hour of our seminar). If you find that a workshop is taking a lot longer than that, it’s okay to stop and reach out to me about it.