Text as Interface:

Analyzing Data on the Command Line

Using this Slideshow

jrladd.com/commandline

John R Ladd / @johnrladd


What is UNIX? What is the Command Line?

Unix: a family of operating systems; macOS and Linux are both Unix or Unix-like systems
Command Line (shell): “a program that takes keyboard commands and passes them to the operating system to carry out” (Shotts)
GUI (graphical user interface): the “normal”, visual interface of windows and applications
Synonyms/Acronyms for “command line”: CLI, shell, sh, bash, terminal

What can you do with the CLI?

  • navigate your computer
  • manage files (create, delete, read, edit)
  • run programs
  • use programming languages
  • access the internet and download files
  • combine commands/applications
  • count, aggregate, and analyze… anything!

How to Use this Tutorial

From here forward, you’ll find lists of commands and their functions, followed by exercises with those commands.

You’ll run these exercises in the Terminal application on macOS or Linux. (Click here for Windows-based instructions.) Type each line exactly as it appears, in order, and then press <ENTER> to run it.

If you see <TAB> in a command, press the Tab key on your keyboard (don’t type the word TAB).

And if you encounter any problems or have a question, feel free to contact me.

But First, Directory Structure

There’s No Place Like Home

When you open a new terminal, type pwd, and hit <ENTER>, you’ll see your “home” directory. In my case it’s:

/home/jrladd

In yours it might be “/Users/yourUsername” or something else.

This is also written as “~”.

Typing “cd ~” will always bring you home!
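A short session to see this for yourself, assuming a bash- or zsh-style shell. /tmp is only an example of a directory that isn’t home, and $HOME is an environment variable holding your home path that these slides don’t otherwise introduce:

```shell
cd /tmp     # go somewhere else first (/tmp is just an example)
pwd         # prints /tmp
cd ~        # "~" is shorthand for your home directory
pwd         # prints your home directory, the same path stored in $HOME
echo ~      # the shell expands ~ into that path before echo runs
```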

Command Line Basics

The Basics

  • pwd: Print Working Directory, see where you are currently
  • cd <directory/path>: Change Directory; move to any directory/folder
  • ls <directory/path>: list contents of current directory (or any directory)
  • mkdir <directory name>: Make a new directory/folder
  • <TAB>: autocomplete any filename or directory
  • man <command>: see manual for any command
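The basics above can be tried together in a sketch like the following. practice_dir is a hypothetical folder name used only for this demo, and cd .. and rmdir are standard commands that the list above doesn’t cover:

```shell
cd ~                 # start from home
mkdir practice_dir   # hypothetical folder name, just for this demo
ls                   # practice_dir now shows up in the listing
cd practice_dir      # relative path: practice_dir inside the current directory
pwd                  # your home path followed by /practice_dir
cd ..                # ".." is the parent directory, so this returns home
rmdir practice_dir   # rmdir removes an empty directory
```

Running man ls afterwards opens the manual for ls; press q to leave it.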

Manipulating Files

  • touch <filename>: create an empty file (or update an existing file’s timestamp)
  • echo "<any text>": print any text to standard output
  • <command> > <filename>: send a command’s output into a file, overwriting whatever was there
  • <command> >> <filename>: add a command’s output to the end of a file
  • <command> < <filename>: use the contents of a file as a command’s input
  • *: wildcard for any bit of text (so *.txt means all .txt files)
  • cat <filename>: print the contents of a file to standard output
  • less <filename>: view a file in a basic file viewer (press q to quit)
  • cp <filename> <destination/new filename>: copy a file
  • mv <filename> <destination/new filename>: move or rename a file
  • rm <filename>: remove a file (directories need rm -r; be careful, there is no undo!)

Manipulating Files (cont.)

mkdir test_dir
cd test_dir
ls
touch test_file.txt
ls
echo "Knock, Knock"
echo "Knock, Knock" > test_file.txt
cat test_file.txt
echo "Who's there?" >> test_file.txt
cat test_file.txt
less te<TAB>    # opens the file in a viewer; press q to quit
q
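The exercise above uses > on an empty file and >> to append. A minimal sketch of how > and >> differ, plus the < redirect that the exercise skips, assuming a scratch file named demo.txt (a hypothetical name):

```shell
echo "Knock, Knock" > demo.txt    # ">" creates the file or overwrites it
echo "Who's there?" > demo.txt    # overwritten again: the first line is gone
cat demo.txt                      # prints only: Who's there?
wc -l < demo.txt                  # "<" feeds the file to wc as input (1 line here)
rm demo.txt                       # clean up the scratch file
```

Notice that wc prints no filename when it reads from <, because it only sees a stream of text.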

Manipulating Files (cont.)

cp test_file.txt duplicate_file.txt
cat duplicate_file.txt
ls
cat *.txt
mv test_file.txt renamed_file.txt
cat renamed_file.txt
ls
rm duplicate_file.txt
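cp and mv also accept a folder as the destination, and removing a folder needs rm -r. A minimal sketch, using the hypothetical names note.txt and archive (run it somewhere those names aren’t already taken):

```shell
touch note.txt                     # hypothetical file for this demo
mkdir archive                      # hypothetical folder for this demo
cp note.txt archive/               # destination is a folder: the copy keeps its name
mv note.txt archive/note_v2.txt    # move and rename in a single step
ls archive                         # lists note.txt and note_v2.txt
rm -r archive                      # -r removes a folder and everything in it (no undo!)
```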

Command Line Analysis

Analyzing Files

  • wget <URL>: download a file from the internet
  • unzip <zipfile>: extract the contents of a .zip file
  • wc <filename>: count words (-w), lines (-l), or bytes (-c; use -m for characters)
  • grep <search term> <filename>: search within a file or group of files; -c counts matching lines, -o prints each match on its own line, -i ignores case
  • tr '<characters>' '<replacements>': translate characters, replacing each character in the first set with the corresponding character in the second
  • sort: put lines into alphabetical order (use -n for numerical order)
  • uniq: collapse repeated adjacent lines into one (so sort first); use -c to count them
  • ptx: create a “permuted index” for keyword searching
  • |: the “pipe,” the Most Useful Operator: sends the output of one command to the input of another
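Two of these tools behave in ways that are easy to trip over: grep -c counts matching lines rather than individual matches, and uniq only merges duplicates that sit next to each other. A minimal sketch, using a hypothetical scratch file named sample.txt:

```shell
# A tiny stand-in file (hypothetical name and contents)
printf 'Justice and justice\nno match here\n' > sample.txt

grep -ci justice sample.txt           # counts matching LINES (1 here)
grep -oi justice sample.txt | wc -l   # counts OCCURRENCES (2 here)

# uniq only merges duplicates that are adjacent, so sort first
printf 'b\na\nb\n' | uniq             # still prints three lines
printf 'b\na\nb\n' | sort | uniq -c   # a once, b twice

rm sample.txt                         # clean up the scratch file
```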

Analyzing Files (cont.)

wget jrladd.com/inauguralspeeches.zip
ls
unzip ina<TAB>
ls
cd inauguralspeeches
ls
# How many words are in one file? In all files?
wc -w 1_washington_1789.txt
wc -w *.txt
# How many files are in the directory?
ls
ls | wc -l
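wc -w *.txt prints one count per file followed by a total line. If you only want the combined number, piping the files through cat first gives wc a single stream to count. A sketch with two hypothetical stand-in files (speech_a.txt and speech_b.txt) rather than the real speeches:

```shell
# Two tiny stand-in files (hypothetical names and contents)
printf 'one two three\n' > speech_a.txt
printf 'four five\n' > speech_b.txt

wc -w speech_*.txt          # one count per file, plus a "total" line
cat speech_*.txt | wc -w    # a single combined count (5 words here)
ls speech_*.txt | wc -l     # number of matching files (2 here)

rm speech_*.txt             # clean up the stand-in files
```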

Analyzing Files (cont.)

# Find one word in a file
grep -i justice 3_ad<TAB>
grep -oi justice 3_ad<TAB>
grep -ci justice 3_ad<TAB>
grep -oi justice 3_ad<TAB> | wc -l
# Find one word in all files
grep -oi justice *.txt
grep -oi justice *.txt | wc -l
# Count how often each word appears in the entire corpus
cat *.txt | tr ' ' '\n' | sort | uniq -c | sort -n
# Keyword in Context (KWIC) search
ptx -f -w 50 *.txt
ptx -f -w 50 *.txt | grep -i justice
ptx -f -w 50 *.txt | grep -i "[[:alpha:]]   justice"
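The word count pipeline above splits only on spaces, so “Justice,” and “justice” are counted as different words. One possible refinement, sketched here on a single sample sentence rather than the corpus: lowercase everything with tr, then use tr -cs to turn every run of non-letters into a single newline (-c complements the set, -s squeezes repeats).

```shell
echo "Justice, and justice for all." \
  | tr '[:upper:]' '[:lower:]' \
  | tr -cs '[:alpha:]' '\n' \
  | sort | uniq -c | sort -n   # "justice" ends up last, with a count of 2
```

To run it on the speeches, replace the echo line with cat *.txt.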

Further Reading

Resources

Explain Shell: Full explanation of any command

Command Line Cheatsheet: Comprehensive list of basic commands

Programming Historian

More Tutorials

William Turkel, Basic Text Analysis with Command Line Tools in Linux and Pattern Matching and Permuted Term Indexing with Command Line Tools in Linux

Kenneth Ward Church, Unix for Poets

Thank you!