01: Networks in Python
This week we’re getting used to Python as well as the NetworkX library. Some of these exercises will be review and/or will help you adapt what you know from other languages into Python. As always, if you have questions, I’m more than happy to talk through things with you, and you’re welcome to work on this with classmates and solve problems together (but all the work on the assignment should ultimately be your own).
In a short Jupyter notebook report, answer the following questions about this network. Don’t simply calculate the answers: make sure you’re fully explaining (in writing) the metrics and visualizations that you generate. Consider the Criteria for Good Reports as a guide. You can create markdown cells with section headers to separate the different sections of the report. Rather than number the report as if you’re answering distinct questions, use the questions as a guide to do some data storytelling, i.e. explain this network’s data in an organized way.
n.b. You can download this notebook for your own use by clicking the download icon above.
First, some Python practice based on a classic programming interview question. Print the numbers from 1 to 100. If the number is divisible by 2, print the word “Tinker”. If the number is divisible by 7, print the word “Tailor”. If the number is divisible by both 2 and 7, print the words “Tinker Tailor Soldier Spy” (a great movie if you haven’t seen it). If the number isn’t divisible by any of these, just display the number as normal. (Hint: consider the order of your conditions!)
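The hint about ordering is easiest to see in the classic form of this puzzle, which uses 3 and 5 and different words. Here is a minimal sketch of that classic version, deliberately not the Tinker/Tailor variant you are asked to write. The function name `fizzbuzz` is purely illustrative.

```python
def fizzbuzz(n):
    """Return the classic FizzBuzz label for n (divisors 3 and 5, not the ones in this exercise)."""
    # The most specific test (divisible by both) has to come first.
    # If `n % 3 == 0` were checked first, 15 would be caught there
    # and the combined branch would never run.
    if n % 3 == 0 and n % 5 == 0:
        return "FizzBuzz"
    elif n % 3 == 0:
        return "Fizz"
    elif n % 5 == 0:
        return "Buzz"
    else:
        return str(n)


for n in range(1, 16):
    print(fizzbuzz(n))
```

Try swapping the first two conditions and watch what happens at 15: that is the bug the hint is warning you about.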
Choose a dataset from the Walsh collection. Remember to take a look at the metadata to see where this network came from. Load the data into NetworkX, and explain your steps in Markdown cells or Python comments. (If there is additional information about your network’s nodes or edges, make sure to load that too, as attributes.)
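One possible loading pattern, sketched under some loud assumptions: your dataset may not look like this at all. The sketch assumes an edge table with `source`, `target`, and `weight` columns and a node table with `id` and `group` columns. Those column names and the tiny inline data are hypothetical stand-ins so the example runs on its own.

```python
import io

import networkx as nx
import pandas as pd

# Hypothetical stand-ins for files on disk. With real data you would
# call pd.read_csv("your_edges_file.csv") instead of using StringIO.
edges_csv = io.StringIO("source,target,weight\nA,B,3\nB,C,1\nA,C,2\n")
nodes_csv = io.StringIO("id,group\nA,red\nB,blue\nC,red\n")

edges = pd.read_csv(edges_csv)
nodes = pd.read_csv(nodes_csv)

# Build the graph from the edge table. edge_attr=True keeps every
# remaining column (here just "weight") as an edge attribute.
G = nx.from_pandas_edgelist(edges, source="source", target="target", edge_attr=True)

# Node attributes go in as a dict of {node: {attribute: value}}.
node_attrs = nodes.set_index("id").to_dict(orient="index")
nx.set_node_attributes(G, node_attrs)

print(G.nodes(data=True))
print(G.edges(data=True))
```

If your network is directed, `from_pandas_edgelist` also accepts `create_using=nx.DiGraph`. Check the metadata before deciding.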
What are some basic facts about your chosen network? How many nodes and edges does it have? Is it directed? Is it connected? What do these basic measures tell you about the network?
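As a rough sketch of where these facts live in NetworkX, here they are for the built-in karate club graph, which stands in for your own dataset. One thing to watch: `nx.is_connected` only works on undirected graphs, so the sketch branches on direction.

```python
import networkx as nx

# Built-in example graph, used here only as a stand-in for your dataset.
G = nx.karate_club_graph()

print("Nodes:", G.number_of_nodes())
print("Edges:", G.number_of_edges())
print("Directed:", G.is_directed())

# nx.is_connected raises an error on directed graphs; for those,
# choose between weak and strong connectivity instead.
if G.is_directed():
    connected = nx.is_weakly_connected(G)
else:
    connected = nx.is_connected(G)
print("Connected:", connected)
```

The numbers alone are not the answer to the question. Your report still needs to say what they mean for this particular network.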
Loop through every node in the network, displaying any attributes that go along with them. Some of these networks are large and this will create hundreds of lines of output. We won’t normally want such messy output, but in this case it’s good practice. Then do the same thing for edges, but only print the first 100 edges of your network.
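A minimal sketch of both loops, again on the built-in karate club graph rather than your data. Using `itertools.islice` is one way to stop after 100 edges; slicing a list made from the edges would work too.

```python
from itertools import islice

import networkx as nx

G = nx.karate_club_graph()  # stand-in for your dataset

# data=True yields (node, attribute_dict) pairs.
for node, attrs in G.nodes(data=True):
    print(node, attrs)

# islice stops after at most 100 items without building the full
# edge list first. If the graph has fewer edges, it prints them all.
for u, v, attrs in islice(G.edges(data=True), 100):
    print(u, v, attrs)
```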
After writing the above as traditional loops, write the code again as a list comprehension and display the resulting list. Explain the difference between the two approaches.
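To make the comparison concrete, here is a small sketch that builds the same list both ways on the karate club graph (a stand-in for your own network). The variable names are illustrative.

```python
import networkx as nx

G = nx.karate_club_graph()  # stand-in for your dataset

# Loop version: create an empty list, then append inside the loop.
pairs_loop = []
for node, attrs in G.nodes(data=True):
    pairs_loop.append((node, attrs))

# Comprehension version: the same list built in a single expression.
pairs_comp = [(node, attrs) for node, attrs in G.nodes(data=True)]

print(pairs_comp[:5])
```

Notice that the comprehension produces a value you can store and reuse, while a loop full of `print` calls only produces output.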
Convert both the nodes and edges, with any attributes, into Pandas dataframes. Display both dataframes and explain why you might want this data structure instead of Python lists or NetworkX graph objects.
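One way to do the conversion, sketched on the karate club graph. It is not the only route: you could also build the dataframes from lists of dictionaries.

```python
import networkx as nx
import pandas as pd

G = nx.karate_club_graph()  # stand-in for your dataset

# One row per node, one column per node attribute.
# dict(G.nodes(data=True)) has the shape {node: {attribute: value}}.
nodes_df = pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient="index")

# One row per edge: "source" and "target" columns, plus a column
# for each edge attribute the graph carries.
edges_df = nx.to_pandas_edgelist(G)

print(nodes_df.head())
print(edges_df.head())
```

In a notebook you can end a cell with just `nodes_df` to get the nicely rendered table instead of the plain `print` output.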
Export the network as an adjacency matrix, a GML file, and a JSON file. Show your exporting code and include these files when you turn in this assignment.
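A hedged sketch of the three exports, using the karate club graph and made-up file names (`network_adjacency.csv`, `network.gml`, `network.json`). For JSON it assumes the node-link layout, which is a common choice but not the only one NetworkX offers.

```python
import json

import networkx as nx

G = nx.karate_club_graph()  # stand-in for your dataset

# Adjacency matrix as a labelled DataFrame, saved as CSV.
# Cells hold the edge weight where the graph has one, otherwise 1.
adj = nx.to_pandas_adjacency(G)
adj.to_csv("network_adjacency.csv")

# GML stores node and edge attributes along with the structure.
nx.write_gml(G, "network.gml")

# node_link_data returns a plain dictionary that json can serialize.
with open("network.json", "w") as f:
    json.dump(nx.node_link_data(G), f)
```

Depending on your NetworkX version, `node_link_data` may print a warning about a changing default key name; the export still works. If your attributes include types that JSON cannot represent, you will need to convert them first.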
When you’re finished, remove these instructions from the top of the file, leaving only your own writing and code. Export the notebook as an HTML file, check to make sure everything is formatted correctly, and submit your HTML file to Sakai.