04: High School Interactions
This week we’ll explore the sociological questions of triadic closure and weak ties in a network of anonymized high school interaction data. Go to Austin R. Benson’s network data site and download the “contact-high-school” network available there. Make sure you download the projected graph, not the timestamped simplices. This will be a compressed .tar.gz archive that includes a .txt edgelist file and a text file with some metadata. (Sometimes .tar.gz files are difficult to extract in Windows. If you’re having trouble, there is a copy of the extracted file under Sakai Resources.) Make sure you read over Benson’s page to understand this data before you begin.
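As a starting point, here is a minimal sketch of loading a weighted edgelist into NetworkX. It assumes each line of the file holds two node IDs and an integer weight separated by whitespace; check that against the data description before relying on it. The sample lines and the file path in the comment are placeholders, not the real data.

```python
import networkx as nx

# Toy stand-in for the downloaded edgelist. The real file is ASSUMED to
# hold one "node node weight" triple per line; verify this yourself.
sample_lines = [
    "1 2 3",
    "1 3 1",
    "2 3 2",
    "3 4 5",
]

# Parse the lines into an undirected graph with an integer "weight" attribute.
G = nx.parse_edgelist(sample_lines, nodetype=int, data=(("weight", int),))

# For a file on disk, a call like the following should do the same job
# (the path is a hypothetical placeholder; weights are read as floats):
# G = nx.read_weighted_edgelist("path/to/edgelist.txt", nodetype=int)

print(G.number_of_nodes(), G.number_of_edges())
print(G[3][4]["weight"])
```

Printing the node and edge counts right after loading is a quick way to confirm that the graph matches the sizes reported on the data page.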
In a short Jupyter notebook report, answer the following questions about this network. Don’t simply calculate the answers: make sure you’re fully explaining (in writing) the metrics and visualizations that you generate. Consider the Criteria for Good Reports as a guide. You can create markdown cells with section headers to separate the different sections of the report. Rather than number the report as if you’re answering distinct questions, use the questions as a guide to do some data storytelling, i.e. explain this network’s data in an organized way.
What is the total number of triangles in this network? Does this number seem large or small relative to the overall size of the network?
What’s the average clustering coefficient for the network? Graph the distribution of clustering coefficients. What do these measures tell you about the network’s overall structure?
How many bridges are there in the network? How many local bridges are there? What is the maximum span of any local bridge in the network, and what does that tell you about this social community?
There is no function for neighborhood overlap in NetworkX, but the library provides several functions for finding a node’s neighbors. Write a function to calculate neighborhood overlap for every edge in the network. Then, recreate a version of the graph in Figure 3.7 of your book by making a regression plot of weight and neighborhood overlap in your network. Does there appear to be a correlation between these two metrics? What does this tell you about the network?
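One possible shape for the neighborhood overlap function is sketched below on a toy graph. It assumes the usual definition: the number of neighbors the two endpoints share, divided by the number of nodes that neighbor at least one endpoint, not counting the endpoints themselves. The function name, the toy graph, and the choice to return 0 when the denominator is empty are all illustrative assumptions.

```python
import networkx as nx

def neighborhood_overlap(G, u, v):
    """Shared neighbors of u and v divided by all neighbors of either,
    not counting u and v themselves."""
    nbrs_u = set(G.neighbors(u))
    nbrs_v = set(G.neighbors(v))
    shared = nbrs_u & nbrs_v
    either = (nbrs_u | nbrs_v) - {u, v}
    # If the endpoints have no other neighbors, return 0.0; this
    # convention is a choice made for this sketch.
    return len(shared) / len(either) if either else 0.0

# Toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 3.
G = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4)])

# Compute the overlap for every edge and store it as an edge attribute.
overlap = {(u, v): neighborhood_overlap(G, u, v) for u, v in G.edges()}
nx.set_edge_attributes(G, overlap, "overlap")

for u, v, data in G.edges(data=True):
    print(u, v, data["overlap"])
```

Storing the result as an edge attribute keeps it next to any existing edge weight, which makes it easier to pull both values into a table later.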
N.B. To answer these questions, you’ll need to draw on the NetworkX skills we’ve been working on and combine NetworkX code with pandas and seaborn.
When you’re finished, remove these instructions from the top of the file, leaving only your own writing and code. Export the notebook as an HTML file, check to make sure everything is formatted correctly, and submit your HTML file to Sakai.