04: High School Interactions

This week we’ll explore the sociological questions of triadic closure and weak ties in a network of anonymized high school interaction data. Go to Austin R. Benson’s network data site and download the “contact-high-school” network available there. Make sure you download the projected graph, not the timestamped simplices. The download is a compressed archive that includes a .txt edgelist file and a text file with some metadata. (.tar.gz files are sometimes difficult to extract on Windows. If you’re having trouble, there is a copy of the extracted file under Sakai Resources.) Read over Benson’s page to understand this data before you begin.
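
As a starting point, here is a minimal sketch of reading a weighted edgelist into NetworkX. It assumes each line holds two node IDs followed by an integer weight, which is an assumption about the file layout; open the .txt file and check before relying on it. The sample lines are made up for illustration and are not taken from the real data.

```python
import networkx as nx

# Hypothetical sample lines in the assumed "node node weight" layout.
sample_lines = [
    "1 2 3",
    "1 3 1",
    "2 3 5",
    "3 4 2",
]

# parse_edgelist reads an iterable of strings. nx.read_edgelist accepts
# the same nodetype and data arguments but reads from a file path.
G = nx.parse_edgelist(sample_lines, nodetype=int, data=(("weight", int),))

print(G.number_of_nodes(), G.number_of_edges())
print(G[2][3]["weight"])
```

For the real file, the same `nodetype` and `data` arguments could be passed to `nx.read_edgelist` along with the path to the extracted .txt file.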

In a short Jupyter notebook report, answer the following questions about this network. Don’t simply calculate the answers: make sure you’re fully explaining (in writing) the metrics and visualizations that you generate. Consider the Criteria for Good Reports as a guide. You can create markdown cells with section headers to separate the different sections of the report. Rather than number the report as if you’re answering distinct questions, use the questions as a guide to do some data storytelling, i.e. explain this network’s data in an organized way.

  1. What is the total number of triangles in this network? Does this number seem large or small relative to the overall size of the network?

  2. What’s the average clustering coefficient for the network? Graph the distribution of clustering coefficients. What do these measures tell you about the network’s overall structure?

  3. How many bridges are there in the network? How many local bridges are there? What is the maximum span of any local bridge in the network, and what does that tell you about this social community?

  4. There is no function for neighborhood overlap in NetworkX, but the library provides several functions for finding a node’s neighbors. Write a function to calculate neighborhood overlap for every edge in the network. Then, recreate a version of the plot in Figure 3.7 of your book by making a plot of weight and percentile of neighborhood overlap in your network. Does there appear to be a correlation between these two metrics? What does this tell you about the network?
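
The NetworkX calls relevant to the first three questions can be tried out on a small graph before running them on the full network. The sketch below uses a hypothetical toy graph (not the high school data) and shows one way to count triangles, average the clustering coefficients, and list bridges and local bridges. Note that `nx.local_bridges` also returns true bridges, with an infinite span, so this sketch filters those out before taking the maximum.

```python
import math
import networkx as nx

# Hypothetical toy graph: two triangles, a longer cycle joining them,
# and one pendant node. It stands in for the real contact network.
G = nx.Graph([
    (1, 2), (1, 3), (2, 3),   # triangle 1-2-3
    (4, 5), (4, 6), (5, 6),   # triangle 4-5-6
    (3, 4), (2, 7), (7, 5),   # edges that close a longer cycle
    (6, 8),                   # pendant edge, the only true bridge
])

# nx.triangles counts triangles per node, so every triangle is counted
# three times (once for each of its corners).
total_triangles = sum(nx.triangles(G).values()) // 3

avg_clustering = nx.average_clustering(G)

# Bridges: edges whose removal disconnects their endpoints.
bridges = list(nx.bridges(G))

# Local bridges: edges whose endpoints share no neighbors. With
# with_span=True each item is (u, v, span); true bridges get span inf.
local = list(nx.local_bridges(G, with_span=True))
finite_spans = [span for _, _, span in local if math.isfinite(span)]
max_span = max(finite_spans) if finite_spans else None

print(total_triangles, avg_clustering, len(bridges), len(local), max_span)
```

The per-node values from `nx.clustering(G)` are what you would plot for the distribution; the single number from `nx.average_clustering` is their mean.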

n.b. To answer these questions, you’ll need a mixture of the NetworkX skills we’ve been working on, and you’ll need to combine NetworkX code with pandas and altair.
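
One possible way to combine the libraries is to compute a value for every edge in NetworkX and collect the results in a pandas DataFrame, which is the tidy tabular shape Altair works with. The sketch below is hedged on several points: the toy graph is hypothetical, the helper `neighborhood_overlap` is an illustrative name, and it assumes the usual definition of overlap (neighbors shared by both endpoints divided by neighbors of at least one endpoint, not counting the endpoints themselves). Check that definition against your book. Ranking the weight column is also an assumption about which axis should be a percentile; the same `rank(pct=True)` call works on any column.

```python
import networkx as nx
import pandas as pd


def neighborhood_overlap(G, u, v):
    """Shared neighbors of u and v over all neighbors of either,
    excluding u and v themselves (assumed definition)."""
    nu, nv = set(G[u]), set(G[v])
    union = (nu | nv) - {u, v}
    return len(nu & nv) / len(union) if union else 0.0


# Hypothetical weighted toy graph, not the high school data.
G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 5), ("a", "c", 4), ("b", "c", 3),
    ("c", "d", 1), ("d", "e", 2),
])

# One row per edge: endpoints, weight, and computed overlap.
rows = [
    {"source": u, "target": v, "weight": w,
     "overlap": neighborhood_overlap(G, u, v)}
    for u, v, w in G.edges(data="weight")
]
df = pd.DataFrame(rows)

# Percentile rank of each edge by weight, as a fraction between 0 and 1.
df["weight_percentile"] = df["weight"].rank(pct=True)

print(df)
```

A DataFrame like `df` can then be passed to an Altair chart, with one column on each axis.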

When you’re finished, remove these instructions from the top of the file, leaving only your own writing and code. Push the notebook to the Homework repo to submit it to GitHub Classroom by the deadline on the schedule.