05: Political Books

This week we’ll explore homophily and communities in the network of political books from the Walsh datasets, which you can download from GitHub. Keep in mind that this data (like the Quaker data from class) includes both a nodes table and an edges table, and you will likely need both. Make sure you read over Walsh’s page to understand this data before you begin.

In a short Jupyter notebook report, answer the following questions about this network. Don’t simply calculate the answers: make sure you’re fully explaining (in writing) the metrics and visualizations that you generate. Consider the Criteria for Good Reports as a guide. You can create markdown cells with section headers to separate the different sections of the report. Rather than number the report as if you’re answering distinct questions, use the questions as a guide to do some data storytelling, i.e. explain this network’s data in an organized way.

  1. Test for statistically significant homophily between liberal and conservative books in the political books network. Follow all the steps we covered in class: calculate 2pq, the proportion of mixed edges, and homophily, and use a permutation test to look for statistical significance. Explain all of your outputs fully and accurately. Keep in mind that the political ideology attribute includes three categories instead of two. How will that change your code and your analysis?

  2. Run both Girvan-Newman and Louvain modularity community detection on this network. Write some additional code to answer the question: how much do the generated communities overlap with the liberal, conservative, and neutral labels in the graph? Which community detection method seems more accurate?

  3. Create three dynamic visualizations using nx_altair: one that shows the Girvan-Newman communities, one that shows the Louvain communities, and one that shows the original political ideology labels. How are these visualizations similar or different, and do they match what you discovered in question 2? Make sure you use the same layout algorithm and positions for each one: having the nodes in the same place each time will make it much easier to compare the graphs.

  4. Write a short conclusion summarizing the community structure of this graph. Some questions you might answer include: Does the graph exhibit homophily, and how much? Which community detection method works best for this network, and why? How do visualizations help you understand the community structure here?
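As a starting point for the homophily test in question 1, note that 2pq equals 1 - p² - q² when there are only two groups, so one possible generalization to three groups is 1 - Σ pᵢ², the chance that two randomly drawn nodes carry different labels. The sketch below is only an illustration under that assumption, not necessarily the exact procedure from class: the function names, the toy edge list, and the toy labels are all hypothetical, and it uses plain Python so that it runs on its own.

```python
import random
from collections import Counter


def expected_mixed(labels):
    """Baseline share of mixed edges: 1 - sum(p_i^2) over all label groups."""
    counts = Counter(labels.values())
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())


def observed_mixed(edges, labels):
    """Share of edges whose two endpoints carry different labels."""
    edges = list(edges)
    mixed = sum(1 for u, v in edges if labels[u] != labels[v])
    return mixed / len(edges)


def permutation_test(edges, labels, n_iter=1000, seed=0):
    """Shuffle labels across nodes and count how often the shuffled network
    has as few (or fewer) mixed edges as the real one."""
    rng = random.Random(seed)
    edges = list(edges)
    nodes = list(labels)
    values = [labels[n] for n in nodes]
    observed = observed_mixed(edges, labels)
    at_or_below = 0
    for _ in range(n_iter):
        rng.shuffle(values)
        shuffled = dict(zip(nodes, values))
        if observed_mixed(edges, shuffled) <= observed:
            at_or_below += 1
    return observed, at_or_below / n_iter


# Hypothetical toy network: two triangles joined through one "neutral" node.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "g"), ("g", "d")]
toy_labels = {"a": "l", "b": "l", "c": "l",
              "d": "c", "e": "c", "f": "c",
              "g": "n"}

expected = expected_mixed(toy_labels)
observed, p_value = permutation_test(toy_edges, toy_labels)
gap = expected - observed  # one possible homophily summary (an assumption)
print(f"expected={expected:.3f} observed={observed:.3f} p={p_value:.3f}")
```

With a NetworkX graph `G`, the same functions could take `G.edges()` and the dictionary returned by `nx.get_node_attributes(G, name)`, where `name` is whatever the ideology column is called in the nodes table. If you prefer to compare only liberal and conservative books, you would need to decide (and explain) how edges touching neutral books are handled.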

N.B. To answer these questions, you’ll need to use a mixture of the NetworkX skills we’ve been working on and combine NetworkX code with pandas and Altair.
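One way to combine community detection output with node attributes, as question 2 asks, is to count how many nodes of each label fall into each community. The sketch below assumes communities arrive as a list of node sets (the shape NetworkX community functions generally return) and that every community is non-empty. The names `overlap_table` and `purity`, and the toy data, are hypothetical, and "purity" is just one possible summary of overlap, not a measure prescribed by the assignment.

```python
from collections import Counter


def overlap_table(communities, labels):
    """Map each community index to a Counter of the labels it contains."""
    return {
        i: Counter(labels[node] for node in members)
        for i, members in enumerate(communities)
    }


def purity(communities, labels):
    """Share of nodes that carry the majority label of their community."""
    table = overlap_table(communities, labels)
    majority_total = sum(max(counts.values()) for counts in table.values())
    total_nodes = sum(len(members) for members in communities)
    return majority_total / total_nodes


# Hypothetical toy data: two detected communities and three ideology labels.
toy_communities = [{"a", "b", "c"}, {"d", "e", "f", "g"}]
toy_labels = {"a": "l", "b": "l", "c": "l",
              "d": "c", "e": "c", "f": "c",
              "g": "n"}

table = overlap_table(toy_communities, toy_labels)
score = purity(toy_communities, toy_labels)
for community, counts in table.items():
    print(community, dict(counts))
print(f"purity={score:.3f}")
```

The same counts could be produced in pandas by adding a community column to the nodes table and cross-tabulating it against the ideology column with `pd.crosstab`, which also gives you a table that is easy to display in a report.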

When you’re finished, remove these instructions from the top of the file, leaving only your own writing and code. Export the notebook as an HTML file, check to make sure everything is formatted correctly, and submit your HTML file to Sakai.