Community Detection

Community Detection#

We can use some of the metrics and principles we already know to split, or partition, a network into distinct communities. This process is called community detection.

Girvan-Newman Method#

The Girvan-Newman method, described in Chapter 3 of Networks, Crowds, and Markets, creates partitions of nodes by successively deleting the edges with the highest edge betweenness.

As an example, we’ll use the Karate Club graph, which is built-in to NetworkX.

import networkx as nx
import pandas as pd

G = nx.karate_club_graph()

You must import community modules separate from the rest of NetworkX.

from networkx.algorithms.community import girvan_newman
import itertools # You need this to work with generators

# Calculate partition
communities = girvan_newman(G)

# Choose a limit on the number of partitions
k = 4

# Limit generator in your for loop
limited = itertools.takewhile(lambda c: len(c) <= k, communities)

partitions = [c for c in limited]

for p in partitions:
    print(p)

({0, 1, 3, 4, 5, 6, 7, 10, 11, 12, 13, 16, 17, 19, 21}, {2, 8, 9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33})
({0, 1, 3, 4, 5, 6, 7, 10, 11, 12, 13, 16, 17, 19, 21}, {32, 33, 2, 8, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}, {9})
({0, 1, 3, 7, 11, 12, 13, 17, 19, 21}, {32, 33, 2, 8, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}, {4, 5, 6, 10, 16}, {9})

You need to decide how many communities you take! In this case, it seems like 2 is the optimal number.

You could add these communities as a node attribute.

two_groups = partitions[0]

partition_dict = {}
for i,group in enumerate(two_groups):
    for g in group:
        partition_dict[g] = i
        
print(partition_dict)

{0: 0, 1: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 10: 0, 11: 0, 12: 0, 13: 0, 16: 0, 17: 0, 19: 0, 21: 0, 2: 1, 8: 1, 9: 1, 14: 1, 15: 1, 18: 1, 20: 1, 22: 1, 23: 1, 24: 1, 25: 1, 26: 1, 27: 1, 28: 1, 29: 1, 30: 1, 31: 1, 32: 1, 33: 1}

nx.set_node_attributes(G, partition_dict, "gn_partition")

G.nodes.data()

NodeDataView({0: {'club': 'Mr. Hi', 'gn_partition': 0}, 1: {'club': 'Mr. Hi', 'gn_partition': 0}, 2: {'club': 'Mr. Hi', 'gn_partition': 1}, 3: {'club': 'Mr. Hi', 'gn_partition': 0}, 4: {'club': 'Mr. Hi', 'gn_partition': 0}, 5: {'club': 'Mr. Hi', 'gn_partition': 0}, 6: {'club': 'Mr. Hi', 'gn_partition': 0}, 7: {'club': 'Mr. Hi', 'gn_partition': 0}, 8: {'club': 'Mr. Hi', 'gn_partition': 1}, 9: {'club': 'Officer', 'gn_partition': 1}, 10: {'club': 'Mr. Hi', 'gn_partition': 0}, 11: {'club': 'Mr. Hi', 'gn_partition': 0}, 12: {'club': 'Mr. Hi', 'gn_partition': 0}, 13: {'club': 'Mr. Hi', 'gn_partition': 0}, 14: {'club': 'Officer', 'gn_partition': 1}, 15: {'club': 'Officer', 'gn_partition': 1}, 16: {'club': 'Mr. Hi', 'gn_partition': 0}, 17: {'club': 'Mr. Hi', 'gn_partition': 0}, 18: {'club': 'Officer', 'gn_partition': 1}, 19: {'club': 'Mr. Hi', 'gn_partition': 0}, 20: {'club': 'Officer', 'gn_partition': 1}, 21: {'club': 'Mr. Hi', 'gn_partition': 0}, 22: {'club': 'Officer', 'gn_partition': 1}, 23: {'club': 'Officer', 'gn_partition': 1}, 24: {'club': 'Officer', 'gn_partition': 1}, 25: {'club': 'Officer', 'gn_partition': 1}, 26: {'club': 'Officer', 'gn_partition': 1}, 27: {'club': 'Officer', 'gn_partition': 1}, 28: {'club': 'Officer', 'gn_partition': 1}, 29: {'club': 'Officer', 'gn_partition': 1}, 30: {'club': 'Officer', 'gn_partition': 1}, 31: {'club': 'Officer', 'gn_partition': 1}, 32: {'club': 'Officer', 'gn_partition': 1}, 33: {'club': 'Officer', 'gn_partition': 1}})

Look at how closely the Girvan-Newman communities overlap with the two different karate clubs!

Louvain Modularity#

In addition to the Girvan-Newman betweenness-based method, you can also determine communities based on Louvain modularity. Remember that modularity is very similar to the “homophily” metric we generated in the previous chapter.

The Louvain method attempts to maximize modularity, by moving nodes between partitions until the highest possible modularity values for each partition are achieved. Because this is an agglomerative method, it’s possible to ask the algorithm to generate the optimal number of communities. (NetworkX does this automatically.)

# You have to import this function as well
from networkx.algorithms.community import louvain_communities

# Run the algorithm with a random seed, so it's the same each time
louvain = louvain_communities(G, seed=42)
print(louvain)

[{1, 2, 3, 7, 12, 13}, {0, 4, 5, 6, 10, 11, 16, 17, 19, 21}, {24, 25, 28, 31}, {32, 33, 8, 9, 14, 15, 18, 20, 22, 23, 26, 27, 29, 30}]

The Louvain algorithm generated 4 partitions instead of two! You can add these to your graph in a similar manner as above.

louvain_dict = {}
for i,group in enumerate(louvain):
    for g in group:
        louvain_dict[g] = i
        
print(louvain_dict)

{1: 0, 2: 0, 3: 0, 7: 0, 12: 0, 13: 0, 0: 1, 4: 1, 5: 1, 6: 1, 10: 1, 11: 1, 16: 1, 17: 1, 19: 1, 21: 1, 24: 2, 25: 2, 28: 2, 31: 2, 32: 3, 33: 3, 8: 3, 9: 3, 14: 3, 15: 3, 18: 3, 20: 3, 22: 3, 23: 3, 26: 3, 27: 3, 29: 3, 30: 3}

nx.set_node_attributes(G, louvain_dict, "louvain_partition")

G.nodes.data()

NodeDataView({0: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 1}, 1: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 0}, 2: {'club': 'Mr. Hi', 'gn_partition': 1, 'louvain_partition': 0}, 3: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 0}, 4: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 1}, 5: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 1}, 6: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 1}, 7: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 0}, 8: {'club': 'Mr. Hi', 'gn_partition': 1, 'louvain_partition': 3}, 9: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 3}, 10: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 1}, 11: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 1}, 12: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 0}, 13: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 0}, 14: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 3}, 15: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 3}, 16: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 1}, 17: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 1}, 18: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 3}, 19: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 1}, 20: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 3}, 21: {'club': 'Mr. Hi', 'gn_partition': 0, 'louvain_partition': 1}, 22: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 3}, 23: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 3}, 24: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 2}, 25: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 2}, 26: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 3}, 27: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 3}, 28: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 2}, 29: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 3}, 30: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 3}, 31: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 2}, 32: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 3}, 33: {'club': 'Officer', 'gn_partition': 1, 'louvain_partition': 3}})

From here a Pandas DataFrame can help you compare the two community detection methods to the real communities.

nodes = pd.DataFrame.from_dict(G.nodes, orient='index')
nodes.reset_index(level=0,names="node_id",inplace=True)
nodes.sort_values("club")

	node_id	club	gn_partition	louvain_partition
0	0	Mr. Hi	0	1
21	21	Mr. Hi	0	1
19	19	Mr. Hi	0	1
17	17	Mr. Hi	0	1
13	13	Mr. Hi	0	0
12	12	Mr. Hi	0	0
11	11	Mr. Hi	0	1
10	10	Mr. Hi	0	1
16	16	Mr. Hi	0	1
1	1	Mr. Hi	0	0
7	7	Mr. Hi	0	0
6	6	Mr. Hi	0	1
5	5	Mr. Hi	0	1
4	4	Mr. Hi	0	1
3	3	Mr. Hi	0	0
2	2	Mr. Hi	1	0
8	8	Mr. Hi	1	3
31	31	Officer	1	2
30	30	Officer	1	3
29	29	Officer	1	3
28	28	Officer	1	2
27	27	Officer	1	3
26	26	Officer	1	3
25	25	Officer	1	2
9	9	Officer	1	3
23	23	Officer	1	3
22	22	Officer	1	3
20	20	Officer	1	3
18	18	Officer	1	3
32	32	Officer	1	3
15	15	Officer	1	3
14	14	Officer	1	3
24	24	Officer	1	2
33	33	Officer	1	3

Community Detection

Contents

Community Detection#

Girvan-Newman Method#

Louvain Modularity#