Directed Networks#
As we begin to discuss information networks like citations networks, semantic networks, and especially the Web, being able to work with directed graphs becomes an essential skill.
DiGraph Objects#
Unlike signed networks, which have no special designation in NetworkX, directed networks use a special DiGraph()
object that retains directed information about the source and target of each edge.
import networkx as nx
import pandas as pd
# Read a DiGraph from a file
# This is actually a "MultiDiGraph", with multiple edges between nodes
# This file contains the neural network of C. Elegans
D = nx.read_gml("../data/celegansneural.gml")
print(D)
MultiDiGraph with 297 nodes and 2359 edges
Some files (especially .gml
format) will already indicate that the network is directed. With other files and objects, like edgelists or Pandas DataFrames, you’ll need to read or convert the file using create_using=nx.DiGraph
to make sure that the DiGraph constructor is used to create the object.
# Import an edgelist of high school student interaction data as a DiGraph
HS = nx.read_weighted_edgelist("../data/contact-high-school-proj-graph.txt", create_using=nx.DiGraph)
print(HS)
DiGraph with 327 nodes and 5818 edges
In-Degree and Out-Degree#
When you create DiGraph()
objects, they include methods for in-degree and out-degree as well as standard, undirected degree.
in_degree = D.in_degree() # Calculate in-degree
out_degree = D.out_degree() # Calculate out-degree
degree = D.degree() # Calculate degree
# Add attributes to Graph
nx.set_node_attributes(D, dict(in_degree), "in_degree")
nx.set_node_attributes(D, dict(out_degree), "out_degree")
nx.set_node_attributes(D, dict(degree), "degree")
# Convert nodes to a dataframe
nodes = pd.DataFrame.from_dict(D.nodes, orient='index')
nodes.reset_index(level=0,names="neuron_id",inplace=True)
nodes
neuron_id | in_degree | out_degree | degree | |
---|---|---|---|---|
0 | 1 | 2 | 9 | 11 |
1 | 51 | 24 | 10 | 34 |
2 | 72 | 41 | 39 | 80 |
3 | 77 | 33 | 21 | 54 |
4 | 78 | 35 | 21 | 56 |
... | ... | ... | ... | ... |
292 | 298 | 0 | 1 | 1 |
293 | 299 | 0 | 1 | 1 |
294 | 300 | 0 | 1 | 1 |
295 | 301 | 0 | 1 | 1 |
296 | 302 | 0 | 1 | 1 |
297 rows × 4 columns
Strongly Connected Components#
You can easily find out if a DiGraph
is strongly connected.
nx.is_strongly_connected(D)
False
Using the undirected version of this function will return an error.
nx.is_connected(D)
---------------------------------------------------------------------------
NetworkXNotImplemented Traceback (most recent call last)
Cell In[6], line 1
----> 1 nx.is_connected(D)
File ~/Library/Python/3.9/lib/python/site-packages/networkx/classes/backends.py:148, in _dispatch.<locals>.wrapper(*args, **kwds)
144 else:
145 raise NetworkXNotImplemented(
146 f"'{name}' not implemented by {plugin_name}"
147 )
--> 148 return func(*args, **kwds)
File ~/Library/Python/3.9/lib/python/site-packages/networkx/utils/decorators.py:766, in argmap.__call__.<locals>.func(_argmap__wrapper, *args, **kwargs)
765 def func(*args, __wrapper=None, **kwargs):
--> 766 return argmap._lazy_compile(__wrapper)(*args, **kwargs)
File <class 'networkx.utils.decorators.argmap'> compilation 26:3, in argmap_is_connected_23(G)
1 import bz2
2 import collections
----> 3 import gzip
4 import inspect
5 import itertools
File ~/Library/Python/3.9/lib/python/site-packages/networkx/utils/decorators.py:86, in not_implemented_for.<locals>._not_implemented_for(g)
82 def _not_implemented_for(g):
83 if (mval is None or mval == g.is_multigraph()) and (
84 dval is None or dval == g.is_directed()
85 ):
---> 86 raise nx.NetworkXNotImplemented(errmsg)
88 return g
NetworkXNotImplemented: not implemented for directed type
To find out if a directed graph is connected simply by the presence or absence of an edge, use the “weakly connected” functions.
nx.is_weakly_connected(D)
True
There are similar functions to get strongly connected components.
strong_comp = list(nx.strongly_connected_components(D))
# Count the number of strongly connected components
len(strong_comp)
57
# Get the largest strongly connected component
largest = max(strong_comp, key=len)
# How many nodes are there in this component
len(largest)
239
# Compare to the number of nodes in
# the largest weekly connected component
weak_comp = list(nx.weakly_connected_components(D))
len(max(weak_comp, key=len))
297
Paths#
Many of the path functions work the same for directed graphs, but using the directed definition of a path rather than the undirected definition.
nx.shortest_path(D, '1', '10')
['1', '90', '4', '10']
But certain concepts, like average shortest path length, don’t make sense unless your graph is strongly connected. The following will throw an error.
nx.average_shortest_path_length(D)
---------------------------------------------------------------------------
NetworkXError Traceback (most recent call last)
Cell In[13], line 1
----> 1 nx.average_shortest_path_length(D)
File ~/Library/Python/3.9/lib/python/site-packages/networkx/algorithms/shortest_paths/generic.py:409, in average_shortest_path_length(G, weight, method)
407 # Shortest path length is undefined if the graph is not strongly connected.
408 if G.is_directed() and not nx.is_strongly_connected(G):
--> 409 raise nx.NetworkXError("Graph is not strongly connected.")
410 # Shortest path length is undefined if the graph is not connected.
411 if not G.is_directed() and not nx.is_connected(G):
NetworkXError: Graph is not strongly connected.
You can approximate average shortest path length with a subgraph of the largest strongly connect component.
nx.average_shortest_path_length(D.subgraph(largest))
3.9943215780035866
Or you could convert the directed network to undirected and then calculate average shortest path length (if this results in a connected graph). But keep in mind that this produces a completely different kind of measure that doesn’t directly apply to your original directed graph!
G = nx.Graph(D)
nx.average_shortest_path_length(G)
2.455318955318955
You can also check to see the paths that exist between two strongly connected components. This is known as an edge boundary, or a cut set.
# Get the edge boundary between the two largest components in the graph
# First get the second largest component
second_largest = sorted(strong_comp, key=len, reverse=True)[1]
# Then get the edge boundary
for edge in nx.edge_boundary(D, largest, second_largest):
print(edge)
('161', '102')
('20', '121')
('8', '121')
('8', '102')
('41', '121')
('41', '102')
('7', '121')
('7', '102')
('108', '121')
('9', '121')
('37', '121')
('37', '102')
('171', '121')
('17', '121')
('30', '102')
('98', '121')
('98', '102')
('21', '102')
('100', '121')
('36', '121')
('36', '102')
('120', '121')
('89', '121')
('89', '102')
('101', '121')
('90', '121')
('90', '102')
('18', '102')
('6', '102')
('38', '121')
('3', '121')
The code above shows you all of the edges going from the largest component to the second largest component.