{ "cells": [ { "cell_type": "markdown", "id": "dee32b11-7922-44f4-a587-e118c0e1b793", "metadata": {}, "source": [ "# Directed Networks\n", "\n", "As we begin to discuss information networks like citations networks, semantic networks, and especially the Web, being able to work with *directed* graphs becomes an essential skill.\n", "\n", "## DiGraph Objects\n", "\n", "Unlike signed networks, which have no special designation in NetworkX, directed networks use a special `DiGraph()` object that retains directed information about the source and target of each edge." ] }, { "cell_type": "code", "execution_count": 1, "id": "2976a937-0352-4d5d-aaac-6a1ff7f474ed", "metadata": {}, "outputs": [], "source": [ "import networkx as nx\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "c6870e8e-feb5-45dc-89bf-4810c3aee461", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MultiDiGraph with 297 nodes and 2359 edges\n" ] } ], "source": [ "# Read a DiGraph from a file\n", "# This is actually a \"MultiDiGraph\", with multiple edges between nodes\n", "# This file contains the neural network of C. Elegans\n", "D = nx.read_gml(\"../data/celegansneural.gml\")\n", "print(D)" ] }, { "cell_type": "markdown", "id": "a787d1c1-1992-499c-b056-ef0d9dceda4d", "metadata": {}, "source": [ "Some files (especially `.gml` format) will already indicate that the network is directed. With other files and objects, like edgelists or Pandas DataFrames, you'll need to read or convert the file using `create_using=nx.DiGraph` to make sure that the DiGraph constructor is used to create the object." ] }, { "cell_type": "code", "execution_count": 3, "id": "8bb66da8-6551-4df8-8f0c-99ef1d172fa7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DiGraph with 327 nodes and 5818 edges\n" ] } ], "source": [ "# Import an edgelist of high school student interaction data as a DiGraph\n", "HS = nx.read_weighted_edgelist(\"../data/contact-high-school-proj-graph.txt\", create_using=nx.DiGraph)\n", "print(HS)" ] }, { "cell_type": "markdown", "id": "fc625f1b-34fb-4b54-9ae0-daabf685d4f7", "metadata": {}, "source": [ "## In-Degree and Out-Degree\n", "\n", "When you create `DiGraph()` objects, they include methods for in-degree and out-degree as well as standard, undirected degree." ] }, { "cell_type": "code", "execution_count": 4, "id": "4f37501a-7a8f-403a-85aa-edee9e6448f5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
neuron_idin_degreeout_degreedegree
012911
151241034
272413980
377332154
478352156
...............
292298011
293299011
294300011
295301011
296302011
\n", "

297 rows × 4 columns

\n", "
" ], "text/plain": [ " neuron_id in_degree out_degree degree\n", "0 1 2 9 11\n", "1 51 24 10 34\n", "2 72 41 39 80\n", "3 77 33 21 54\n", "4 78 35 21 56\n", ".. ... ... ... ...\n", "292 298 0 1 1\n", "293 299 0 1 1\n", "294 300 0 1 1\n", "295 301 0 1 1\n", "296 302 0 1 1\n", "\n", "[297 rows x 4 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "in_degree = D.in_degree() # Calculate in-degree\n", "out_degree = D.out_degree() # Calculate out-degree\n", "degree = D.degree() # Calculate degree\n", "\n", "# Add attributes to Graph\n", "nx.set_node_attributes(D, dict(in_degree), \"in_degree\")\n", "nx.set_node_attributes(D, dict(out_degree), \"out_degree\")\n", "nx.set_node_attributes(D, dict(degree), \"degree\")\n", "\n", "# Convert nodes to a dataframe\n", "nodes = pd.DataFrame.from_dict(D.nodes, orient='index')\n", "nodes.reset_index(level=0,names=\"neuron_id\",inplace=True)\n", "nodes" ] }, { "cell_type": "markdown", "id": "1ca382cf-bde2-40a6-b773-d001ed267d06", "metadata": {}, "source": [ "## Strongly Connected Components\n", "\n", "You can easily find out if a `DiGraph` is strongly connected." ] }, { "cell_type": "code", "execution_count": 5, "id": "cec5cf04-31da-4635-96ad-0816244fef25", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nx.is_strongly_connected(D)" ] }, { "cell_type": "markdown", "id": "b4d9d6bf-1995-4e05-83f8-18045fa8135a", "metadata": {}, "source": [ "Using the *undirected* version of this function will return an error." ] }, { "cell_type": "code", "execution_count": 6, "id": "fbff38a3-b32d-4816-b0fc-081cf516ccb4", "metadata": {}, "outputs": [ { "ename": "NetworkXNotImplemented", "evalue": "not implemented for directed type", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNetworkXNotImplemented\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[6], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mnx\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mis_connected\u001b[49m\u001b[43m(\u001b[49m\u001b[43mD\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/Library/Python/3.9/lib/python/site-packages/networkx/utils/decorators.py:766\u001b[0m, in \u001b[0;36margmap.__call__..func\u001b[0;34m(_argmap__wrapper, *args, **kwargs)\u001b[0m\n\u001b[1;32m 765\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mfunc\u001b[39m(\u001b[38;5;241m*\u001b[39margs, __wrapper\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs):\n\u001b[0;32m--> 766\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43margmap\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_lazy_compile\u001b[49m\u001b[43m(\u001b[49m\u001b[43m__wrapper\u001b[49m\u001b[43m)\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m compilation 26:3\u001b[0m, in \u001b[0;36margmap_is_connected_23\u001b[0;34m(G)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mbz2\u001b[39;00m\n\u001b[1;32m 2\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mcollections\u001b[39;00m\n\u001b[0;32m----> 3\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mgzip\u001b[39;00m\n\u001b[1;32m 4\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01minspect\u001b[39;00m\n\u001b[1;32m 5\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mitertools\u001b[39;00m\n", "File \u001b[0;32m~/Library/Python/3.9/lib/python/site-packages/networkx/utils/decorators.py:86\u001b[0m, in \u001b[0;36mnot_implemented_for.._not_implemented_for\u001b[0;34m(g)\u001b[0m\n\u001b[1;32m 82\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_not_implemented_for\u001b[39m(g):\n\u001b[1;32m 83\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m (mval \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mor\u001b[39;00m mval \u001b[38;5;241m==\u001b[39m g\u001b[38;5;241m.\u001b[39mis_multigraph()) \u001b[38;5;129;01mand\u001b[39;00m (\n\u001b[1;32m 84\u001b[0m dval \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mor\u001b[39;00m dval \u001b[38;5;241m==\u001b[39m g\u001b[38;5;241m.\u001b[39mis_directed()\n\u001b[1;32m 85\u001b[0m ):\n\u001b[0;32m---> 86\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m nx\u001b[38;5;241m.\u001b[39mNetworkXNotImplemented(errmsg)\n\u001b[1;32m 88\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m g\n", "\u001b[0;31mNetworkXNotImplemented\u001b[0m: not implemented for directed type" ] } ], "source": [ "nx.is_connected(D)" ] }, { "cell_type": "markdown", "id": "73a14b4a-c910-4319-bbdd-83147030a367", "metadata": {}, "source": [ "To find out if a directed graph is connected simply by the presence or absence of an edge, use the \"weakly connected\" functions." ] }, { "cell_type": "code", "execution_count": 7, "id": "9059651c-3f6a-49e6-8cd3-f4f0536eb3e8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nx.is_weakly_connected(D)" ] }, { "cell_type": "markdown", "id": "dfb9331b-311d-472a-a35f-cc9ae9f3991a", "metadata": {}, "source": [ "There are similar functions to get strongly connected components." ] }, { "cell_type": "code", "execution_count": 8, "id": "04a141c6-1e19-4937-b31f-e38bc90657fe", "metadata": {}, "outputs": [], "source": [ "strong_comp = list(nx.strongly_connected_components(D))" ] }, { "cell_type": "code", "execution_count": 9, "id": "9d370b28-8829-4539-9fd0-9358f7752582", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "57" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Count the number of strongly connected components\n", "len(strong_comp)" ] }, { "cell_type": "code", "execution_count": 10, "id": "a9fc66b5-61bc-4b32-8d82-9eaf7202886a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "239" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the largest strongly connected component\n", "largest = max(strong_comp, key=len)\n", "\n", "# How many nodes are there in this component\n", "len(largest)" ] }, { "cell_type": "code", "execution_count": 11, "id": "45199436-8eb0-46f0-8e7b-b8e88bfcfd8d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "297" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Compare to the number of nodes in \n", "# the largest weekly connected component\n", "\n", "weak_comp = list(nx.weakly_connected_components(D))\n", "len(max(weak_comp, key=len))" ] }, { "cell_type": "markdown", "id": "c847c1a1-d6cd-4f8c-a6fd-5f7c46e92dd1", "metadata": {}, "source": [ "## Paths\n", "\n", "Many of the path functions work the same for directed graphs, but using the directed definition of a path rather than the undirected definition." ] }, { "cell_type": "code", "execution_count": 12, "id": "c0384a57-afe7-4976-b15e-5947b697ed4c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['1', '90', '4', '10']" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nx.shortest_path(D, '1', '10')" ] }, { "cell_type": "markdown", "id": "357780ad-e7e2-41ed-99c9-b46559e6e761", "metadata": {}, "source": [ "But certain concepts, like average shortest path length, don't make sense unless your graph is strongly connected. The following will throw an error." ] }, { "cell_type": "code", "execution_count": 13, "id": "a27742e2-b680-46c5-9139-52fcfb09beff", "metadata": {}, "outputs": [ { "ename": "NetworkXError", "evalue": "Graph is not strongly connected.", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNetworkXError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[13], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mnx\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43maverage_shortest_path_length\u001b[49m\u001b[43m(\u001b[49m\u001b[43mD\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/Library/Python/3.9/lib/python/site-packages/networkx/algorithms/shortest_paths/generic.py:409\u001b[0m, in \u001b[0;36maverage_shortest_path_length\u001b[0;34m(G, weight, method)\u001b[0m\n\u001b[1;32m 407\u001b[0m \u001b[38;5;66;03m# Shortest path length is undefined if the graph is not strongly connected.\u001b[39;00m\n\u001b[1;32m 408\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m G\u001b[38;5;241m.\u001b[39mis_directed() \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m nx\u001b[38;5;241m.\u001b[39mis_strongly_connected(G):\n\u001b[0;32m--> 409\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m nx\u001b[38;5;241m.\u001b[39mNetworkXError(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mGraph is not strongly connected.\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 410\u001b[0m \u001b[38;5;66;03m# Shortest path length is undefined if the graph is not connected.\u001b[39;00m\n\u001b[1;32m 411\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m G\u001b[38;5;241m.\u001b[39mis_directed() \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m nx\u001b[38;5;241m.\u001b[39mis_connected(G):\n", "\u001b[0;31mNetworkXError\u001b[0m: Graph is not strongly connected." ] } ], "source": [ "nx.average_shortest_path_length(D)" ] }, { "cell_type": "markdown", "id": "c31d4994-b57b-4f5a-9f9b-e665b827b11f", "metadata": {}, "source": [ "You can approximate average shortest path length with a subgraph of the largest strongly connect component." ] }, { "cell_type": "code", "execution_count": 14, "id": "f6bb0951-536e-456d-8258-c12e55854cc3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.9943215780035866" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nx.average_shortest_path_length(D.subgraph(largest))" ] }, { "cell_type": "markdown", "id": "3c47092e-0633-4bb5-9e0f-dab8ff8e6608", "metadata": {}, "source": [ "Or you could convert the directed network to undirected and then calculate average shortest path length (if this results in a connected graph). But keep in mind that this produces a completely different kind of measure that doesn't directly apply to your original directed graph!" ] }, { "cell_type": "code", "execution_count": 15, "id": "9e277462-1886-455a-bec6-b636f4c68bf1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.455318955318955" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G = nx.Graph(D)\n", "nx.average_shortest_path_length(G)" ] }, { "cell_type": "markdown", "id": "e210da27-e622-4cd4-a49f-41a646b7e8d7", "metadata": {}, "source": [ "You can also check to see the paths that exist between two strongly connected components. This is known as an edge boundary, or a cut set." ] }, { "cell_type": "code", "execution_count": 17, "id": "6f401ec7-d8ff-405d-ba72-caafd211733c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('161', '102')\n", "('20', '121')\n", "('8', '121')\n", "('8', '102')\n", "('41', '121')\n", "('41', '102')\n", "('7', '121')\n", "('7', '102')\n", "('108', '121')\n", "('9', '121')\n", "('37', '121')\n", "('37', '102')\n", "('171', '121')\n", "('17', '121')\n", "('30', '102')\n", "('98', '121')\n", "('98', '102')\n", "('21', '102')\n", "('100', '121')\n", "('36', '121')\n", "('36', '102')\n", "('120', '121')\n", "('89', '121')\n", "('89', '102')\n", "('101', '121')\n", "('90', '121')\n", "('90', '102')\n", "('18', '102')\n", "('6', '102')\n", "('38', '121')\n", "('3', '121')\n" ] } ], "source": [ "# Get the edge boundary between the two largest components in the graph\n", "\n", "# First get the second largest component\n", "second_largest = sorted(strong_comp, key=len, reverse=True)[1]\n", "\n", "# Then get the edge boundary\n", "for edge in nx.edge_boundary(D, largest, second_largest):\n", " print(edge)" ] }, { "cell_type": "markdown", "id": "4b510cba-5bf4-4ca0-be83-136acddefeed", "metadata": {}, "source": [ "The code above shows you all of the edges going *from* the largest component *to* the second largest component." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }