Graphs in Python
Origins of Graph Theory

Before we start with the actual implementations of graphs in Python and introduce Python modules dealing with graphs, we want to devote ourselves to the origins of graph theory. The origins take us back in time to the Königsberg of the 18th century. Königsberg was a city in Prussia at that time. The river Pregel flowed through the town, creating two islands. The city and the islands were connected by seven bridges as shown. The inhabitants of the city were moved by the question of whether it was possible to take a walk through the town, visiting each area of the town and crossing each bridge exactly once. Every bridge had to be crossed completely, i.e. it is not allowed to walk halfway onto a bridge and then turn around and later cross the other half from the other side. The walk need not start and end at the same spot. Leonhard Euler solved the problem in 1735 by proving that it is not possible. He found out that the choice of a route inside each land area is irrelevant and that the only thing that matters is the order (or the sequence) in which the bridges are crossed. He formulated an abstraction of the problem, eliminating unnecessary facts and focussing on the land areas and the bridges connecting them. In this way, he laid the foundations of graph theory. If we see a "land area" as a vertex and each bridge as an edge, we have "reduced" the problem to a graph.
Introduction to Graph Theory Using Python

Before we start our treatise on possible Python representations of graphs, we want to present some general definitions of graphs and their components. A "graph" (see footnote 1) in mathematics and computer science consists of "nodes", also known as "vertices". Nodes may or may not be connected with one another. In our illustration, which is a pictorial representation of a graph, the node "a" is connected with the node "c", but "a" is not connected with "b". The connecting line between two nodes is called an edge. If the edges between the nodes are undirected, the graph is called an undirected graph. If an edge is directed from one vertex (node) to another, the graph is called a directed graph. A directed edge is called an arc.
Though graphs may look very theoretical, many practical problems can be represented by graphs. They are often used to model problems or situations in physics, biology, psychology and above all in computer science. In computer science, graphs are used to represent networks of communication, data organization, computational devices and the flow of computation. Examples are the file system of an operating system or communication networks. The link structure of websites can be seen as a graph as well, more precisely as a directed graph, because a link is a directed edge or an arc.
Python has no built-in data type or class for graphs, but it is easy to implement them in Python. One data type is ideal for representing graphs in Python, namely the dictionary. The graph in our illustration can be implemented in the following way:
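Since the illustration is not reproduced here, the following is only a sketch of such a dictionary. The edge set (a-c, b-c, b-e, c-d, c-e, with "f" isolated) is an assumption inferred from the examples used in this chapter.

```python
# Adjacency-list representation: each key is a node,
# each value lists the nodes it shares an edge with.
# The concrete edges are an assumption inferred from the surrounding text.
graph = {
    "a": ["c"],
    "b": ["c", "e"],
    "c": ["a", "b", "d", "e"],
    "d": ["c"],
    "e": ["b", "c"],
    "f": [],
}

print(graph["a"])  # ['c']
```

Because the graph is undirected, every edge is stored twice, once in the list of each of its two end points.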
The keys of the dictionary above are the nodes of our graph. The corresponding values are lists of the nodes that are connected to the key node by an edge. It is hard to think of a simpler and more elegant way to represent a graph. An edge can be seen as a 2-tuple with nodes as elements, e.g. ("a", "b"). The following function generates the list of all edges:
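A minimal sketch of such a function is given here. The name generate_edges and the small dictionary `example` are illustrative assumptions, not the original listing.

```python
def generate_edges(graph):
    """Return every edge of `graph` as a (node, neighbour) 2-tuple."""
    edges = []
    for node, neighbours in graph.items():
        for neighbour in neighbours:
            edges.append((node, neighbour))
    return edges


# Small hypothetical graph: a-c and b-c are edges, "f" is isolated.
example = {"a": ["c"], "b": ["c"], "c": ["a", "b"], "f": []}
print(generate_edges(example))
# [('a', 'c'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
```

Each undirected edge shows up twice, once per direction, because it is stored in the adjacency lists of both end points.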
This code generates the following output, if combined with the previously defined graph dictionary:
As we can see, there is no edge containing the node "f". "f" is an isolated node of our graph. The following Python function calculates the isolated nodes of a given graph:
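One possible way to write this function is sketched below. The name find_isolated_nodes and the sample dictionary are assumptions made for illustration. It relies on the adjacency lists being symmetric, so that a node without edges has an empty list.

```python
def find_isolated_nodes(graph):
    """Return the nodes of `graph` whose adjacency list is empty."""
    return [node for node, neighbours in graph.items() if not neighbours]


# Hypothetical sample graph with one isolated node.
example = {"a": ["c"], "c": ["a"], "f": []}
print(find_isolated_nodes(example))  # ['f']
```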
If we call this function with our graph, a list containing "f" will be returned: ["f"]
Graphs as a Python Class

Before we go on with writing functions for graphs, we have a first go at a Python graph class implementation. If you look at the following listing of our class, you can see in the __init__-method that we use a dictionary "self.__graph_dict" for storing the vertices and their corresponding adjacent vertices.
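The following is a reduced sketch of such a class, not the original listing. Only the attribute name __graph_dict is taken from the text; the method names vertices, edges, add_vertex and add_edge are assumptions.

```python
class Graph:
    """Minimal undirected graph stored as a dictionary of adjacency lists."""

    def __init__(self, graph_dict=None):
        # Copy the input so later changes do not affect the caller's dictionary.
        self.__graph_dict = {v: list(ns) for v, ns in (graph_dict or {}).items()}

    def vertices(self):
        """Return the vertices as a list."""
        return list(self.__graph_dict)

    def add_vertex(self, vertex):
        """Add `vertex` without neighbours unless it already exists."""
        self.__graph_dict.setdefault(vertex, [])

    def add_edge(self, v1, v2):
        """Add an undirected edge between `v1` and `v2`."""
        self.add_vertex(v1)
        self.add_vertex(v2)
        if v2 not in self.__graph_dict[v1]:
            self.__graph_dict[v1].append(v2)
        if v1 not in self.__graph_dict[v2]:
            self.__graph_dict[v2].append(v1)

    def edges(self):
        """Return each undirected edge once, as a set of its end points."""
        result = []
        for vertex, neighbours in self.__graph_dict.items():
            for neighbour in neighbours:
                if {vertex, neighbour} not in result:
                    result.append({vertex, neighbour})
        return result


g = Graph({"a": ["c"], "c": ["a"], "f": []})
g.add_edge("a", "f")
print(g.vertices())    # ['a', 'c', 'f']
print(len(g.edges()))  # 2
```

Edges are returned as sets so that ("a", "c") and ("c", "a") count as the same undirected edge.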
If you start this module standalone, you will get the following result:
Paths in Graphs
We now want to find a path from one node to another node. Before we come to the Python code for this problem, we will have to present some formal definitions.
Adjacent vertices: Two vertices are adjacent when they are both incident to a common edge.
Path in an undirected graph: A path in an undirected graph is a sequence of vertices $P = (v_1, v_2, \dots, v_n)$ such that $v_i$ is adjacent to $v_{i+1}$ for $1 \leq i < n$. Such a path $P$ is called a path of length $n - 1$ from $v_1$ to $v_n$, counting the edges it traverses.
Simple Path: A path with no repeated vertices is called a simple path. Example: (a, c, e) is a simple path in our graph, as well as (a, c, e, b). (a, c, e, b, c, d) is a path but not a simple path, because the node c appears twice. The following method finds a path from a start vertex to an end vertex:
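As a sketch, the search can be written as a recursive depth-first search. It is shown here as a standalone function on a dictionary rather than as a method of the class, and the sample graph is the assumed example graph of this chapter. The path found is not necessarily the shortest one.

```python
def find_path(graph, start, end, path=None):
    """Return some simple path from `start` to `end` as a list, or None."""
    path = (path or []) + [start]
    if start == end:
        return path
    for neighbour in graph.get(start, []):
        if neighbour not in path:  # never revisit a vertex: the path stays simple
            extended = find_path(graph, neighbour, end, path)
            if extended:
                return extended
    return None


# Assumed example graph (edges a-c, b-c, b-e, c-d, c-e; "f" isolated).
graph = {"a": ["c"], "b": ["c", "e"], "c": ["a", "b", "d", "e"],
         "d": ["c"], "e": ["b", "c"], "f": []}
print(find_path(graph, "a", "b"))  # ['a', 'c', 'b']
print(find_path(graph, "a", "f"))  # None
```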
If we save our graph class including the find_path method as "graphs.py", we can check how our find_path method works:
The result of the previous program looks like this:
The method find_all_paths finds all the paths from a start vertex to an end vertex:
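A sketch of the same idea as a standalone function follows; the original method belongs to the graph class. Instead of stopping at the first hit, it collects every simple path. The sample graph is again the assumed example graph.

```python
def find_all_paths(graph, start, end, path=None):
    """Return a list of all simple paths from `start` to `end`."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for neighbour in graph.get(start, []):
        if neighbour not in path:  # skip vertices already on the current path
            paths.extend(find_all_paths(graph, neighbour, end, path))
    return paths


# Assumed example graph (edges a-c, b-c, b-e, c-d, c-e; "f" isolated).
graph = {"a": ["c"], "b": ["c", "e"], "c": ["a", "b", "d", "e"],
         "d": ["c"], "e": ["b", "c"], "f": []}
print(find_all_paths(graph, "a", "b"))
# [['a', 'c', 'b'], ['a', 'c', 'e', 'b']]
```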
We slightly changed our example graph by adding edges from "a" to "f" and from "f" to "d" to test the previously defined method:
The result looks like this:
Degree

The degree of a vertex v in a graph is the number of edges connecting it, with loops counted twice. The degree of a vertex v is denoted $\deg(v)$. The maximum degree of a graph G, denoted by $\Delta(G)$, and the minimum degree of a graph, denoted by $\delta(G)$, are the maximum and minimum degree of its vertices. In the graph on the right side, the maximum degree is 5 and the minimum degree is 0, the latter belonging to the isolated vertex.
If all the degrees in a graph are the same, the graph is a regular graph, and we can speak of the degree of the graph.
The degree sum formula (handshaking lemma): $\sum_{v \in V} \deg(v) = 2 |E|$
This means that the sum of the degrees of all the vertices is equal to the number of edges multiplied by 2. We can conclude that the number of vertices with odd degree has to be even. This statement is known as the handshaking lemma. The name "handshaking lemma" stems from a popular mathematical problem: In any group of people, the number of people who have shaken hands with an odd number of other people from the group is even.
The following method calculates the degree of a vertex:
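A possible sketch of the degree calculation, written as a standalone function, is shown below. The sample graph with a loop at "c" is a hypothetical one, chosen so that its degrees match the sequence (5, 2, 1, 1, 1, 0) mentioned later in this chapter.

```python
def vertex_degree(graph, vertex):
    """Return the degree of `vertex`; a loop contributes 2."""
    neighbours = graph[vertex]
    # A loop appears once in the adjacency list but counts twice.
    return len(neighbours) + neighbours.count(vertex)


# Hypothetical graph with a loop at "c" and an isolated vertex "f".
g = {"a": ["d"], "b": ["c"], "c": ["b", "c", "d", "e"],
     "d": ["a", "c"], "e": ["c"], "f": []}

degree_sum = sum(vertex_degree(g, v) for v in g)
# Count every undirected edge once (the loop is the one-element set {"c"}).
edge_count = len({frozenset((v, w)) for v in g for w in g[v]})
print(vertex_degree(g, "c"))         # 5
print(degree_sum == 2 * edge_count)  # True
```

The last line checks the degree sum formula on this graph: 10 equals 2 times 5.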
The following method calculates a list containing the isolated vertices of a graph:
The methods delta and Delta can be used to calculate the minimum and maximum degree of a vertex respectively:
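As a sketch, both values can be derived from a dictionary of degrees. The functions are written here as standalone functions, and the helper degrees as well as the sample graph are assumptions.

```python
def degrees(graph):
    """Map each vertex to its degree; a loop contributes 2."""
    return {v: len(ns) + ns.count(v) for v, ns in graph.items()}


def delta(graph):
    """Minimum degree of the vertices."""
    return min(degrees(graph).values())


def Delta(graph):
    """Maximum degree of the vertices."""
    return max(degrees(graph).values())


# Hypothetical graph with a loop at "c" and an isolated vertex "f".
g = {"a": ["d"], "b": ["c"], "c": ["b", "c", "d", "e"],
     "d": ["a", "c"], "e": ["c"], "f": []}
print(delta(g), Delta(g))  # 0 5
```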
Degree Sequence
The degree sequence of an undirected graph is defined as the sequence of its vertex degrees in a non-increasing order. The following method returns a tuple with the degree sequence of the instance graph:
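A compact sketch of this calculation as a standalone function follows. The sample graph is a hypothetical one with a loop at "c".

```python
def degree_sequence(graph):
    """Return the vertex degrees as a non-increasing tuple."""
    degrees = (len(ns) + ns.count(v) for v, ns in graph.items())
    return tuple(sorted(degrees, reverse=True))


# Hypothetical graph with a loop at "c" and an isolated vertex "f".
g = {"a": ["d"], "b": ["c"], "c": ["b", "c", "d", "e"],
     "d": ["a", "c"], "e": ["c"], "f": []}
print(degree_sequence(g))  # (5, 2, 1, 1, 1, 0)
```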
The degree sequence of our example graph is the following sequence of integers: (5, 2, 1, 1, 1, 0). Isomorphic graphs have the same degree sequence. However, two graphs with the same degree sequence are not necessarily isomorphic. This raises the question of whether a given degree sequence can be realized by a simple graph. The Erdős-Gallai theorem states that a non-increasing sequence of n non-negative integers $d_1 \geq d_2 \geq \dots \geq d_n$ is the degree sequence of a simple graph if and only if the sum of the sequence is even and the following condition is fulfilled for every k with $1 \leq k \leq n$:
$$\sum_{i=1}^{k} d_i \leq k(k-1) + \sum_{i=k+1}^{n} \min(d_i, k)$$

Implementation of the Erdős-Gallai theorem
Our graph class (see further down for a complete listing) contains a method "erdoes_gallai" which decides whether a sequence fulfills the condition of the Erdős-Gallai theorem. First, we check whether the sum of the elements of the sequence is odd. If so, the function returns False, because the condition of the theorem cannot be fulfilled. After this, we check with the static method is_degree_sequence whether the input sequence is a degree sequence, i.e. that the elements of the sequence are non-increasing. This check is somewhat superfluous, as the input is supposed to be a degree sequence, so you may drop it for efficiency. Finally, we check in the body of the second if statement whether the inequality of the theorem is fulfilled:
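The steps just described can be sketched as standalone functions. The names erdoes_gallai and is_degree_sequence come from the text, but the bodies below are an illustration and not the original listing.

```python
def is_degree_sequence(sequence):
    """Check that the sequence is non-increasing."""
    return all(x >= y for x, y in zip(sequence, sequence[1:]))


def erdoes_gallai(sequence):
    """Decide whether `sequence` is the degree sequence of a simple graph."""
    if sum(sequence) % 2:  # an odd degree sum is impossible
        return False
    if not is_degree_sequence(sequence):
        return False
    n = len(sequence)
    for k in range(1, n + 1):
        left = sum(sequence[:k])
        right = k * (k - 1) + sum(min(d, k) for d in sequence[k:])
        if left > right:
            return False
    return True


print(erdoes_gallai((3, 3, 2, 2)))        # True
print(erdoes_gallai((5, 2, 1, 1, 1, 0)))  # False
```

The sequence (3, 3, 2, 2) belongs to the complete graph on four vertices with one edge removed. The sequence (5, 2, 1, 1, 1, 0) fails already for k = 1, because a vertex of degree 5 would need five distinct neighbours of non-zero degree, so it can only come from a graph with loops or multiple edges.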
Version without the superfluous degree sequence test:
Graph Density
The graph density is defined as the ratio of the number of edges of a given graph to the total number of edges the graph could have. In other words, it measures how close a given graph is to a complete graph.
The maximal density is 1, which is reached if the graph is complete. This is clear, because the maximum number of edges of a simple undirected graph depends only on the number of vertices and can be calculated as:
$$\max |E| = \frac{|V| (|V| - 1)}{2}$$
On the other hand, the minimal density is 0, if the graph has no edges, i.e. it consists of isolated vertices only.
For undirected simple graphs, the graph density is defined as:
$$D = \frac{2 |E|}{|V| (|V| - 1)}$$
A dense graph is a graph in which the number of edges is close to the maximal number of edges. A graph with only a few edges is called a sparse graph. The definition of those two terms is not very sharp, i.e. there is no least upper bound (supremum) for a sparse density and no greatest lower bound (infimum) for defining a dense graph.
The most precise mathematical notation uses the big O notation:
Sparse graph: $|E| = O(|V|)$
Dense graph: $|E| = \Theta(|V|^2)$
A dense graph is a graph $G = (V, E)$ in which $|E| = \Theta(|V|^2)$.
"density" is a method of our class to calculate the density of a graph:
We can test this method with the following script.
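A sketch of both the calculation and such a test is given below, written as a standalone function for simple undirected graphs. The three sample dictionaries are assumptions for illustration.

```python
def density(graph):
    """Return the density 2|E| / (|V| (|V| - 1)) of a simple undirected graph."""
    n = len(graph)
    if n < 2:
        return 0.0  # convention chosen here: no edge is possible
    # Count every undirected edge once and ignore loops.
    edges = {frozenset((v, w)) for v in graph for w in graph[v] if v != w}
    return 2.0 * len(edges) / (n * (n - 1))


complete = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
edgeless = {"a": [], "b": [], "c": []}
# Assumed example graph (edges a-c, b-c, b-e, c-d, c-e; "f" isolated).
example = {"a": ["c"], "b": ["c", "e"], "c": ["a", "b", "d", "e"],
           "d": ["c"], "e": ["b", "c"], "f": []}

print(density(complete))  # 1.0
print(density(edgeless))  # 0.0
print(round(density(example), 4))  # 0.3333
```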
A complete graph has a density of 1 and an isolated graph, i.e. a graph without edges, has a density of 0, as we can see from the results of the previous test script:
Connected Graphs

A graph is said to be connected if every pair of vertices in the graph is connected by a path. The example graph on the right side is a connected graph. It is possible to determine with a simple algorithm whether a graph is connected:
Choose an arbitrary node x of the graph G as the starting point.
Determine the set A of all the nodes which can be reached from x.
If A is equal to the set of nodes of G, the graph is connected; otherwise it is disconnected.
We implement a method is_connected to check if a graph is a connected graph. We don't put emphasis on efficiency but on readability.
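The three steps of the algorithm can be sketched as follows, as a standalone function on a dictionary. Treating the empty graph as connected is a convention chosen here, and the sample graphs are assumptions.

```python
def is_connected(graph):
    """Return True if every vertex can be reached from an arbitrary start vertex."""
    if not graph:
        return True  # convention chosen for the empty graph
    start = next(iter(graph))  # step 1: arbitrary starting node
    reached = {start}
    stack = [start]
    while stack:  # step 2: collect all reachable nodes
        vertex = stack.pop()
        for neighbour in graph[vertex]:
            if neighbour not in reached:
                reached.add(neighbour)
                stack.append(neighbour)
    return reached == set(graph)  # step 3: compare with the node set


connected = {"a": ["c"], "b": ["c"], "c": ["a", "b"]}
disconnected = {"a": ["c"], "c": ["a"], "f": []}
print(is_connected(connected))     # True
print(is_connected(disconnected))  # False
```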
If you add this method to our graph class, you can test it with the following script, assuming that you have saved the graph class as graph2.py:
A connected component is a maximal connected subgraph of G. Each vertex belongs to exactly one connected component, as does each edge.
Distance and Diameter of a Graph

The distance "dist" between two vertices in a graph is the length of the shortest path between these vertices. No backtracks, detours, or loops are allowed for the calculation of a distance. In our example graph on the right, the distance between the vertex a and the vertex f is 3, i.e. dist(a, f) = 3, because the shortest way is via the vertices c and e (or alternatively c and b).
The eccentricity of a vertex s of a graph g is the maximal distance from s to any other vertex of the graph: e(s) = max({dist(s, v) | v ∈ V}), where V is the set of all vertices of g.
The diameter d of a graph is defined as the maximum eccentricity of any vertex in the graph. This means that the diameter is the length of the shortest path between the two most distant nodes. To determine the diameter of a graph, first find the shortest path between each pair of vertices. The greatest length of any of these paths is the diameter of the graph. We can directly see in our example graph that the diameter is 3, because the shortest path between a and f has length 3 and there is no other pair of vertices whose shortest path is longer. The following method implements an algorithm to calculate the diameter.
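One way to sketch this calculation is with a breadth-first search from every vertex, which differs from enumerating all paths and is only an illustration. The helper distances_from and the sample graph are assumptions; the graph is chosen so that dist(a, f) = 3 via c and e or via c and b, as described above.

```python
from collections import deque


def distances_from(graph, source):
    """Return the distance (number of edges) from `source` to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbour in graph[vertex]:
            if neighbour not in dist:
                dist[neighbour] = dist[vertex] + 1
                queue.append(neighbour)
    return dist


def diameter(graph):
    """Return the maximum eccentricity; the graph is assumed to be connected."""
    return max(max(distances_from(graph, v).values()) for v in graph)


# Hypothetical connected graph matching the description in the text.
g = {"a": ["c"], "b": ["c", "e", "f"], "c": ["a", "b", "d", "e"],
     "d": ["c"], "e": ["b", "c", "f"], "f": ["b", "e"]}
print(distances_from(g, "a")["f"])  # 3
print(diameter(g))                  # 3
```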
We can calculate the diameter of our example graph with the following script, assuming again that our complete graph class is saved as graph2.py:
It will print out the value 3.
The Complete Python Graph Class
In the following Python code, you find the complete Python class module with all the discussed methods: graph2.py
Tree / Forest
A tree is a connected undirected graph which contains no cycles. This means that any two vertices of the graph are connected by exactly one simple path.
A forest is a disjoint union of trees. Contrary to forests in nature, a forest in graph theory can consist of a single tree!
A graph with one vertex and no edge is a tree (and a forest).
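As a sketch, a simple undirected graph can be tested for being a tree by checking that it is connected and has exactly one edge fewer than it has vertices. The function name is_tree and the sample dictionaries are assumptions, and the adjacency lists are assumed to be symmetric and free of loops.

```python
def is_tree(graph):
    """Return True if the simple undirected `graph` is connected and acyclic."""
    if not graph:
        return False
    # Every undirected edge is stored twice in the adjacency lists.
    edge_count = sum(len(ns) for ns in graph.values()) // 2
    if edge_count != len(graph) - 1:
        return False
    # With |V| - 1 edges, the graph is a tree exactly when it is connected.
    start = next(iter(graph))
    reached, stack = {start}, [start]
    while stack:
        for neighbour in graph[stack.pop()]:
            if neighbour not in reached:
                reached.add(neighbour)
                stack.append(neighbour)
    return len(reached) == len(graph)


single = {"a": []}
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
two_trees = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
triangle_plus = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": []}
print(is_tree(single), is_tree(path))              # True True
print(is_tree(two_trees), is_tree(triangle_plus))  # False False
```

The last sample has the right number of edges but is neither connected nor acyclic, which shows why both checks are needed.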
An example of a tree:
While the previous example depicts a graph which is a tree and forest, the following picture shows a graph which consists of two trees, i.e. the graph is a forest but not a tree:
Overview of forests:
With one vertex:
Forest graphs with two vertices:
Forest graphs with three vertices:
Spanning Tree
A spanning tree T of a connected, undirected graph G is a subgraph of G which is a tree and which contains all the vertices and a subset of the edges of G. T contains all the edges of G if and only if G itself is a tree. Informally, a spanning tree of G is a selection of edges of G that form a tree spanning every vertex. That is, every vertex lies in the tree, but no cycles (or loops) are contained.
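One simple way to obtain a spanning tree of a connected graph is to keep only the edge through which each vertex is first discovered during a traversal. The following sketch does this; the function name spanning_tree and the sample graph are assumptions.

```python
def spanning_tree(graph, root):
    """Return a spanning tree of the connected `graph` as an adjacency dictionary."""
    tree = {root: []}
    stack = [root]
    while stack:
        vertex = stack.pop()
        for neighbour in graph[vertex]:
            if neighbour not in tree:  # first discovery: keep this edge
                tree[neighbour] = [vertex]
                tree[vertex].append(neighbour)
                stack.append(neighbour)
    return tree


# Fully connected graph with four vertices.
k4 = {"a": ["b", "c", "d"], "b": ["a", "c", "d"],
      "c": ["a", "b", "d"], "d": ["a", "b", "c"]}
tree = spanning_tree(k4, "a")
print(tree)
# {'a': ['b', 'c', 'd'], 'b': ['a'], 'c': ['a'], 'd': ['a']}
```

The result contains all four vertices and three edges. A different root or traversal order generally yields a different spanning tree.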
Example:
A fully connected graph:
Two spanning trees from the previous fully connected graph:
Hamiltonian Game
A Hamiltonian path is a path in an undirected or directed graph that visits each vertex exactly once. A Hamiltonian cycle (or circuit) is a Hamiltonian path that is a cycle. Note for computer scientists: In general, no efficient (polynomial-time) algorithm is known for deciding whether such paths or cycles exist in arbitrary graphs, because the Hamiltonian path problem has been proven to be NP-complete. It is named after William Rowan Hamilton, who invented the so-called "icosian game", or Hamilton's puzzle, which involves finding a Hamiltonian cycle in the edge graph of the dodecahedron. Hamilton solved this problem using the icosian calculus, an algebraic structure based on roots of unity with many similarities to the quaternions, which he also invented.
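For very small graphs, the definition can be checked directly by trying every ordering of the vertices. The following brute-force sketch is only an illustration; its running time grows factorially with the number of vertices, and the function name and sample graphs are assumptions.

```python
from itertools import permutations


def hamiltonian_path(graph):
    """Return a Hamiltonian path as a list of vertices, or None if there is none."""
    for order in permutations(graph):
        # Consecutive vertices in the ordering must be joined by an edge.
        if all(b in graph[a] for a, b in zip(order, order[1:])):
            return list(order)
    return None


line = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
star = {"a": ["c"], "b": ["c"], "c": ["a", "b", "d"], "d": ["c"]}
print(hamiltonian_path(line))  # ['a', 'b', 'c']
print(hamiltonian_path(star))  # None
```

The star has no Hamiltonian path, because its three leaves are adjacent only to the centre, which may be visited just once.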

Complete Listing of the Graph Class
We can test this Graph class with the following program:
If we start this program, we get the following output:
Footnotes:
1 The graphs studied in graph theory (and in this chapter of our Python tutorial) should not be confused with the graphs of functions.
2 A singleton is a set that contains exactly one element.
Credits: Narayana Chikkam, nchikkam(at)gmail(dot)com, pointed out an index error in the "erdoes_gallai" method. Thank you very much, Narayana!