The breadth-first search algorithm
Last updated
Last updated
Breadth-first search assigns two values to each vertex :
A distance, giving the minimum number of edges in any path from the source vertex to vertex .
The predecessor vertex of along some shortest path from the source vertex. The source vertex's predecessor is some special value, such as null
, indicating that it has no predecessor.
If there is no path from the source vertex to vertex , then 's distance is infinite and its predecessor has the same special value as the source's predecessor.
For example, here's an undirected graph with eight vertices, numbered 0 to 7, with vertex numbers appearing above or below the vertices. Inside each vertex are two numbers: its distance from the source, which is vertex 3, followed by its predecessor on a shortest path from vertex 3. A dash indicates null
:
In BFS, we initially set the distance and predecessor of each vertex to the special value (null
). We start the search at the source and assign it a distance of 0. Then we visit all the neighbors of the source and give each neighbor a distance of 1 and set its predecessor to be the source. Then we visit all the neighbors of the vertices whose distance is 1 and that have not been visited before, and we give each of these vertices a distance of 2 and set its predecessor to be vertex from which we visited it. We keep going until all vertices reachable from the source vertex have been visited, always visiting all vertices at distance from the source before visiting any vertex at distance .
Given the example above, here are the steps plus a visualization to play through each step:
Start by visiting vertex 3, the source, setting its distance to 0.
Then visit vertices 2 and 6, setting their distance to 1 and their predecessor to vertex 3.
Start visiting from vertices at distance 1 from the source, beginning with vertex 2. From vertex 2, visit vertices 4 and 5, setting their distance to 2 and their predecessor to vertex 2. Don't visit vertex 3, because it has already been visited.
From vertex 6, don't visit vertex 5, because it was just visited from vertex 2, and don't visit vertex 3, either.
Now start visiting from vertices at distance 2 from the source. Start by visiting from vertex 4. Vertex 2 has already been visited. Visit vertex 1, setting its distance to 3 and its predecessor to vertex 4.
From vertex 5, don't visit any of its neighbors, because they have all been visited.
Now start visiting from vertices at distance 3 from the source. The only such vertex is vertex 1. Its neighbors, vertices 4 and 5, have already been visited. But vertex 0 has not. Visit vertex 0, setting its distance to 4 and its predecessor to vertex 1.
Now start visiting from vertices at distance 4 from the source. That's just vertex 0, and its neighbor, vertex 1, has already been visited. We're done!
Notice that because there is no path from vertex 3 to vertex 7, the search never visits vertex 7. Its distance and predecessor remain unchanged from their initial values of null
.
A couple of questions come up. One is how to determine whether a vertex has been visited already. That's easy: a vertex's distance is null
until it has been visited, at which time it gets a numeric value for its distance. Therefore, when we examine the neighbors of a vertex, we visit only the neighbors whose distance is currently null
.
The other question is how to keep track of which vertices have already been visited but have not yet been visited from. We use a queue, which is a data structure that allows us to insert and remove items, where the item removed is always the one that has been in the queue the longest. We call this behavior first in, first out. A queue has three operations:
enqueue(obj)
inserts an object into the queue.
dequeue()
removes from the queue the object that has been in it the longest, returning this object.
isEmpty()
returns true
if the queue currently contains no objects, and false
if the queue contains at least one object.
Whenever we first visit any vertex, we enqueue it. At the start, we enqueue the source vertex because that's always the first vertex we visit. To decide which vertex to visit next, we choose the vertex that has been in the queue the longest and remove it from the queue—in other words, we use the vertex that's returned from dequeue()
. Given our example graph, here's what the queue looks like for each step, plus the previous visualization shown with the queue state:
Initially, the queue contains just vertex 3 with distance 0.
Dequeue vertex 3, and enqueue vertices 2 and 6, both with distance 1. The queue now contains vertex 2 with distance 1 and vertex 6 with distance 1.
Dequeue vertex 2, and enqueue vertices 4 and 5, both with distance 2. The queue now contains vertex 6 with distance 1, vertex 4 with distance 2, and vertex 5 with distance 2.
Dequeue vertex 6, and don't enqueue any vertices. The queue now contains vertex 4 with distance 2 and vertex 5 with distance 2.
Dequeue vertex 4, and enqueue vertex 1 with distance 3. The queue now contains vertex 5 with distance 2 and vertex 1 with distance 3.
Dequeue vertex 5, and don't enqueue any vertices. The queue now contains just vertex 1 with distance 3.
Dequeue vertex 1, and enqueue vertex 0 with distance 4. The queue now contains just vertex 0 with distance 4.
Dequeue vertex 0, and don't enqueue any vertices. The queue is now empty. Because the queue is empty, breadth-first search terminates.
Notice that at each moment, the queue either contains vertices all with the same distance, or it contains vertices with distance followed by vertices with distance . That's how we ensure that we visit all vertices at distance before visiting any vertices at distance .
How long does breadth-first search take for a graph with vertex set and edge set ? The answer is time.
Let's see what time means. Assume for the moment that , which is the case for most graphs, especially those for which we run breadth-first search. Then . Because we ignore constant factors in asymptotic notation, we see that when , really means . If, however, we have , then , and so really means . We can put both cases together by saying that really means . In general, if we have parameters and , then really means .
(Note, by the way, that a graph is connected if there is a path from every vertex to all other vertices. The minimum number of edges that a graph can have and still be connected is . A graph in which is called a free tree.)
How is it that breadth-first search runs in time? It takes time to initialize the distance and predecessor for each vertex ( time, actually). Each vertex is visited at most one time, because only the first time that it is reached is its distance null
, and so each vertex is enqueued at most one time. Since we examine the edges incident on a vertex only when we visit from it, each edge is examined at most twice, once for each of the vertices it's incident on. Thus, breadth-first search spends time visiting vertices.