Search in Graph Theory
Generally speaking, search is one of the most fundamental techniques for designing algorithms. It means to visit some interconnected objects in a specific manner that will hopefully solve the problem, mostly in a depth-first or breadth-first manner, until we find the object or state that is the answer to the problem.
For example, in the problem of finding a solution to a Sudoku puzzle, one may apply depth-first search by trying to fill possible digits from 1 to 9 into vacant slots, and backtracking when a filling does not lead to a feasible solution. In the problem of finding the shortest path in a 2-dimensional maze, most commonly described by a 2-dimensional array of 0s and 1s, one might apply breadth-first search instead, trying one move at a time and thereby ensuring the resulting path is the shortest.
The term "search in graph theory" means to search a graph given in its algorithmic representation as input. However, given that a "search in general" is to visit interconnected objects, if the search algorithm is guaranteed to halt, we are able to collect a finite set of objects as the vertex set, with their interconnections as the edge set, and build a graph (usually called the implicit graph of the problem, while the one given in its algorithmic representation is said to be explicit) such that searching in it yields the answer equivalently, although we won't do so in practice due to time and space complexity issues.
In fact, our point of interest in the search in graph theory differs quite vastly from the one in general. When referring to search in general, we mostly refer to searching for a specific object or state; when referring to search in graph theory, we mostly won't care about a specific vertex, but about structural information reflected during the search. You will see what this means after reading this text.
Search Forests
The principle of search, given that it's a fundamental technique, is quite simple. When it comes to graph theory, there's not much difference. What we care most about are still breadth-first search (BFS) and depth-first search (DFS), mostly in the following forms:
The input is a graph or digraph $G = (V, E)$:

1. Initialize an empty queue $Q$, and an empty set $S$ to track enqueued vertices.
2. Loop while $S \neq V$:
    1. Pick up any $r \in V \setminus S$, enqueue it into $Q$, and update $S \leftarrow S \cup \{r\}$.
    2. Loop while $Q$ is not empty:
        1. Dequeue $u$ from $Q$, and visit it.
        2. For every $(u, v) \in E$:
            - If $v \notin S$, enqueue $v$ into $Q$ and update $S \leftarrow S \cup \{v\}$.

For convenience, we say each iteration at step 2 to be one pass of BFS at $r$.
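The framework above can be sketched in Python. The adjacency-dict representation, the name `bfs_forest`, and the returned parent map (which records the resulting search forest) are illustrative choices of this sketch, not part of the formulation above; every vertex is assumed to appear as a key of the dict:

```python
from collections import deque

def bfs_forest(adj):
    """Multi-pass BFS over a graph/digraph given as {vertex: [neighbors...]}.
    Returns the visiting order and a parent map describing the BFS forest
    (roots of search trees map to None)."""
    order, parent, seen = [], {}, set()
    for r in adj:                      # step 2: one pass of BFS per unseen root
        if r in seen:
            continue
        seen.add(r)
        parent[r] = None               # r becomes the root of a search tree
        q = deque([r])
        while q:                       # step 2.2: empty the queue
            u = q.popleft()
            order.append(u)            # "visit" u
            for v in adj[u]:           # check every edge (u, v)
                if v not in seen:      # enqueue v at most once
                    seen.add(v)
                    parent[v] = u      # (u, v) becomes a tree edge
                    q.append(v)
    return order, parent
```

Iterating the dict keys in the outer loop relies on Python's insertion-ordered dicts, which plays the role of scanning the adjacency list from top to bottom.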
For graph inputs, we treat each undirected edge as a pair of directed edges. That is, we define both $(u, v)$ and $(v, u)$ to be checked at step 2.2.2 for every edge $\{u, v\}$.
The input is a graph or digraph $G = (V, E)$:

1. Initialize an empty stack $T$, and an empty set $S$ to track visited vertices.
2. Loop while $S \neq V$:
    1. Pick up any $r \in V \setminus S$, push $(r, 0)$ onto $T$.
    2. Loop while $T$ is not empty:
        1. Pop $(u, i)$ from $T$; visit $u$ and update $S \leftarrow S \cup \{u\}$ if $u \notin S$.
        2. Get the $i$-th edge going out of $u$, and if it exists (let it be $(u, v)$):
            1. Push $(u, i + 1)$ onto $T$.
            2. If $v \notin S$, push $(v, 0)$ onto $T$.

For convenience, we say each iteration at step 2 to be one pass of DFS at $r$.
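The DFS framework above, with its stack of (vertex, next-edge-index) items, can be sketched in Python; as with the BFS sketch, the adjacency-dict representation and the parent map are illustrative assumptions:

```python
def dfs_forest(adj):
    """Iterative DFS matching the stack-of-(vertex, edge-index) framework.
    `adj` is an adjacency dict {vertex: [neighbors...]}; returns the
    visiting order and the parent map of the DFS forest."""
    order, parent, seen = [], {}, set()
    for r in adj:                       # one pass of DFS per unseen root
        if r in seen:
            continue
        parent[r] = None
        stack = [(r, 0)]
        while stack:
            u, i = stack.pop()          # step 2.2.1
            if u not in seen:           # first pop of u: visit it
                seen.add(u)
                order.append(u)
            if i < len(adj[u]):         # step 2.2.2: the i-th edge exists
                v = adj[u][i]
                stack.append((u, i + 1))    # come back for u's next edge
                if v not in seen:
                    parent[v] = u           # (u, v) becomes a tree edge
                    stack.append((v, 0))
    return order, parent
```

Note that a vertex may be pushed more than once while unvisited; the last push is popped first, so the parent recorded last is the one whose edge actually causes the visit.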
And it's quite easy to see their time and space complexity as $O(|V| + |E|)$:
Proof. It won't be hard to see that every vertex is visited exactly once, and every edge incident to the vertex is checked exactly once in a digraph or at most twice in a graph. To book-keep the set of enqueued or visited vertices, a bitset of $|V|$ bits suffices.

In BFS, every vertex will be enqueued only once; in DFS, every stack item is pushed either at the start of a pass or during the check of an edge, so the number of items ever pushed is linear in $|V| + |E|$.
Due to the manner of visiting all vertices and edges in a graph or a digraph, a search in graph theory is sometimes called a graph traversal.
In practice, every search-based graph-theoretic algorithm can be seen as a variant of the frameworks of BFS and DFS above. They might prematurely halt when they've found their target, but given that we are able to arrange the input graph deliberately, so that the target to find appears as the last vertex to visit or edge to check, we must expect every search-based algorithm to traverse the whole graph in general. So the complexity bounds above apply to them as well.
The principle of search in graph theory is just as simple as the search in general, with nothing special. But if you think that's all there is to search in graph theory, then you will miss the whole forest. And this time, it's indeed a forest, called the search forest.
Definition
As an example for demonstration, let a digraph with its adjacency list be shown as below:
Consider performing BFS and DFS on this digraph. We scan the adjacency list from the top to bottom for unvisited vertices. For each vertex, we visit its incident edges by iterating the linked list chained into its corresponding slot in the adjacency list.
We want to represent how the vertices are being visited during BFS or DFS in some way. One way is to arrange them in a search forest, which is defined as:
For a BFS or DFS on a specific algorithmic representation of graph or digraph, a search forest is a way to arrange vertices by their order of being visited in the search. It's a collection of search trees, in which every tree fulfils:
- Every vertex $r$, selected as the vertex to start a pass of BFS or DFS, is the root of a search tree;
- If edge $(u, v)$ causes $v$ to be enqueued into the BFS queue, or causes $v$ to be pushed onto the DFS stack, then $v$ is a child of $u$, with edge $(u, v)$ connecting them;
- If $v_1, v_2$ are children of $u$ through edges $(u, v_1)$ and $(u, v_2)$, and $(u, v_1)$ appears earlier than $(u, v_2)$ in the adjacency list (thus $v_1$ is dequeued from the BFS queue or popped from the DFS stack earlier than $v_2$), then the subtree of $v_1$ is on the left of the subtree of $v_2$.
Thus, every search tree is a rooted tree whose order of children matters. And in addition to the order of children, the order of the search trees in a search forest also matters: the search tree visited earlier should appear on the left of the one visited later.
For convenience, we usually add a modifier to the search forest to indicate what kind of search it is for, e.g. a depth-first search forest / DFS forest, a breadth-first search forest / BFS forest.
According to this definition, the BFS forest and DFS forest of the digraph we present can be shown as below:
The claim that the BFS forest and DFS forest are defined on a specific algorithmic representation of a graph means that just a minor change in the algorithmic representation can lead to another search forest of completely different structure.
For example, consider permuting the edges in the adjacency list of the digraph we've just visited above. For readability, the differences have been marked red:

The permuted adjacency list represents the same digraph semantically. However, when we run BFS and DFS on the digraph of the permuted adjacency list, we will see:
The difference in the BFS forests is relatively minor: just a permutation of subtrees can be seen. However, in the DFS forests, even the parenthood of vertices has changed, which is a major difference, since the resulting forest lands in a completely different isomorphism class.
Semantically, if a search-based graph theoretic algorithm would like to be correct, it must be able to handle whatever algorithmic representation of the input graph. That is, every possible visiting order of vertices and edges must be considered, which is equivalent to ensuring that the algorithm works on every possible search forest of the input graph.
In practice, many search-based algorithms are designed to extract structural information from a graph, which is preserved by graph isomorphisms, e.g. cut vertices / edges of a graph, SCCs of a digraph (actually we will introduce them later in this text). And if the algorithm is designed for this purpose, semantically it must work on every graph in the same isomorphism class, that is, it must be able to handle every search forest of every graph in the isomorphism class.
To see how hard it is, let's consider a digraph isomorphic to the previous digraph we demonstrated, obtained by relabelling its vertices:
And the BFS forest and DFS forest of the isomorphic digraph with respect to the adjacency list above is shown as below:
As we can see, the search forest has nearly nothing in common with the previous ones. Thus, designing a correct search-based algorithm to work on a whole isomorphism class would be an insurmountable task, without regaining some tractability of the search forests.
Categorization of the Edges
One of the most important purposes for defining the search forests is to categorize the edges in graph and digraph:
Given a search forest of the graph or digraph, an edge is said to be a tree edge if it connects a parent to one of its children in some search tree, and a non-tree edge otherwise.

A path made up purely of tree edges is said to be a tree path; it is to be found within a single search tree.
When drawing a search forest, we usually draw the non-tree edges in addition to the tree edges, so that we can study the categorization of edges with respect to this search forest. If we are to do so, we will draw those tree edges with solid lines and non-tree edges with dashed lines.
Note that if we draw in such a way, all vertices and edges of the graph appear in the drawing.
- Back edge: $(u, v)$ if $v$ is an ancestor of $u$, or $u = v$ (a loop).
- Forward edge: $(u, v)$ if $v$ is a descendant of $u$.
- Cross edge: $(u, v)$ if none of the cases above is applicable.
According to the definition, we may draw a digraph with respect to one of its BFS forests and one of its DFS forests in the following way:
(I've hidden the labels of the tree edges in case of their being too messy.)
In the BFS forest, we can see back edges and cross edges among the non-tree edges, but no forward edges.
But what can we do with categorizing all these edges? Well, one of the simplest usages is to check whether a graph or digraph is acyclic. First, it's easy to see:
One may start to use this condition as the principle to design a cycle detection algorithm. However, the condition is sufficient, not necessary. And if we are not able to prove its necessity, there may be a search forest without any back edge of a graph with a cycle in it, and the algorithm based on such a condition may not be correct.
In fact, consider a BFS forest of a digraph with a cycle: the cycle may manifest itself through cross edges only, without a single back edge.
However, we will see later in this text, the condition is necessary in a DFS forest, so a cycle detection based on DFS may work.
Search Colorings and Timestamps
To study the properties of the search forest, given that it's generated from a specific BFS or DFS, we must stick to the behaviours of BFS and DFS themselves.
First, one can easily find that a vertex is in one of these three states during a BFS or DFS:
- Unvisited: the vertex has not been dequeued from the BFS queue, or popped from the DFS stack.
- Visiting: the vertex has been dequeued from the BFS queue, or popped from the DFS stack, and its incident edges are being iterated and checked.
- Visited: the incident edges of this vertex have all been iterated and checked.
And soon we will see, that keeping track of these states will be useful for studying behaviours of BFS and DFS. So we tweak our framework of BFS and DFS with search colorings and timestamps (SCT) to fulfill this purpose:
A search coloring is a map $c: V \to \{\text{white}, \text{gray}, \text{black}\}$, assigning white to unvisited vertices, gray to visiting vertices, and black to visited vertices.
Clearly, the vertex state is transient during the search, and a monolithic search coloring is insufficient to describe dynamic behaviour. So we introduce the timestamp, which is a counter starting at $1$ and increased by $1$ every time the search coloring of some vertex changes.
In order to apply the search colorings, first we need to initialize the initial search coloring, mapping every vertex to white.
1. Initialize empty queue $Q$, and empty set $S$ to track enqueued vertices. Initialize the search coloring $c$ (every vertex white) and timestamp $t \leftarrow 1$.
2. Loop while $S \neq V$:
    1. Pick up any $r \in V \setminus S$, enqueue it into $Q$, and update $S \leftarrow S \cup \{r\}$.
    2. Loop while $Q$ is not empty:
        1. Dequeue $u$ from $Q$, and visit it.
        2. Update the search coloring of $u$ to gray.
        3. For every $(u, v) \in E$:
            - If $v \notin S$, enqueue $v$ into $Q$ and update $S \leftarrow S \cup \{v\}$.
        4. Update the search coloring of $u$ to black.
1. Initialize empty stack $T$, and empty set $S$ to track visited vertices. Initialize the search coloring $c$ (every vertex white) and timestamp $t \leftarrow 1$.
2. Loop while $S \neq V$:
    1. Pick up any $r \in V \setminus S$, push $(r, 0)$ onto $T$.
    2. Loop while $T$ is not empty:
        1. Pop $(u, i)$ from $T$; visit $u$ and update $S \leftarrow S \cup \{u\}$, as well as the search coloring of $u$ to gray, if $u \notin S$.
        2. Get the $i$-th edge going out of $u$.
        3. If it exists (let it be $(u, v)$):
            1. Push $(u, i + 1)$ onto $T$.
            2. If $v \notin S$, push $(v, 0)$ onto $T$.
        4. If it does not exist, then update the search coloring of $u$ to black.
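A minimal Python sketch of DFS with SCT follows, assuming the adjacency-dict representation; the maps `d` and `f` stand for the discover and finish timestamps, and the guard skipping already-black stack items is an implementation detail of this sketch (a vertex may be pushed more than once while still unvisited):

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

def dfs_sct(adj):
    """DFS with search colorings and timestamps (SCT): each vertex goes
    white -> gray (discover timestamp d) -> black (finish timestamp f),
    and the timestamp counter ticks on every coloring change."""
    color = {v: WHITE for v in adj}
    d, f = {}, {}
    t = 1
    for r in adj:                       # one pass of DFS per white root
        if color[r] != WHITE:
            continue
        stack = [(r, 0)]                # items are (vertex, next edge index)
        while stack:
            u, i = stack.pop()
            if color[u] == BLACK:
                continue                # stale duplicate of a finished vertex
            if color[u] == WHITE:       # first pop: u is discovered
                color[u] = GRAY
                d[u] = t
                t += 1
            if i < len(adj[u]):         # the i-th edge of u exists
                v = adj[u][i]
                stack.append((u, i + 1))
                if color[v] == WHITE:
                    stack.append((v, 0))
            else:                       # edges exhausted: u is finished
                color[u] = BLACK
                f[u] = t
                t += 1
    return d, f
```

On a small DAG one can observe the nesting of discover/finish intervals directly from the returned maps.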
So by integrating SCT into BFS and DFS, we are finally able to demonstrate their dynamic behaviours:
It won't be hard to find:
For every vertex, its search coloring changes monotonically from white to gray, and then from gray to black, each exactly once during the search.
Conventionally, we shall do:
For convenience, if at timestamp $t$ the search coloring of vertex $v$ changes to gray, we say $v$ is discovered at $t$, written as the discover timestamp $d(v) = t$.

If at timestamp $t$ the search coloring of vertex $v$ changes to black, we say $v$ is finished at $t$, written as the finish timestamp $f(v) = t$.
So we may also rephrase the lemmas above as:
Generally speaking, the "continuity" and "monotonicity" of search colorings along the timeline can be utilized to model constraints in search trees. For example, if the vertex $v$ is gray at some timestamp $t$, then $d(v) \le t < f(v)$, and vice versa.
Search Trees and Timestamps
For any search tree, regardless of BFS or DFS, it's easy to see:
For every search tree
However, it suffices to check whether
Proof. If
For the comment that it
suffices to check
And considering a gray vertex, whose edges are being checked, it's easy to find:
Proof. At timestamp
Assume it's at timestamp
These lemmas will lead to:
Proof. The necessity part
is clear: any
For the sufficiency part, we
assume the contrary,
that at timestamp
By applying this argument to
This shall conclude the proof
of the whole component in graph
Shortest Tree Paths in BFS Trees
Intuitively (for programmers and CS students) speaking, one of the most important motivations to apply BFS is to find a solution with the fewest moves. In the case of search in graph theory, "the fewest moves" means the shortest path to reach a vertex, relative to the root of each search tree. In other words, we would like to show:
However, since intuition fools human eyes (and minds) indefinitely, it won't be useful unless we elevate the intuition to mathematical proofs.
To study the mathematical properties of a BFS forest, we tweak the BFS a little bit more, by integrating a level counter relative to each vertex:
1. Initialize empty queue $Q$, and empty set $S$ to track enqueued vertices. Initialize the search coloring $c$ and timestamp $t$. Initialize the empty level counter $\ell$.
2. Loop while $S \neq V$:
    1. Pick up any $r \in V \setminus S$, enqueue it into $Q$, and update $S \leftarrow S \cup \{r\}$, $\ell(r) \leftarrow 0$.
    2. Loop while $Q$ is not empty:
        1. Dequeue $u$ from $Q$, and visit it.
        2. Update the search coloring of $u$ to gray.
        3. For every $(u, v) \in E$:
            - If $v \notin S$, enqueue $v$ into $Q$ and update $S \leftarrow S \cup \{v\}$, $\ell(v) \leftarrow \ell(u) + 1$.
        4. Update the search coloring of $u$ to black.
Again, this modification does not change the general logic of a BFS, except for plugging in a level counter $\ell$.
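One pass of BFS carrying the level counter can be sketched in Python; the names `bfs_levels` and `level` are illustrative choices of this sketch. By the shortest-tree-path property discussed in this section, the level of a vertex equals its shortest distance from the source:

```python
from collections import deque

def bfs_levels(adj, source):
    """One pass of BFS at `source`, maintaining the level counter:
    level(source) = 0, and level(v) = level(u) + 1 whenever edge (u, v)
    causes v to be enqueued."""
    level = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in level:          # v has not been enqueued yet
                level[v] = level[u] + 1
                q.append(v)
    return level
```

Vertices unreachable from the source simply never enter the `level` map.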
And it's very obvious that:
And if we are to prove
For any moment during one pass of the BFS, let the queue be $Q = (v_1, v_2, \ldots, v_k)$ from head to tail; then the levels satisfy $\ell(v_1) \le \ell(v_2) \le \cdots \le \ell(v_k) \le \ell(v_1) + 1$.
Proof. An intuitive interpretation will help in understanding what we are to prove:
Clearly we must reason on
every pass of the BFS at
And noted that
Which holds for the new queue
Proof. Consider the very
pass of BFS constructing
Consider another list
Clearly
The full and overlapping
coverage of
Just a tiny remark: given that BFS in general can be viewed as BFS on the implicit tree of the problem, one might adapt the characterization of the BFS queue above to prove the correctness of BFS under that problem.
What the theorem and corollary above imply is even clearer when looking into our example of BFS with SCT again, where we can clearly see that a BFS can fundamentally be seen as a level-order traversal of each search tree of its search forest, when expressed in the language of tree traversals.
And with the characterization of the BFS queue, we are finally able to show:
Proof. Assume the contrary,
that given a
Without loss of generality, let
Otherwise given that
At the timestamp of
By now, we have elevated our intuition about BFS finding the shortest path to relatively rigorous mathematical proofs.
The shortest-tree-path property in a BFS tree also forbids the existence of a certain kind of edge with respect to the search tree:
Proof.
Let
Due to the existence of the edge
Search Subtrees and Timestamps
So in the case of search trees, things have gone quite smoothly: by simply setting up checkpoints on the discover and finish timestamps, we've established the white-path theorem for search trees, to determine which search tree a given vertex will be in.
We want to move a step further now: for any given vertex, can a similar condition determine which subtree it will lie in?
Well, the answer turns out to be negative for the case of BFS subtrees. Consider the following BFS tree as a counterexample:
So what's the chasm between ensuring a vertex reachable by a white path to lie in a search tree, and in a subtree? I think that in order to ensure the vertex lies in the search tree, it suffices to ensure that the vertex is enqueued into the BFS queue or pushed onto the DFS stack, and the rest can be left to the algorithms' nature of emptying the BFS queue or DFS stack per iteration with respect to search trees. However, to ensure the vertex lies in a specific subtree, a whole tree path must be secured by the white path.
In fact, while the white-path fails to secure a tree path in a BFS forest, it succeeds in a DFS forest. While the counterexample above suffices to show its failure, we must come up with a mathematical proof to secure its success.
First, let's observe the stack during a DFS. For convenience, we omit the second component $i$ of the stack items, and describe the possible transitions per iteration:
- Case 1: $(u, i)$ is popped, edge $(u, v)$ is observed, and there's $v \notin S$ in this case. So both $(u, i + 1)$ and $(v, 0)$ are pushed onto the stack. Vertex $v$ is marked immediately, due to the item $(v, 0)$ to be popped at step 2.2.1 in the next iteration. So we can associate the event "$v$ is discovered" with this transition.
- Case 2: $(u, i)$ is popped, edge $(u, v)$ is observed, and there's $v \in S$ in this case. So $(u, i + 1)$ is pushed onto the stack, waiting for the next edge of $u$ to be checked in the next iteration.
- Case 3: $(u, i)$ is popped, no $i$-th edge is observed, and $u$ is marked black at step 2.2.4 at the end of this iteration, so we can associate the event "$u$ is finished" with this transition.
Let's observe the first search tree of the DFS with SCT example during the search, omitting the stack transitions of case 2 (where the observed edge leads to an already visited vertex):
The observation leads to the characterization of the DFS stack:
In a DFS forest, for any
subtree
Where
For any vertex
While it suffices to check whether
Proof. The proof is based on the observation of the stack during DFS above.
Due to the case 1
(
Assume for vertex
In this way, those pushed
onto the stack after
It suffices to check
One will soon notice how similar it is to the lemma we used to determine whether a vertex lies in a given search tree.
Proof. For necessity,
every
For sufficiency, we use a
similar argument to the
white-path theorem for search
tree. Assume the contrary,
that in the
And thus
Noted that there's no gray
vertex
Again, let's look into our example of DFS with SCT: we can see that a DFS can fundamentally be seen as a pre-order traversal of each tree in its search forest, when expressed in the language of tree traversals. This is comparable to the BFS, which may be seen as a level-order traversal of the search trees instead.
Categorizing Edges by Search Colorings
The DFS is discussed much more frequently than the BFS in search algorithms. One major factor might be the absence of a white-path theorem for BFS, and another factor is the difficulty of categorizing edges in a BFS forest.
In a BFS search, it won't be hard to find that the search coloring of the target endpoint, upon an edge being checked, does not determine the category of the edge.
On the other hand, in a DFS forest, when inspecting edge $(u, v)$ while $u$ is gray, the search coloring of $v$ determines the category:

- $v$ is white: $(u, v)$ is a tree edge, since $v$ will be visited in the next iteration.
- $v$ is gray: $v$ has been on the stack, and $u$ must be above it on the stack; therefore $(u, v)$ must be a back edge.
- $v$ is black: $v$ has already been visited. If $v$ is in the subtree rooted at $u$, then $(u, v)$ is a forward edge; otherwise $(u, v)$ is a cross edge.
The situation is slightly different in the DFS forest of a graph, where every edge is checked twice, once from each endpoint:
Proof. For an edge
For an edge
In this way, in a DFS forest of a graph, we have:
- $v$ is white: the edge is a tree edge.
- $v$ is gray: the edge is a back edge.
- $v$ is black: the edge is a back edge as well. Since when $v$ is a descendant of $u$, what would have been a forward edge renders as a back edge now; and the case of a cross edge is impossible.
However, when case 3 happens, it must be the second time that the edge is checked, this time from the other endpoint.
The DFS also guarantees a back edge to be found in a DFS forest of a graph or digraph with a cycle:
The strategy we applied to digraph
is not directly applicable to
a graph. In a digraph,
Proof. The case is
trivial when the cycle is a loop.
Assume cycle
When
When
In this way, the cycle detection algorithm based on DFS will work, since we've guaranteed that the condition it uses to detect cycles is both sufficient and necessary.
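Such a DFS-based cycle detection can be sketched in Python, written recursively for brevity (a deeply nested digraph would need a larger recursion limit); an edge whose target endpoint is still gray is a back edge, which by the discussion above is equivalent to the digraph having a cycle:

```python
def has_cycle(adj):
    """Detect a cycle in a digraph given as {vertex: [neighbors...]}:
    the digraph has a cycle iff some DFS inspects an edge whose target
    is still gray (a back edge)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in adj}

    def visit(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:        # back edge: a cycle is closed
                return True
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    return any(color[r] == WHITE and visit(r) for r in adj)
```

Note that a black target endpoint (a forward or cross edge) does not trigger a report, which is exactly why the gray check is necessary and sufficient.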
The Topological Sort Problem
Let's look into some real-world applications of the search in graph theory.
A topological sort of a partial order $\preceq$ on a set $X$ is a binary relation $\le$ on $X$ such that:

- The order $\le$ is a linear order, which means every two elements of $X$ are comparable under $\le$.
- The relation $\le$ is compatible with $\preceq$, that $u \preceq v$ implies $u \le v$ (or equivalently, $\le$ extends $\preceq$).
In short, to find a topological sort means to fill in the gaps, so that incomparable pairs in $\preceq$ become comparable, without disturbing the pairs already comparable.
In graph theory, we know the connection (reachability) relation of a DAG forms a partial order on its vertices.
Meantime, given that there're finite elements in the vertex set, a topological sort always exists.
There're many ways to find a topological sort in a DAG, we are going to introduce two of them: one is intuitive, and the other is based on DFS.
Reference Counting
One of the most intuitive ways may be to apply reference counting:
1. Initialize zero reference counters $\mathit{rc}$, such that $\mathit{rc}(v) = 0$ for every $v \in V$.
2. For every $(u, v) \in E$, do update $\mathit{rc}(v) \leftarrow \mathit{rc}(v) + 1$.
3. Initialize the empty list $L$, and the empty queue $Q$.
4. For every $v \in V$:
    - If $\mathit{rc}(v) = 0$, enqueue $v$ into $Q$.
5. Loop until queue $Q$ is empty:
    1. Dequeue $u$ from $Q$, and append $u$ to the tail of $L$.
    2. For every $(u, v) \in E$:
        1. Do update $\mathit{rc}(v) \leftarrow \mathit{rc}(v) - 1$.
        2. Enqueue $v$ into $Q$ if $\mathit{rc}(v) = 0$.
6. Halt with $L$ as the topological sort.
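The reference-counting algorithm (widely known as Kahn's algorithm) can be sketched in Python as below; returning `None` when some vertices never reach reference count zero is an illustrative way, not part of the steps above, to signal that the input was not a DAG:

```python
from collections import deque

def topo_sort_refcount(adj):
    """Reference-counting topological sort of a DAG given as an adjacency
    dict.  Returns the sorted list, or None if a cycle survives."""
    rc = {v: 0 for v in adj}            # step 1: zero reference counters
    for u in adj:                       # step 2: count incoming edges
        for v in adj[u]:
            rc[v] += 1
    order = []
    q = deque(v for v in adj if rc[v] == 0)
    while q:                            # step 5: consume zero-count vertices
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            rc[v] -= 1
            if rc[v] == 0:
                q.append(v)
    return order if len(order) == len(adj) else None
```

The queue may be replaced by any container (a stack, a heap) without breaking correctness; only the particular topological order produced changes.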
Proof. For convenience,
let
The reference counter
Every vertex in
However, if
Proof. To initialize
the in-degrees in
For space complexity, the
DFS-Based
The topological sort problem can also be solved by performing a DFS on the DAG:
1. Initialize the empty list $L$.
2. Perform the framework of DFS, with some tweaks:
    - When finish visiting $u$, append $u$ to $L$.
3. Reverse the ordering of the list $L$.
4. Halt with $L$ as the topological sort.
Simple as it is, this one is much more mind-blowing when thinking about how it works.
Intuitively speaking, one can quickly find that the list collects the vertices in the order of their finish timestamps.
First, let's see how an edge in DAG affect the finish timestamps of its endpoints:
Proof. In the search forest of a DAG, any edge can only be a tree edge, a forward edge or a cross edge, since being a back edge would render the digraph cyclic, which is a contradiction.
When the edge $(u, v)$ is a tree edge or a forward edge, $v$ is a descendant of $u$, so $f(v) < f(u)$ by the nesting of timestamps. When the edge $(u, v)$ is a cross edge, $v$ is already black while $u$ is still gray, so $f(v) < f(u)$ as well.

In whatever case, we have $f(v) < f(u)$ for every edge $(u, v)$.
Why should we care about the finish timestamps? Well, note that in the post-order traversal, we append each vertex to the list exactly when it finishes, so the list is ordered by increasing finish timestamps.
Proof. By the previous
lemma, if
For any
Proof. The DFS requires time complexity $O(|V| + |E|)$, and reversing the list takes $O(|V|)$. The DFS and the list require space linear in the size of the input.
So we can see the DFS-based algorithm is not much harder than the reference counting one. And in practice, the DFS-based one is actually simpler to implement, as it requires just a DFS plus reversing a list.
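The DFS-based topological sort can be sketched in Python, written recursively for brevity; appending each vertex upon finishing and reversing the list afterwards is exactly the tweak described above:

```python
def topo_sort_dfs(adj):
    """DFS-based topological sort of a DAG: append each vertex to L when
    it finishes, then reverse L, so vertices end up ordered by
    decreasing finish timestamps."""
    seen, L = set(), []

    def visit(u):
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                visit(v)
        L.append(u)                     # post-order: u's finish timestamp

    for r in adj:
        if r not in seen:
            visit(r)
    return L[::-1]
```

Unlike the reference-counting sketch, this one assumes the input is a DAG; cycle detection would have to be bolted on via the gray-coloring check discussed earlier.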
Kosaraju's SCC Algorithm
The idea of the topological sort algorithm by DFS can be further extended to finding SCCs in digraphs.
First, consider two SCCs
And after performing a DFS
on digraph
Proof. If
If
In whatever case, we are guaranteed
to have
One will soon realize how similar this lemma is to the previous lemma that is crucial to the ordering of vertices in the topological sort algorithm by DFS.
In this way, after our
performing a DFS on digraph
Noted that the list
Actually, it's quite intuitive
if one remember that
In this way, if we iterate the representatives by the order of topological sort of the SCCs, and for each representative we mark the reachable vertices through a DFS on the transposition, each pass marks exactly the vertices of one SCC.
And this gives birth to the Kosaraju's SCC algorithm:
1. Initialize an empty stack $T$.
2. Perform the framework of DFS on the input digraph $G$, with some tweaks:
    - When finish visiting $u$, push $u$ onto stack $T$.
3. Initialize the map $\mathit{scc}$, as $\mathit{scc}(v) = \bot$ for every $v \in V$, and evaluate the transposition $G^T$ of the input $G$.
4. Initialize an empty set $S$ for keeping track of visited vertices in DFS.
5. Loop until stack $T$ is empty:
    1. Pop $u$ from stack $T$.
    2. If $u \notin S$:
        - Perform one pass of DFS on $G^T$ at vertex $u$, with the visited vertices stored in $S$ and some tweaks:
            - For each visited $v$, update $\mathit{scc}(v) \leftarrow u$.
6. Halt with $\mathit{scc}$ as the result.
Here, we use a stack to pop the vertices in decreasing order of their finish timestamps.
Proof. For the time complexity, pushing all vertices during the first DFS, evaluating the transposition, and the second round of DFS passes each take $O(|V| + |E|)$. For the space complexity, the stack and the mapping take $O(|V|)$, and the transposition takes $O(|V| + |E|)$.
In this way, assigning vertices to their SCCs in a digraph is a problem that can be accomplished by a DFS-based algorithm, within linear time complexity proportional to the magnitude of vertices and edges.
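Kosaraju's algorithm can be sketched in Python as below, with recursive DFS passes for brevity; the representative of each SCC is the vertex at which the pass on the transposition starts:

```python
def kosaraju_scc(adj):
    """Kosaraju's SCC algorithm: one DFS on G records finish order on a
    stack; DFS passes on the transposition G^T, in decreasing finish
    order, each mark one SCC.  Returns {vertex: representative}."""
    seen, stack = set(), []

    def visit(u):                       # first DFS: record finish order
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                visit(v)
        stack.append(u)

    for r in adj:
        if r not in seen:
            visit(r)

    transpose = {v: [] for v in adj}    # evaluate G^T
    for u in adj:
        for v in adj[u]:
            transpose[v].append(u)

    scc = {}
    def mark(u, rep):                   # one pass of DFS on G^T
        scc[u] = rep
        for v in transpose[u]:
            if v not in scc:
                mark(v, rep)

    while stack:
        u = stack.pop()
        if u not in scc:
            mark(u, u)                  # u represents a new SCC
    return scc
```

The `scc` map doubles as the visited-set of the second round: a vertex already assigned is never re-marked.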
Tarjan's Cut Vertex / Edge Algorithms
The DFS can also be used to find cut vertices and cut edges in a graph. Without loss of generality, let the graph be connected, or we can study the subgraphs induced by each connected component. Clearly all vertices will then be in the same search tree.
Let's consider the case of the cut vertex first:
- Vertex $u$ is the root of a search tree, and has $2$ or more children.
- Vertex $u$ is not the root of a search tree, and there's a direct child $v$ of $u$, such that from the subtree rooted at $v$, there's no back edge to the ancestors of $u$ (excluding $u$ itself).
Proof. It won't be hard to
see both conditions are sufficient:
removing
Assume
First, if
Assume the contrary that
Now we conclude the proof of the necessity of the two conditions.
Then we will also consider the case of cut edges:
Proof. Again, it's easy to see the condition is sufficient, and we just need to show it's also necessary.
First,
As we can see, no matter whether we would like to detect cut vertices or cut edges in a graph by DFS, we must be capable of detecting whether there's a back edge from the subtree rooted at a specific vertex to the ancestors of that vertex.
Luckily, the computer scientist Robert Tarjan showed us that it suffices to introduce a properly maintained timestamp:
In a DFS forest of a graph, the lowest timestamp $\mathit{low}(u)$ of a vertex $u$ is the minimum discover timestamp reachable from the subtree rooted at $u$ through at most one back edge. It can be maintained during the DFS:
1. When entering the subtree rooted at $u$, we initialize $\mathit{low}(u) \leftarrow d(u)$.
2. For each edge $(u, v)$ visited, we do:
    1. When $(u, v)$ is a back edge, update $\mathit{low}(u) \leftarrow \min(\mathit{low}(u), d(v))$.
    2. When $(u, v)$ is a tree edge, after returning from the subtree rooted at $v$, update $\mathit{low}(u) \leftarrow \min(\mathit{low}(u), \mathit{low}(v))$.
3. When exiting the subtree rooted at $u$, by checking whether there's $\mathit{low}(u) < d(u)$, we know whether there's a back edge from inside the subtree rooted at $u$, to an ancestor of $u$ (excluding $u$).
Note that at step 2.1, we need to fetch the discover timestamp of the target endpoint of the back edge.
Proof. The necessity is obvious: if there's a back edge from inside the subtree rooted at $u$ to a proper ancestor of $u$, the discover timestamp of that ancestor, which is smaller than $d(u)$, is propagated into $\mathit{low}(u)$.
For the sufficiency, when
So we can integrate the lowest timestamp into the framework of DFS, and implement the algorithms for finding cut vertices and cut edges:
1. Initialize empty set $A$, and empty map $\mathit{low}$.
2. Perform the framework of DFS with SCT on the graph, with some tweaks:
    1. For each pass of DFS at $r$, if $r$ has $2$ or more direct children, then update $A \leftarrow A \cup \{r\}$.
    2. When entering the subtree rooted at $u$, update $\mathit{low}(u) \leftarrow d(u)$.
    3. When checking a back edge $(u, v)$, update $\mathit{low}(u) \leftarrow \min(\mathit{low}(u), d(v))$.
    4. When returning from a subtree rooted at a direct child $v$ of $u$ (through a tree edge):
        1. Update $\mathit{low}(u) \leftarrow \min(\mathit{low}(u), \mathit{low}(v))$.
        2. If $u$ is not the root of a search tree and $\mathit{low}(v) \ge d(u)$ (a back edge to $u$ itself is allowed), then update $A \leftarrow A \cup \{u\}$.
3. Halt with $A$ being the set of cut vertices in the graph.
1. Initialize empty set $B$, and empty map $\mathit{low}$.
2. Perform the framework of DFS with SCT on the graph, with some tweaks:
    1. When entering the subtree rooted at $u$, update $\mathit{low}(u) \leftarrow d(u)$.
    2. When checking a back edge $(u, v)$, update $\mathit{low}(u) \leftarrow \min(\mathit{low}(u), d(v))$.
    3. When returning from a subtree rooted at a direct child $v$ of $u$, through the tree edge $(u, v)$:
        1. Update $\mathit{low}(u) \leftarrow \min(\mathit{low}(u), \mathit{low}(v))$.
        2. If $\mathit{low}(v) > d(u)$ (mind the vertex whose lowest timestamp is being checked), then update $B \leftarrow B \cup \{(u, v)\}$.
3. Halt with $B$ being the set of cut edges in the graph.
When performing DFS on a graph, one had better be super careful about the fact that every edge is checked from both endpoints, so the tree edge through which a subtree was entered also shows up, from the child's side, as an edge back to the parent. Failing to handle such a situation will lead to a bug: after entering a subtree through a tree edge, checking the same edge in the reverse direction would be mistaken for a back edge, wrongly lowering the child's lowest timestamp to the parent's discover timestamp.
Empirically speaking, when implementing a DFS algorithm to search a graph, I would recommend keeping track of additional information: through which tree edge we entered this vertex, or which vertex is the parent (in the case of no multiple edges), as an extra argument or a local state. Then when checking edges, we may use it to ignore the tree edge, or the edge to the parent.
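Following that recommendation, here is a Python sketch computing both cut vertices and cut edges in one DFS of a graph; the parent is passed as an argument so that the tree edge back to it is not mistaken for a back edge. This sketch assumes no multiple edges and expects each undirected edge to be listed at both endpoints:

```python
def cut_vertices_and_edges(adj):
    """One recursive DFS computing cut vertices and cut edges of a graph
    given as an undirected adjacency dict."""
    d, low = {}, {}
    cut_vertices, cut_edges = set(), set()
    t = 1

    def visit(u, parent):
        nonlocal t
        d[u] = low[u] = t
        t += 1
        children = 0
        for v in adj[u]:
            if v == parent:             # the tree edge we entered through
                continue
            if v in d:                  # back edge (or its second check)
                low[u] = min(low[u], d[v])
            else:                       # tree edge: recurse, then fold low
                visit(v, u)
                low[u] = min(low[u], low[v])
                children += 1
                if parent is not None and low[v] >= d[u]:
                    cut_vertices.add(u)
                if low[v] > d[u]:       # nothing in v's subtree climbs past u
                    cut_edges.add((u, v))
        if parent is None and children >= 2:
            cut_vertices.add(u)         # root of a search tree, 2+ children

    for r in adj:
        if r not in d:
            visit(r, None)
    return cut_vertices, cut_edges
```

With multiple edges, the parent-skip would wrongly ignore a genuine second edge to the parent; tracking the entering edge itself, as suggested above, fixes that.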
In this way, the problems of finding cut vertices and cut edges can be solved efficiently by DFS-based search algorithms, within the time complexity of $O(|V| + |E|)$.
Tarjan's SCC Algorithm
The idea of using the lowest timestamp in DFS can also be carried out to the problem of finding SCCs in a digraph:
- By the white-path theorem for DFS, we know every vertex of an SCC lies in the subtree rooted at $r$, for $r$ the first vertex of the SCC to be discovered.
- Although the vertices of another SCC reachable from $r$ are also in the subtree rooted at $r$, we just need to know, when exiting a subtree, whether there's no edge from inside it escaping it, in order to throttle the inner SCC from the outer one.
- Meantime, there's a lemma we've derived while studying Kosaraju's algorithm: for two distinct SCCs connected by an edge, the source SCC finishes later. So after picking all subtrees of other SCCs off the subtree of $r$, we will eventually be able to identify the vertices that are indeed in $r$'s SCC; thus we are able to identify all subtrees of SCCs in a digraph.
We will show how this can be accomplished in this section.
In a DFS forest of a graph, the edge going out of a subtree can only be a back edge; however, in a DFS forest of a digraph, the edge going out of a subtree can be a back edge or a cross edge. So we will need to handle the cross edges properly if the algorithm is to be correct. And indeed, handling the cross edges is the hardest part:
Consider the following search forest of a digraph:
As we can see, there're
3 SCCs
So the problem becomes, for every cross edge we encounter, based on what standard can we judge it to be crucial or not?
There might be many answers to this question, but not all of them are computationally helpful. For example, a naive (but sound) approach is to do one pass of DFS at the target endpoint of the cross edge, and see the deepest element on the stack / the highest ancestor (closest to the search tree root) of the source endpoint that it will hit. However, such a naive approach is vulnerable to deliberately constructed DFS forests maximizing the number of edges that it must visit, rendering the time complexity super-linear.
Meantime, from the example above we can see that a candidate standard is not even worth considering if it can't gather sufficient information within time complexity $O(|V| + |E|)$.
In fact, the sufficiently workable standard in Tarjan's SCC algorithm is organically combined with the other parts of the algorithm, like pieces of a jigsaw puzzle interlocked with each other, so I find it hard to come up with an easily inspectable standard like the one in the cut vertex / edge problem. So I decided to take out these puzzle pieces one by one, to let you see how they are combined into a workable version:
1. When entering the subtree rooted at $u$, we initialize $\mathit{low}(u) \leftarrow d(u)$.
2. For each edge $(u, v)$ visited, we do:
    1. When $(u, v)$ is a back edge, update $\mathit{low}(u) \leftarrow \min(\mathit{low}(u), d(v))$.
    2. When $(u, v)$ is a cross edge such that $v$ has not been assigned to an SCC yet, update $\mathit{low}(u) \leftarrow \min(\mathit{low}(u), d(v))$.
    3. When $(u, v)$ is a tree edge, after returning from the subtree rooted at $v$, update $\mathit{low}(u) \leftarrow \min(\mathit{low}(u), \mathit{low}(v))$.
3. When exiting the subtree rooted at $u$, by checking whether there's $\mathit{low}(u) < d(u)$, we know whether there's an edge from inside the subtree rooted at $u$ to a vertex $v$ such that $d(v) < d(u)$, but $v$ has not been assigned to an SCC yet.
Please notice that when we inspect a back edge to $v$, $v$ cannot have been assigned to an SCC yet, since $v$ is still gray.

Now we obtain the first piece of the jigsaw puzzle, which provides a bizarre piece of information: the lowest discover timestamp reachable from the subtree through vertices not yet assigned to an SCC.
Here comes the second piece of the jigsaw puzzle:
- The algorithm is based on DFS.
- When exiting a subtree rooted at $u$, if there's $\mathit{low}(u) = d(u)$, we assign all vertices under the subtree rooted at $u$ that have not been assigned to an SCC, to an SCC with $u$ as the representative.
Proof. Assume the contrary,
that there's an vertex
Given that there're finite
vertices in a graph, we may
recursively trace such
subtrees rooted at
After unwinding the logic above, we will see every vertex must be assigned to an SCC before exiting the search tree it is in.
The second piece is just what we were demanding at the beginning of this section, with how we will use the lowest timestamp added to it.
Here comes the third piece of the jigsaw puzzle:
Proof. Clearly
Clearly the last edge of the path
And clearly
If
If
, by the property of a back edge or a cross edge in a DFS forest. after is visited, and by the way we book-keep the lowest timestamp (noted that is not assigned to an SCC before exiting subtree rooted at , by presumption). , after is propagated to , again, by the way we book-keep the lowest timestamp. , by is a descendant of .
And by putting all the pieces together, we will eventually see:
So
By constructing the path
The third piece shows what we will see when we run the algorithm over an SCC, which serves as the necessary condition and gives us some intuition.
And finally, here comes the fourth and the last piece of the jigsaw puzzle:
1. Initialize empty walk $W$ and set the current vertex $w \leftarrow v$.
2. Loop until $\mathit{low}(w) = d(w)$:
    1. Let $x$ be the vertex that assigns its timestamp to $\mathit{low}(w)$ upon exiting the subtree rooted at $w$: either $d(x)$ through a back edge or cross edge $(w, x)$, or $\mathit{low}(x)$ through a tree edge $(w, x)$.
    2. Append $x$ to $W$ so that $W = (v, \ldots, w, x)$, and update $w \leftarrow x$.
3. Halt with $W$ as the backtracking path of $v$.
Proof. We map the elements
in
Consider every iteration of
step 2, there must be
Thus consider the vertices
in the
The idea of the backtracking algorithm is simple: to backtrack, iteratively, the vertex propagating its lowest or discover timestamp to the current vertex, whenever it's possible. And when the terminal of the backtracking is hit, we will find:
Proof. Assume the contrary
that
Noted that
In this way, the backtracking
path
When exiting the subtree
rooted at
Proof. Run the backtracking
path algorithm on
If
If
- When
is a tree edge, is a descendant of , and thus a descendant of . - When
is a back edge or a cross edge, . Since will be propagated to , we can see .
Thus, and is a descendant of .
Therefore, no matter which
case, we always have
So the only possibility is
Now the time for solving this jigsaw puzzle comes. By putting all the pieces above together, we eventually have:
Proof. In a DFS forest of the digraph, consider any SCC
If
If
In this way, the correctness of the algorithm is guaranteed whenever it fulfills the SCC algorithm's presumption. Now we just need to implement it.
The SCC algorithm's presumption says to "assign all vertices that have not been assigned to an SCC, under the subtree rooted at the representative, to the SCC it represents". A naive implementation, traversing the subtree once more, would be wasteful.
A wiser way is to introduce a stack of vertices, onto which every vertex is pushed upon discovery. The stack then always holds exactly the vertices that have been discovered but not yet assigned to an SCC, with the vertices of the current subtree on top.
In the end, we can assemble all of these ideas above into the Tarjan's SCC algorithm:
1. Initialize an empty stack $T'$, the map $\mathit{scc}$, as $\mathit{scc}(v) = \bot$ for every $v \in V$, and empty map $\mathit{low}$.
2. Perform the framework of DFS with SCT on the digraph, with some tweaks:
    1. When entering a vertex $u$, push $u$ onto $T'$, and update $\mathit{low}(u) \leftarrow d(u)$.
    2. When checking a back edge $(u, v)$, update $\mathit{low}(u) \leftarrow \min(\mathit{low}(u), d(v))$.
    3. When checking a cross edge $(u, v)$ such that $\mathit{scc}(v) = \bot$, update $\mathit{low}(u) \leftarrow \min(\mathit{low}(u), d(v))$.
    4. When returning from a subtree rooted at a direct child $v$ of $u$ (through a tree edge), update $\mathit{low}(u) \leftarrow \min(\mathit{low}(u), \mathit{low}(v))$.
    5. When exiting a subtree rooted at $u$ such that $\mathit{low}(u) = d(u)$:
        - Pop $v$ from $T'$, assign $\mathit{scc}(v) \leftarrow u$, until $v = u$ (it's a do-while structure: assign first, and then check).
3. Halt with $\mathit{scc}$ as the result.
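Tarjan's SCC algorithm can be sketched in Python, written recursively for brevity; the set `on_stack` answers "has this vertex been discovered but not yet assigned to an SCC" in O(1), covering both the back-edge and the cross-edge cases with one membership test:

```python
def tarjan_scc(adj):
    """Tarjan's SCC algorithm: one DFS maintaining discover and lowest
    timestamps; a second stack keeps unassigned vertices, and each root u
    with low(u) == d(u) pops off its SCC.  Returns {vertex: representative}."""
    d, low, scc = {}, {}, {}
    stack, on_stack = [], set()
    t = 1

    def visit(u):
        nonlocal t
        d[u] = low[u] = t
        t += 1
        stack.append(u)
        on_stack.add(u)
        for v in adj[u]:
            if v not in d:                  # tree edge
                visit(v)
                low[u] = min(low[u], low[v])
            elif v in on_stack:             # back/cross edge, v unassigned
                low[u] = min(low[u], d[v])
        if low[u] == d[u]:                  # u is the root of an SCC
            while True:                     # do-while: assign, then check
                v = stack.pop()
                on_stack.remove(v)
                scc[v] = u
                if v == u:
                    break

    for r in adj:
        if r not in d:
            visit(r)
    return scc
```

A forward edge to a vertex still on the stack also hits the `elif` branch, but since its target was discovered later, the `min` is a no-op and correctness is unaffected.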
Proof. For the time complexity, performing a DFS takes $O(|V| + |E|)$, and every vertex is pushed onto and popped from the stack exactly once. For the space complexity, the stack, the mappings and the timestamps take $O(|V|)$.
Interestingly, both Kosaraju's and Tarjan's SCC algorithms use a stack and run in linear time complexity, proportional to the magnitude of vertices and edges.
However, Tarjan's SCC algorithm eliminates the need of evaluating the transposition of the input digraph. Meantime, in Kosaraju's SCC algorithm, in order to assign vertices to their own SCCs, it's required to do a pass of DFS at each representative in the transposition, which amounts to a second traversal of the whole digraph.
In this way, Tarjan's SCC algorithm is arguably faster and more space-efficient than Kosaraju's SCC algorithm, although the principle of Tarjan's SCC algorithm is much harder to elaborate, as is seen from above. In my opinion, both of them are worth studying, to see how different minds may clash in solving the SCC problem.