graph/quickbook/guide/directed_graph.qbk

[/
 / Copyright (c) 2007 Andrew Sutton
 /
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
 /]

[section Directed Graphs]
Like the previous section, here we take a look at how to solve different
types of graph problems using the Boost.Graph library. In this case, however,
the problems being addressed are better modeled with directed graphs. A
directed graph (also called a /digraph/) is one in which edges can only be traversed
in a specific direction.

In this section we are concerned with /dependency graphs/. A dependency graph
describes a relationship between two entities in which one (/source/) requires
a second (/target/). The source is said to be dependent upon the target.
Dependency graphs arise in many real-world situations such as source
code analysis, software installation, "tech-trees" in computer games, etc.
In this example, we will look at a common dependency graph: dependencies
between software documents and build targets.

[h3 File Dependencies]
If you've ever typed `make` and wondered how the program decides the order
in which to build its targets, then this tutorial is for you. The make software relies on
dependency information explicitly encoded into Makefiles. Binaries (executable
programs and libraries) depend on sets of source files. Source files, which
implement program features, depend on header files, which expose type and
function declarations. These, in turn, might depend on source files
generated by `lex`, `yacc` or any other of an assortment of code-generating
tools. These dependencies can be modeled with a directed graph.

For this example, we're going to use some of the files that actually
implement the examples described in this user's guide. Their dependencies are
shown in Figure 1. Path names have been omitted for readability.

[$images/guide/files.png]

From this graph, it should be relatively easy to see that the build should
start at the bottom and proceed "upwards" with the executable programs being
built last - maybe. There are actually two different approaches to building
these programs:

* One at a time. This is probably what you're used to seeing. The compiler
builds file A, followed by file B, and finally file C.
* In parallel. If you're lucky enough to have a multi-processor computer
available, you might be able to process (compile) a number of different
files at the same time by distributing the tasks to different processors.

Both of these problems are solved by topologically sorting
the graph. This provides the schedule of files to be processed by ordering
the vertices based on their dependencies.

A third problem we might encounter is that of cycles in the dependency
graph. Cycles are bad because a topological ordering only exists for graphs
that have none: if two files each require the other, neither can be built first.
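
To make the idea of a topological ordering concrete, here is a minimal sketch
that uses only the standard library. It is not how Boost.Graph implements
`topological_sort()`; it uses the queue-based approach usually attributed to Kahn,
and the names `AdjacencyList` and `topological_order()` are invented for this
illustration. An edge `u -> v` is assumed to mean "process `u` before `v`".

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical representation: edges[u] holds every v with an edge u -> v,
// meaning "u must be processed before v".
typedef std::vector<std::vector<std::size_t> > AdjacencyList;

// Fills `order` with a topological ordering and returns true,
// or returns false if the graph contains a cycle.
bool topological_order(const AdjacencyList& edges,
                       std::vector<std::size_t>& order)
{
  // count the incoming edges of every vertex
  std::vector<std::size_t> in_degree(edges.size(), 0);
  for (std::size_t u = 0; u < edges.size(); ++u) {
    for (std::size_t k = 0; k < edges[u].size(); ++k) {
      ++in_degree[edges[u][k]];
    }
  }

  // vertices without incoming edges can be processed immediately
  std::queue<std::size_t> ready;
  for (std::size_t u = 0; u < edges.size(); ++u) {
    if (in_degree[u] == 0) {
      ready.push(u);
    }
  }

  order.clear();
  while (!ready.empty()) {
    std::size_t u = ready.front();
    ready.pop();
    order.push_back(u);

    // "remove" u from the graph; a successor becomes ready once
    // its last remaining predecessor has been processed
    for (std::size_t k = 0; k < edges[u].size(); ++k) {
      std::size_t v = edges[u][k];
      if (--in_degree[v] == 0) {
        ready.push(v);
      }
    }
  }

  // vertices on a cycle never reach in-degree zero, so they are never emitted
  return order.size() == edges.size();
}
```

If the function returns `false`, at least one vertex was never emitted, which
is exactly the situation in which no valid build schedule exists.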

[h4 Makefiles and Graphs]
The first step in implementing a make-ordering for source files is acquiring
some data. For this example, our program will parse files in a stripped-down,
Makefile-like format. The input looks something like this.

[pre
undirected_graph.hpp :  adjacency_list.hpp
directed_graph.hpp :    adjacency_list.hpp
movies.hpp :            undirected_graph.hpp
movies.cpp :            movies.hpp tokenizer.hpp
kevin_bacon.cpp :       movies.hpp visitors.hpp breadth_first_search.hpp
kevin_bacon.exe :       movies.cpp kevin_bacon.cpp
build_order.cpp :       directed_graph.hpp topological_sort.hpp
build_order.exe :       build_order.cpp
]

Obviously, we're going to have to build a parser for this input format. Just as
before, our program starts by defining aliases for commonly used template types.

 struct Target;

 typedef boost::directed_graph<Target> Graph;
 typedef Graph::vertex_descriptor Vertex;
 typedef Graph::edge_descriptor Edge;

In this graph, vertex properties are encapsulated in a `Target` structure. For
this application, a target is any named file that might appear in the dependency
graph. Unlike the previous example, we really don't have any need for edge properties,
so we can simply omit that template parameter. The `Target` is defined as:

 struct Target
 {
   int index;
   std::string name;
 };

[note
If you think you're seeing some similarities between the previous example and this
one... just wait. There are a number of common properties and tasks in many
graph-related problems such as indexing vertices, providing name labels, etc. Pay special
attention to the method of adding vertices to the graph - the mapping of a unique
name to a vertex is nearly ubiquitous in the setup of graph problems.
]

Likewise, we'll go ahead and predefine a property map that will be used later.
We also need a mapping of target name to vertex so we don't duplicate vertices
and have a convenient lookup tool later on.

 typedef boost::property_map<Graph, int Target::*>::type TargetIndexMap;
 typedef std::map<std::string, Vertex> TargetMap;

We can now start building a program to parse the input data and build a dependency
graph.

 using namespace std;
 using namespace boost;

 int main()
 {
   typedef char_separator<char> separator;
   typedef tokenizer<separator> tokenizer;

   Graph graph;
   TargetMap targets;

   for(string line; getline(cin, line); ) {
     // skip comment and blank lines
     if(line.empty() || line[0] == '#') {
       continue;
     }

     // split the string on the dependency
     size_t index = line.find_first_of(':');
     if(index == string::npos) {
       continue;
     }
     string target = trim_copy(line.substr(0, index));
     string deps = trim_copy(line.substr(index + 1));

     // add the target to the build graph
     Vertex u = add_target(graph, targets, target);

     // tokenize the dependencies
     separator sep(" \t");
     tokenizer tok(deps, sep);
     tokenizer::iterator i = tok.begin(), j = tok.end();
     for( ; i != j; ++i) {
       string dep = *i;

       // add this dependency as a target
       Vertex v = add_target(graph, targets, dep);

       // add the edge
       add_dependency(graph, u, v);
     }
   }

   // ...to be continued...

This is a fairly large chunk of code that implements input parsing and graph construction
with the help of the `add_target()` and `add_dependency()` functions. Essentially, this
snippet creates a vertex (target) for each file named in the input file. A dependency
edge is added between the first target (preceding the ':' character) and each subsequent
target (whitespace-separated list following the ':'). The `add_target()` and `add_dependency()`
functions are implemented as:

 Vertex add_target(Graph& graph, TargetMap& targets, const string& name)
 {
   Vertex v;
   TargetMap::iterator it;
   bool inserted;
   tie(it, inserted) = targets.insert(make_pair(name, Vertex()));
   if(inserted) {
     v = add_vertex(graph);
     it->second = v;

     graph[v].index = num_vertices(graph) - 1;
     graph[v].name = name;
   }
   else {
     v = it->second;
   }
   return v;
 }

You may notice that the `add_target()` function is nearly line-for-line identical to the
`add_actor()` function in the previous example. This is no coincidence - both functions
do exactly the same thing. They associate a vertex with a unique name and assign it an
index that we can use later for various graph algorithms.
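
Stripped of the graph itself, the pattern both functions share can be sketched
with the standard library alone. The `NameTable` structure and the `intern()`
function below are hypothetical names invented for this illustration; they stand
in for the graph and the `TargetMap`, and the returned index plays the role of
the vertex.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a graph's vertex storage.
struct NameTable
{
  std::map<std::string, std::size_t> ids;   // name -> index
  std::vector<std::string> names;           // index -> name
};

// Returns the index for `name`, creating a new entry the first time
// the name is seen and reusing the existing one afterwards.
std::size_t intern(NameTable& table, const std::string& name)
{
  typedef std::map<std::string, std::size_t>::iterator Iterator;

  // insert() leaves the map untouched if the name is already present
  std::pair<Iterator, bool> result =
    table.ids.insert(std::make_pair(name, table.names.size()));
  if (result.second) {
    table.names.push_back(name);
  }
  return result.first->second;
}
```

Calling `intern()` twice with the same name yields the same index, which is what
keeps a file that appears on several input lines from becoming several vertices.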

 Edge add_dependency(Graph& graph, Vertex u, Vertex v)
 {
   return add_edge(v, u, graph).first;
 }

The `add_dependency()` function is considerably more terse than its undirected counterpart,
but essentially does the same thing. There is one very important difference, however:
the direction of the edge is reversed to create a subtly different graph. Although
the function is called to indicate that vertex `u` depends on vertex `v`, the added edge
actually indicates that vertex `v` satisfies the dependency of vertex `u`. In fact, this
is the reverse of the original graph and is shown in Figure 2.

[$images/guide/reverse.png]

[h4 Obtaining the Make Order]
We are now ready to compute the make order by running a topological sort. Thanks to a
canned implementation, this is trivial.

 int main()
 {
   // ...continued from above...

   TargetIndexMap indices = get(&Target::index, graph);

   typedef list<Vertex> MakeOrder;
   MakeOrder order;
   topological_sort(graph, front_inserter(order), vertex_index_map(indices));

   MakeOrder::iterator i = order.begin(), j = order.end();
   for( ; i != j; ++i) {
     cout << graph[*i].name << "\n";
   }
 }

The `topological_sort()` algorithm takes an output iterator as the second parameter.
Here, we use a standard front insertion iterator to prepend each target to the make
order. The `vertex_index_map()` named parameter is also required for the implementation.
After computation, we simply print the ordering to standard output.
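
The reason for prepending is that `topological_sort()` writes the vertices in
/reverse/ topological order. The sketch below shows why a depth-first approach
naturally produces that order and how front insertion undoes it. It is a
standard-library illustration only, not the Boost.Graph implementation; the names
`AdjacencyList`, `visit()` and `dependency_first_order()` are invented here, and
the input is assumed to be acyclic.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical representation: edges[u] holds every v with an edge u -> v.
typedef std::vector<std::vector<std::size_t> > AdjacencyList;

// Writes u to `out` only after everything reachable from u has been written,
// so a vertex always appears after its successors (reverse topological order).
template <typename OutputIterator>
void visit(const AdjacencyList& edges, std::size_t u,
           std::vector<bool>& seen, OutputIterator& out)
{
  seen[u] = true;
  for (std::size_t k = 0; k < edges[u].size(); ++k) {
    std::size_t v = edges[u][k];
    if (!seen[v]) {
      visit(edges, v, seen, out);
    }
  }
  *out++ = u;
}

// Assumes the graph has no cycles.
std::list<std::size_t> dependency_first_order(const AdjacencyList& edges)
{
  std::list<std::size_t> order;

  // prepending each emitted vertex reverses the emission order,
  // which turns the reverse topological order into a topological one
  std::front_insert_iterator<std::list<std::size_t> > out =
    std::front_inserter(order);

  std::vector<bool> seen(edges.size(), false);
  for (std::size_t u = 0; u < edges.size(); ++u) {
    if (!seen[u]) {
      visit(edges, u, seen, out);
    }
  }
  return order;
}
```

With edges running from a dependency to the file that needs it, as in the
reversed graph above, the resulting list places every file after the files it
depends on.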

[h4 Parallel Compilation]
What if we have multiple processors available? Surely there is a way to determine if
we can compile several independent files simultaneously, thereby reducing the overall
build time. In fact, there is. Consider rephrasing the question to "what is the earliest
time that a file can be built assuming that an unlimited number of files can be built
at the same time?". In our simplified example, the only criterion for when a file can
be built is that it has no unbuilt dependencies (i.e., in-edges). Further simplifying the
example, we assume that each file takes the same amount of time to build (1 time unit).

For parallel compilation, we can build all files with zero dependencies in the first
time unit at the same time. For each file, the time at which it can be built is one
more than the maximum build time of the files on which it depends. In this example,
`adjacency_list.hpp` is one of the files that will compile first (in parallel).
The `directed_graph.hpp` file will compile in the second time step, and `build_order.cpp`
in the third.
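
The rule can be sketched independently of the graph library. The following is a
minimal standard-library illustration, assuming files are identified by
hypothetical integer ids and that `order` already lists every file after its
dependencies. The function name `build_slots()` is invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// deps[u] lists the files that file u depends on. `order` must contain
// every file exactly once, with each file after all of its dependencies.
std::vector<int> build_slots(
  const std::vector<std::vector<std::size_t> >& deps,
  const std::vector<std::size_t>& order)
{
  std::vector<int> slot(deps.size(), 0);
  for (std::size_t i = 0; i < order.size(); ++i) {
    std::size_t u = order[i];

    // -1 means "no dependencies", which yields slot 0 below
    int latest = -1;
    for (std::size_t k = 0; k < deps[u].size(); ++k) {
      latest = std::max(latest, slot[deps[u][k]]);
    }

    // one step after the last dependency has finished
    slot[u] = latest + 1;
  }
  return slot;
}
```

Numbering `adjacency_list.hpp`, `directed_graph.hpp`, `topological_sort.hpp` and
`build_order.cpp` as 0 through 3, this assigns slots 0, 1, 0 and 2 respectively,
matching the three time steps described above (slots are counted from zero).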

To implement this, we need a vector that represents the time slots in which each vertex
will be built. By visiting the vertices in topological order, we ensure that we can
assign the correct time slot to each vertex since values "propagate" down the ordering.
Just for fun, we'll merge the time ordering with the output so we can see a) the order
in which each file is built and b) the time slot it could be built in.

 int main()
 {
   // ...continued from above...

   // time[n] is the earliest slot in which the vertex with index n can be built
   vector<int> time(num_vertices(graph), 0);
   MakeOrder::iterator i = order.begin(), j = order.end();
   for( ; i != j; ++i) {
     // find the latest slot among the targets this one depends on
     int slot = -1;
     Graph::in_edge_iterator e, e_end;
     for(tie(e, e_end) = in_edges(*i, graph); e != e_end; ++e) {
       Vertex v = source(*e, graph);
       slot = std::max(time[graph[v].index], slot);
     }
     time[graph[*i].index] = slot + 1;

     cout << graph[*i].name << "\t[" << time[graph[*i].index] << "]\n";
   }
 }

This code may be a little dense, but it demonstrates two important aspects of
the Boost.Graph library. First, it demonstrates the importance of vertex
indices. Despite their instability with mutable graphs, many (most?) graph algorithms
use vertex indices to efficiently associate extra data with a vertex. In fact, this
approach is so ubiquitous in the examples that it leads many to believe the
`vertex_descriptor` is always the index of a vertex.

[warning
A `vertex_descriptor` is *not* its index in the graph's container of vertices!
]

The second important aspect this demonstrates is the construction of an /external
property/ for vertices. Although we don't use the `time` vector in any additional
computations, we could easily turn it into a property map for use with other
algorithms.

The output might look something like this:

[pre
$ ./make_order < files
topological_sort.hpp        [0]
breadth_first_search.hpp    [0]
visitors.hpp                [0]
tokenizer.hpp               [0]
adjacency_list.hpp          [0]
directed_graph.hpp          [1]
build_order.cpp             [2]
build_order.exe             [3]
undirected_graph.hpp        [1]
movies.hpp                  [2]
kevin_bacon.cpp             [3]
movies.cpp                  [3]
kevin_bacon.exe             [4]
]

Your output probably won't line up this neatly, since the tabbing was doctored for display purposes.

[h4 Finding Cycles]
Admittedly, cycles in dependency graphs for software probably don't occur so often
that we need to develop special software to find them. However, if the dependency
graph is big (think about all the source code, binaries, data files and their
dependencies that constitute a typical Linux distribution), then it's possible that
cycles creep into the graph. It might be nice to determine if there is such a cycle
before actually trying to build it.

To do this, we are going to provide a customized visitor for a depth-first search (DFS).
Just like the custom visitors in our undirected graph examples, we overload a visitor
event (here, the `back_edge` event) to indicate that a cycle has been found. Using the
same setup as before, our visitor follows:

 struct CycleDetector : public dfs_visitor<>
 {
   CycleDetector(bool& c)
     : has_cycle(c)
   {}

   template <class Edge, class Graph>
   void back_edge(Edge, Graph&)
   {
     has_cycle = true;
   }

   bool& has_cycle;
 };

 CycleDetector detect_cycles(bool& c)
 { return CycleDetector(c); }

That's it... When the `back_edge()` method is called, we know that a cycle exists
in the graph. A back edge is an edge to a vertex that the search has discovered
but not yet finished (an ancestor on the current search path), hence: a cycle.
We also provide a helper function that instantiates the visitor.
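
To see why only back edges signal a cycle, it can help to look at a depth-first
search written out by hand. The following standard-library sketch is not the
Boost.Graph implementation; the names `AdjacencyList`, `reaches_gray()` and
`has_cycle()` are invented for this illustration. It uses the customary three
colors: white for undiscovered vertices, gray for vertices still being explored,
and black for finished ones.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical representation: edges[u] holds every v with an edge u -> v.
typedef std::vector<std::vector<std::size_t> > AdjacencyList;

enum Color { White, Gray, Black };

// Explores everything reachable from u and reports whether the search
// ran into a vertex that is still on the current search path.
bool reaches_gray(const AdjacencyList& edges, std::size_t u,
                  std::vector<Color>& color)
{
  color[u] = Gray;
  for (std::size_t k = 0; k < edges[u].size(); ++k) {
    std::size_t v = edges[u][k];
    if (color[v] == Gray) {
      return true;   // back edge: v is an ancestor of u
    }
    if (color[v] == White && reaches_gray(edges, v, color)) {
      return true;
    }
    // a black vertex is already finished; reaching it again is harmless
  }
  color[u] = Black;
  return false;
}

bool has_cycle(const AdjacencyList& edges)
{
  std::vector<Color> color(edges.size(), White);
  for (std::size_t u = 0; u < edges.size(); ++u) {
    if (color[u] == White && reaches_gray(edges, u, color)) {
      return true;
    }
  }
  return false;
}
```

Note that a vertex reached along two different paths (a "diamond" in the graph)
is black the second time it is seen, so it does not trigger a false positive.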

Using the cycle-detecting visitor is just as easy as before. After constructing the
graph, we would find the following in the `main()` program.

 int main()
 {
   // ...continued from above...

   TargetIndexMap indices = get(&Target::index, graph);

   bool cycle = false;
   depth_first_search(graph,
     vertex_index_map(indices).
     visitor(detect_cycles(cycle)));
   cout << "has cycle: " << cycle << "\n";
 }

Unfortunately, our test input file doesn't currently contain any cycles - a sign of
good engineering - so we'll have to add one. Add the following lines to the input
to create a completely superfluous cycle.

[pre
movies.exe :        kevin_bacon.exe
kevin_bacon.exe :   movies.exe
]

Running the program on the modified input should yield:

[pre
$ ./cycle < files
has cycle: 1
]

[endsect]