mirror of
https://github.com/boostorg/graph.git
synced 2026-02-26 04:42:16 +00:00
Restructuring parts of the user's guide. Removed some concepts and fixed issues with others. Added (incomplete) adjacency_matrix docs. [SVN r55976]
371 lines
14 KiB
Plaintext
371 lines
14 KiB
Plaintext
[/
|
|
/ Copyright (c) 2007 Andrew Sutton
|
|
/
|
|
/ Distributed under the Boost Software License, Version 1.0. (See accompanying
|
|
/ file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
|
/]
|
|
|
|
[section Directed Graphs]
|
|
Like the previous section, here we take a look at how to solve different
|
|
types graph problems using the Boost.Graph library. In this case however,
|
|
the problems being addressed are better modeled with directed graphs. A
|
|
directed graph (also a /digraph/) is one in which edges can only be traversed
|
|
in a specific direction.
|
|
|
|
In this section we are concerned with /dependency graphs/. A dependency graph
|
|
describes a relationship between two entities in which one (/source/) requires
|
|
a second (/target/). The source is said to be dependant upon the target.
|
|
Dependency graphs arise in many, many real-world situations such as source
|
|
code analysis, software installation, "tech-trees" in computer games, etc.
|
|
In this example, we will look at a common dependency graph: dependencies
|
|
between software documents and build targets.
|
|
|
|
[h3 File Dependencies]
|
|
If you've ever typed `make` and wondered how the program decides the order
|
|
tasks to build, then this tutorial is for you. The make software relies on
|
|
dependency information explicitly encoded into Makefiles. Binaries (executable
|
|
programs and libraries) depend on sets of source files. Source files, which
|
|
implement program features, depend on header files, which expose type and
|
|
function declarations. These, in turn, might depend on generated source files
|
|
generated by `lex`, `yacc` or any other of an assortment of code-generating
|
|
tools. These dependencies can be modeled with a directed graph.
|
|
|
|
For this example, we're actually going to use some of the files that actually
|
|
implement the examples described in this user's guide. Their dependencies are
|
|
shown in figure 1. Path names have been omitted for readability.
|
|
|
|
[$images/guide/files.png]
|
|
|
|
From this graph, it should be relatively easy to see that the build should
|
|
start at the bottom and proceed "upwards" with the executable programs being
|
|
built last - maybe. There are actually two different approaches to bulding
|
|
these programs:
|
|
|
|
* One at a time. This is probably what you're used to seeing. The compiler
|
|
builds file A, followed by file B, and finally file C.
|
|
* In parallel. If you're lucky enough to have a multi-processor computer
|
|
available to you we might be able to process (compile) a number of different
|
|
files at the same time by distributing the tasks to different processors.
|
|
|
|
The solution to both of these problems is addressed by topologically sorting
|
|
the graph. This provies the schedule of files to be processed by ordering
|
|
the vertices based on their dependencies.
|
|
|
|
A third problem we might encounter is that of cycles in the dependency
|
|
graph. Cycles are bad since topological sorts will not work in their presence.
|
|
|
|
[h4 Makefiles and Graphs]
|
|
The first step in implementing a make-ordering for source files is acquiring
|
|
some data. For this example, we our program will parse files in a stripped down,
|
|
Makefile-like format. The input looks something like something like this.
|
|
|
|
[pre
|
|
undirected_graph.hpp : adjacency_list.hpp
|
|
directed_graph.hpp : adjacency_list.hpp
|
|
movies.hpp : undirected_graph.hpp
|
|
movies.cpp : movies.hpp tokenizer.hpp
|
|
kevin_bacon.cpp : movies.hpp visitors.hpp breadth_first_search.hpp
|
|
kevin_bacon.exe : movies.cpp kevin_bacon.cpp
|
|
build_order.cpp : directed_graph.hpp topological_sort.hpp
|
|
build_order.exe : build_order.cpp
|
|
]
|
|
|
|
Obviously, we're going to have to build a parser for this input format. Just as
|
|
before our program starts by defining aliases for commonly used template types.
|
|
|
|
struct Target;
|
|
|
|
typedef boost::directed_graph<Target> Graph;
|
|
typedef Graph::vertex_descriptor Vertex;
|
|
typedef Graph::edge_descriptor Edge;
|
|
|
|
In this graph, vertex properties are encapsulated in a `Target` structure. For
|
|
this application, a target is any named file that might appear in the dependency
|
|
graph. Unlike the previous example, we really don't have any need for edge propreties,
|
|
so we can simply omit that template parameter. The `Target` is defined as:
|
|
|
|
struct Target
|
|
{
|
|
int index;
|
|
std::string name;
|
|
};
|
|
|
|
[note
|
|
If you think you're seeing some similarities between the previous example, and this
|
|
one... just wait. There are a number of common properties and tasks in many graph
|
|
related problems such as indexing vertices, providing name labels, etc. Pay special
|
|
attention to the method of adding vertices to the graph - the mapping of a unique
|
|
name to a vertex is nearly ubiquitous in the setup of graph problems.
|
|
]
|
|
|
|
Likewise, we'll go ahead and predefine a property map that will be used later.
|
|
We also need a mapping of target name to vertex so we don't duplicate vertices
|
|
and have a convenient lookup tool later on.
|
|
|
|
typedef boost::property_map<Graph::type, int Target::*>::type TargetIndexMap;
|
|
typedef std::map<std::string, Vertex> TargetMap;
|
|
|
|
We can now start building a program to parse the input data and build a dependency
|
|
graph.
|
|
|
|
using namespace std;
|
|
using namespace boost;
|
|
|
|
int main()
|
|
{
|
|
typedef char_separator<char> separator;
|
|
typedef tokenizer<separator> tokenizer;
|
|
|
|
Graph grapg;
|
|
TargetMap targets;
|
|
|
|
for(string line; getline(cin, line); ) {
|
|
// skip comment and blank lines
|
|
if(line[0] == '#' || line.empty()) {
|
|
continue;
|
|
}
|
|
|
|
// split the string on the dependency
|
|
size_t index = line.find_first_of(':');
|
|
if(index == string::npos) {
|
|
continue;
|
|
}
|
|
string target = trim_copy(line.substr(0, index));
|
|
string deps = trim_copy(line.substr(index + 1));
|
|
|
|
// add the target to the build graph
|
|
Vertex u = add_target(graph, targets, target);
|
|
|
|
// tokenize the dependencies
|
|
separator sep(" \t");
|
|
tokenizer tok(deps, sep);
|
|
tokenizer::iterator i = tok.begin(), j = tok.end();
|
|
for( ; i != j; ++i) {
|
|
string dep = *i;
|
|
|
|
// add this dependency as a target
|
|
Vertex v = add_target(graph, targets, dep);
|
|
|
|
// add the edge
|
|
add_dependency(graph, u, v);
|
|
}
|
|
}
|
|
|
|
// ...to be continued...
|
|
|
|
This is a fairly large chunk of code that implements input parsing and graph construction
|
|
with the help of the `add_target()` and `add_dependency()` functions. Essentially, this
|
|
snippet creates a vertex (target) for each file named in the input file. A dependency
|
|
edge is added between the first target (preceeding the ':' character) and each subsequent
|
|
target (white space separated list following the ':'). The `add_target()` and `add_dependency()`
|
|
method are implemented as:
|
|
|
|
Vertex add_target(Graph& graph, TargetMap& targets, const string& name)
|
|
{
|
|
Vertex v;
|
|
TargetMap::iterator it;
|
|
bool inserted;
|
|
tie(it, inserted) = targets.insert(make_pair(name, Vertex()));
|
|
if(inserted) {
|
|
v = add_vertex(graph);
|
|
it->second = v;
|
|
|
|
graph[v].index = num_vertices(graph) - 1;
|
|
graph[v].name = name;
|
|
}
|
|
else {
|
|
v = it->second;
|
|
}
|
|
return v;
|
|
}
|
|
|
|
You may notice that the `add_target()` function is nearly line-for-line identical to the
|
|
`add_actor()` function in the previous example. This is no coincidence - both functions
|
|
do exactly the same thing. They associate a vertex with a unique name and assign it an
|
|
index that we can use later for various graph algorithms.
|
|
|
|
Edge add_dependency(Graph& graph, Vertex u, Vertex v)
|
|
{
|
|
return add_edge(v, u, graph).first;
|
|
}
|
|
|
|
The `add_dependency()` method is considerably more terse than its undirected counter part,
|
|
but essentially does the same thing. There is one very important difference, however:
|
|
the direction of the edge is reversed to create a subtly different graph. Although
|
|
the method is called to indicate that vertex `u` dependes on vertex `v`, the added edge
|
|
actually indicates that vertex `v` satisfies the dependency of vertex `u`. In fact, this
|
|
is the reverse of the original graph and is shown in Figure 2.
|
|
|
|
[$images/guide/reverse.png]
|
|
|
|
[h4 Obtaining the Make Order]
|
|
We are now ready to compute the make order by running a topological sort. Thanks to a
|
|
canned implementation, this is trivial.
|
|
|
|
int main()
|
|
{
|
|
// ...continued from above...
|
|
|
|
TargetIndexMap indices = get(&Target::index, graph);
|
|
|
|
typedef list<Vertex> MakeOrder;
|
|
MakeOrder order;
|
|
topological_sort(graph, front_inserter(order), vertex_index_map(indices));
|
|
|
|
BuildOrder::iterator i = order.begin(), j = order.end();
|
|
for( ; i != j; ++i) {
|
|
cout << graph[*i] << "\n";
|
|
}
|
|
}
|
|
|
|
The `topological_sort()` algorithm takes an output iterator as the second parameter.
|
|
Here, we use a standard front insertion iterator to prepend each target to the make
|
|
order. The `vertex_index_map()` named parameter is also required for the implementation.
|
|
After computation, we simply print the ordering to standard output.
|
|
|
|
[h4 Parallel Compilation]
|
|
What if we have multiple processors available? Surely there is a way to determine if
|
|
we can compile several independent files simultaneously, thereby reducing the overall
|
|
build time. In fact, there is. Consider rephrasing the question to "what is the earliest
|
|
time that a file can be built assuming that an unlimited number of files can be built
|
|
at the same time?". In our simplified example, the only criteria for when a file can
|
|
be built is that it has no dependencies (i.e., in edges). Further simplifiying the
|
|
example, we assume that each file takes the same amount of time to build (1 time unit).
|
|
|
|
For parallel compilation, we can build all files with zero dependencies in the first
|
|
time unit at the same time. For each file, the time at which it can be built is one
|
|
more than the maximum build time of the files on which it depends. In this example,
|
|
`adjacency_list.hpp` is one of the files that will compile first (in parallel).
|
|
The `directed_graph.hpp` file will compile in the second time step, and `build_order.cpp`
|
|
in the third.
|
|
|
|
To implement this, we need a vector that represents the time slots in which each vertex
|
|
will be built. By visiting the vertices in topological order, we ensure that we can
|
|
assigned the correct time slot to each vertex since values "propogate" down the ordering.
|
|
Just for fun, we'll merge the time ordering with the output so we can see a) the order
|
|
in which each file is built and b) the time slot it could be built in.
|
|
|
|
int main()
|
|
{
|
|
// ...continued from above...
|
|
vector<int> time(num_vertices(graph), 0);
|
|
BuildOrder::iterator i = order.begin(), j = order.end();
|
|
for( ; i != j; ++i) {
|
|
int slot = -1;
|
|
Graph::in_edge_iterator j, k;
|
|
for(tie(j, k) = in_edges(*i, graph); j != k; ++j) {
|
|
Vertex v = source(*j, graph);
|
|
slot = std::max(time[graph[v].index], slot);
|
|
}
|
|
time[g[*i].index] = slot + 1;
|
|
|
|
cout << g[*i].name << "\t[" << time[g[*i].index] << "]\n";
|
|
}
|
|
}
|
|
|
|
This is a code may be a little dense, but demonstrates two important aspects of
|
|
the Boost.Graph library. First this demonstrates the importantance of vertex
|
|
indices. Despite their instability with mutable graphs, many (most?) graph algorithms
|
|
use vertex indices to efficiently associate extra data with a vertex. In fact, this
|
|
approach is so ubiquitous in the examples that it leads many to believe the
|
|
`vertex_descriptor` is always the index of a vertex.
|
|
|
|
[warning
|
|
A `vertex_descriptor` is *not* its index in the graphs container of vectors!
|
|
]
|
|
|
|
The second important aspect this demonstrates is the construction of an /external
|
|
property/ for vertices. Although we don't use the `time` vector in any additional
|
|
computations, we could easily turn it into a property map for use with other
|
|
algorithms.
|
|
|
|
The output might something like this:
|
|
|
|
[pre
|
|
$ ./make_order < files
|
|
topological_sort.hpp [0]
|
|
breadth_first_search.hpp [0]
|
|
visitors.hpp [0]
|
|
tokenizer.hpp [0]
|
|
adjacency_list.hpp [0]
|
|
directed_graph.hpp [1]
|
|
build_order.cpp [2]
|
|
build_order.exe [3]
|
|
undirected_graph.hpp [1]
|
|
movies.hpp [2]
|
|
kevin_bacon.cpp [3]
|
|
movies.cpp [3]
|
|
kevin_bacon.exe [4]
|
|
]
|
|
|
|
Although it probably won't since I doctored the tabbing for display purposes.
|
|
|
|
[h4 Finding Cycles]
|
|
Admittedly, cycles in dependency graphs for software probably don't occur so often
|
|
that we need to develop special software to find them. However, if the dependency
|
|
graph is big (think about all the source code, binaries, data files and thier
|
|
dependencies that constitute a typical Linux distribution), then its possible that
|
|
cycles creep into the graph. It might be nice to determine if there is such a cycle
|
|
before actually trying to build it.
|
|
|
|
To do this, we are going to provide a customized visitor for a depth-first search (DFS).
|
|
Just like the custom visitors in our undirected graph examples, we overload a visitor
|
|
event (here, the `back_edge` event) to indicate that a cycle has been found. Using the
|
|
same setup as before, our visitor follows:
|
|
|
|
struct CycleDetector : public dfs_visitor<>
|
|
{
|
|
CycleDetector(bool& c)
|
|
: has_cycle(c)
|
|
{}
|
|
|
|
template <class Edge, class Graph>
|
|
void back_edge(Edge, Graph&)
|
|
{
|
|
has_cycle = true;
|
|
}
|
|
|
|
bool& has_cycle;
|
|
};
|
|
|
|
CycleDetector detect_cycles(bool& c)
|
|
{ return CycleDetector(c); }
|
|
|
|
That's it... When the `back_edge()` method is called, we know that a cycle exists
|
|
in the graph. This literally indicates that there is an edge to a vertex that we
|
|
have already visited, hence: a cycle. We also provide a helper function that
|
|
instantiates the visitor.
|
|
|
|
Using the cycle-detecting visitor is just as easy as before. After constructing the
|
|
graph, we would find the following in the `main()` program.
|
|
|
|
int main()
|
|
{
|
|
// ...continued from above...
|
|
|
|
TargetIndexMap indices = get(&Target::index, g);
|
|
|
|
bool cycle = false;
|
|
depth_first_search(g,
|
|
vertex_index_map(indices).
|
|
visitor(detect_cycles(cycle)));
|
|
cout << "has cycle: " << cycle << "\n";
|
|
}
|
|
|
|
Unfortunately, our test input file doesn't currently contain any cycles - a sign of
|
|
good engineering - so we'll have to add one. Add the following lines to the input
|
|
to create a completely superfluous cycle.
|
|
|
|
[pre
|
|
movies.exe : kevin_bacon.exe
|
|
kevin_bacon.exe : movies.exe
|
|
]
|
|
|
|
Running the program on the modified input should yield:
|
|
|
|
[pre
|
|
$ ./cycle < files
|
|
has cycle: 1
|
|
]
|
|
|
|
[endsect] |