Control-Flow Graphs
Contents
Having considered one structural IR (AST) and one linear IR (3AC), we now turn to a hybrid IR, the Control-Flow Graph (Unfortunately, the acronym for Control-Flow Graph is CFG, which we have already used for Context-Free Grammar, nevertheless we will use CFG for Control-Flow Graph and assume that the correct CFG is clear from context). The key idea of the CFG is that it represents uninterruptable sequences of instructions (or statements) as nodes, and control transfers between nodes as edges. Because it relies on some other code representation inside of the nodes, Some would consider the CFG to be more of an overlay rather than an IR itself.
The CFG is useful for optimization, as it makes the possible connections between program points explicit. However, the visual nature of the CFG means that it can also be a helpful way of succinctly capturing the way the program operates. The CFG is such a useful abstraction that it has found use in debugging, reverse engineering, and code comprehension tools. In these various use cases, the program representation upon which the CFG serves as an overlay can change accordingly: the CFG might serve as an overlay for assembly code, source code, or another IR. We will typically represent the CFG as being on overlay for 3AC code, since it will be used in our compiler as a way to aid in the analysis of the 3AC representation of the program.
Let us develop the concept of the Control-Flow Graph with a less-contrained representation, which we simply refer to as a "flowgraph". The key difference is that a flowgraph maps each quad to its own node, and connects nodes via an edge if there is a possible control transfer from the tail of the edge to the head of the edge. For example, consider the following function:
Source code
|
3AC
|
Flowgraph
|
||
int main(){ a = 4; a = a + 2; return a; } |
(a: global, int)
enter main a := 4 a := a + 2 setout 1, a leave main |
Note that each quad node $x$ has an edge to the quad node $y$ that would execute immediately after it. We will refer to this relationship by saying that $y$ is a successor of $x$ (sometimes abbreviated to succ). We will also say that $x$ is a predecessor of $y$ (sometimes abbreviated as pred).
The value of a graph-style representation of a program is most readily apparent when a node has multiple successors, as in the following example:
Source code
|
3AC
|
Flowgraph
|
||
void fn(){ if (b) { b = 7; } a = 3; } |
(a: global, int)
(b: global, int) enter fn jnot b L b := 7 L: a := 3 leave fn |
Compared to the linear listing of the 3AC, the structure of the flowgraph
makes it somewhat easier to see that the assignment
to b
is executed conditionally. Note the small "j" next to the tail
of the edge from the jnot b L
quad to the a := 3
quad,
which denotes that the edge is a "jump" edge (so called because it requires the jnot
to jump for that successor to be taken). All other edges (including the edge from
jnot b L
to b := 7
are called "fallthrough" edges. Since
fallthrough edges are the most common, they will be not be labelled.
Our simple 3AC representation begets a simple edge representation - one can determine the execution path by knowing which edge is the jump edge, for which the j edge annotation sufficies. However, it is also quite common to annotate all edges coming out of conditionals with the condition being followed (for example by labeling the true edge "true" and the false edge "false"). Some linear IRs might support 3AC with multiple jump targets, such as a switch operator like:
switch <opd> <val1>:<L1>
...
valn>:<Ln>