- <opd> is an operand (variable or literal)
- <opr> is a logical or mathematical operator
- <lbl> is a label name
- <proc> is a procedure name
- <int> is an int literal
- <opd> = <opd> <opr> <opd>
- <opd> = <opr> <opd>
- <opd> = <opd>
- goto <lbl>
- jnot <opr> <lbl>
- call <proc>
- enter <proc>
- leave <proc>
- setin <int> <opd>
- getin <int> <opd>
- setout <int> <opd>
- getout <int> <opd>
- nop
- (You may annotate any instruction with a label followed by a colon)
Three-Address Code
The Abstract-Syntax Tree (AST) provides a suitable bridge from the concrete syntax of the input program. However, it is useful for the compiler to consider multiple IRs, in order to leverage the strengths (and mitigate the weaknesses) of each. Three-Address Code (abbreviated here as "3AC" and occasionally elsewhere as "TAC"), provides a good successor IR from the AST.
3AC represents the program as a linear sequence of high-level instructions that provide a linear layout that will serve as the basis of ordering for the final code. Apart from it's linear ordering, the distinguishing feature of 3AC (and the reason for it's name), is that every instruction has at most three operands. Furthermore, non-trivial expressions are not allowed as 3AC operands. For example,
a := b + c
(where the:=
operator refers to assignment)
Is a legal 3AC instruction, but
a := b + c + d
Is not.
In order to handle complex expressions, the computation is broken up over multiple instructions, with temporary variables introduced by the compiler to hold intermediate results. The above example may then be represented as
tmp := b + c a := tmp + dBenefits
There are two main benefits to using 3AC as an IR:
Firstly, 3AC represents the program in a way that is still highly abstract (and thus is not tied to a particular architecture) while still moving the code closer to an executable representation. In effect, 3AC can serve as a bridge from the AST to final code.
Secondly, the need for the compiler to commit to an initial instruction ordering is
itself a feature of 3AC, as it determinizes the way in which an AST will
be processed. For example, an AST for the expression a = (b * c) +
(b + c)
does not explicitly specify whether (b * c) should be
evaluated before or after evaluating (b + c). In the former case,
the compiler might translate the expression to
tmp1 = b + c tmp2 = b * cwhile in the latter case, the compiler might use
tmp1 = b * c tmp2 = b + cSome languages (including C++), leave some choices for order of evaluation between sub-expressions to the compiler-writer. Indeed, in this case either order would be considered acceptable according to the C++ language standard, but the second order might correspond to better performance in final code, due to instruction pipelining/instruction-level parallelism. Instruction Set
Just as the particular node types in the AST are subject to both the scope of the compiler and the preferences of the compiler writer, the set of instructions that correspond to a 3AC definition is not canonical. As such, it is best to view 3AC as a family of psuedocode representations. The common theme of 3AC instruction sets are that the set contains a small handful of total instructions, each instruction is simple, and oerands are atomic (i.e. operands are not expressions). With that in mind, here's a sample instruction set that Drew made up (note that descriptions of each instruction follows below).