EECS665 – Compiler Construction – Drew Davidson

    Semantic Analysis



    Topic Introduction

    By using context-free grammars to define the syntax of our program, we have achieved two goals:

    • We create a (relatively) concise description of our language for users. Programmers can look at the grammar and puzzle out the structure of programs in our language.
    • We create a (relatively) precise specification for language implementation. Tools such as Bison can take the grammar as input and automatically generate parsers that ensure compliance with the grammar, and even build (implicit) parse trees.
    Nevertheless, it is perhaps obvious that context-sensitive features of a language cannot be captured by a context-free grammar. At best, a context-free grammar specifies an over-approximation of the set of valid programs. In other words, it is possible to write a program that parses but is nevertheless invalid, because it violates some context-sensitive property. For example, programs must be well-named (variables are declared before use, in languages that require it) and well-typed (operations must be compatible with their operands).
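    As a concrete (and purely hypothetical) illustration in C-like syntax, the snippet below has a perfectly valid structure and will parse without complaint, yet it violates both of the properties just mentioned:

```cpp
int main() {
    x = 7;               // not well-named: x is used but never declared
    int count = "many";  // not well-typed: a string literal cannot initialize an int
    return count;
}
```

    Neither error is visible to the parser, because the grammar only constrains the shape of the program; catching them requires the context-sensitive checks described below.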

    There is a distinction made between the syntax of a program (does it have a valid structure in the language?) and the semantics of a program (does it have a valid meaning in the language?). Although this distinction can sometimes be messy, we will simply adopt the working assumption that any error that violates the context-free grammar specification of the language is a syntax error, and any other error is a semantic error.

    Program Meaning

    Program semantics is a subject worthy of a full course in its own right (often such courses have titles like "Programming Languages"). Semantic specifications tend to be more complicated than syntactic specifications, and their inherent complexity requires more elaborate notation; we will stick to prose definitions of familiar semantic properties.

    Semantic Analysis

    The compiler performs semantic analysis for two purposes:

    1. To detect "bad programs"; i.e. to detect if the input violates the language's semantics before it is run
    2. To respect "good programs"; i.e. to make sure that the compiler's translation adheres to the semantic properties of the language
    It is provably impossible to determine algorithmically whether a program could violate even simple semantic properties (a result due to Rice's Theorem, effectively a generalization of the well-known Halting Problem). Despite these harsh realities, the compiler's gotta compile, so it must stick to approximate algorithms to determine a program's semantic validity. Fortunately, most compiler-writers intuitively accept the approximate nature of semantic checking, since they themselves have experience dealing with these approximations as compiler users.
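    One everyday consequence of this approximation: a checker flags a semantic error even when the offending code could never actually execute, because deciding reachability in general is as hard as the Halting Problem. A hypothetical example:

```cpp
int main() {
    if (false) {
        int x = "never runs";  // unreachable at runtime, but the type error is still reported
    }
    return 0;
}
```

    The condition here is obviously false, but the checker makes no attempt to prove it; it conservatively treats every piece of code as if it might run.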

    For our purposes, we will perform much of the detection of "bad programs" and the analysis of "good programs" by walking over the AST. The tree-based representation of a program is a good match for the compositional nature of many of the properties we'd like to check (the program is well-typed if its constituent functions are well-typed; a function is well-typed if its constituent statements are well-typed; a statement is well-typed if its constituent expressions are well-typed). As such, the foundation of the checks that we will perform is a recursive walk of the AST: the correctness of each node will depend on the correctness of its children, as sketched below. We will also introduce mechanisms that help to propagate information across the AST, in order to handle the context-sensitive nature of semantic properties. One particularly important data structure is the symbol table, which we will discuss in detail in the next reading.
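    To make the shape of such a walk concrete, here is a minimal C++ sketch of a compositional type-checking pass. The class names, the typeCheck method, and the overall structure are illustrative assumptions rather than the code of any particular compiler project; each node simply reports whether it and all of its children are well-typed.

```cpp
#include <vector>

// Illustrative AST nodes (assumed names, not from any particular project).
struct ExpNode  { virtual bool typeCheck() = 0; virtual ~ExpNode() = default; };
struct StmtNode { virtual bool typeCheck() = 0; virtual ~StmtNode() = default; };

struct AssignStmtNode : StmtNode {
    ExpNode* lhs;
    ExpNode* rhs;
    bool typeCheck() override {
        // A statement is well-typed if its constituent expressions are well-typed
        // (a fuller version would also check that their types are compatible).
        return lhs->typeCheck() && rhs->typeCheck();
    }
};

struct FnDeclNode {
    std::vector<StmtNode*> body;
    bool typeCheck() {
        // A function is well-typed if all of its statements are well-typed.
        bool ok = true;
        for (StmtNode* stmt : body) { ok = stmt->typeCheck() && ok; }
        return ok;
    }
};

struct ProgramNode {
    std::vector<FnDeclNode*> functions;
    bool typeCheck() {
        // The program is well-typed if all of its functions are well-typed.
        bool ok = true;
        for (FnDeclNode* fn : functions) { ok = fn->typeCheck() && ok; }
        return ok;
    }
};
```

    Note that each node keeps checking its remaining children even after one of them fails, so a single pass can report as many problems as possible rather than stopping at the first error.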