- We create a (relatively) concise description of our language for users. Programmers can look at the grammar and puzzle out the structure of programs in our language.
- We create a (relatively) precise specification for language implementation. Tools such as Bison can take the grammar as input and automatically generate parsers that ensure compliance with the grammar, and even build (implicit) parse trees.
- To detect "bad programs"; i.e. to detect if the input violates the language's semantics before it is run
- To respect "good programs"; i.e. to make sure that the compiler's translation adheres to the semantic properties of the language
By using context-free grammars to define the syntax of our program, we have achieved two goals:
There is a distinction made between the syntax of a program (does it have a valid structure in the language?) and the semantics of a program (does it have a valid meaning in the language?). Although this distinction can sometimes be messy, we will simply make the working assumption that any input that violates the context-free grammar specification of the language is syntactic, and any other error is semantic.
Program semantics is a subject worthy of a full course in its own right (often such courses have titles like "Programming Languages"). Semantic specifications tend to be more complicated than syntactic specifications. The inherent complexity of semantics require more elaborate notation, but we will stick to prose definitions of familiar semantic properties.
The compiler performs semantic analysis for two purposes:
For our purposes, we will perform much of the detection of "bad programs" and the analysis of "good programs" by walking over the AST. The tree-based representation of a program is a good match for the compositional nature of many of the properties we'd like to check (The program is well typed if it's composite function code are well-typed. A function is well-typed if it's composite statements are well-typed. A statement is well typed if its composite expressions are well-typed). As such, the foundation of the checks that we will perform is a recursive walk of the AST: the correctness of each node will depend on the correctness of its children. We will also introduce mechanisms that help to propogate information across the AST, in order to handle the context-sensitive nature of semantic properties. One particularly important data structure is the symbol table, which we will discuss in detail in the next reading.