EECS665 – Compiler Construction

Trial 2

Due on September 27th 11:59 PM (Not accepted late).

Updates

None yet!

Overview

For this trial, you will create a syntax-directed translator-generator named dragonsdt, which will take an input specification including an enriched BNF-like grammar (i.e. a context free grammar enhanced with embedded action numbers) and produce a corresponding syntax-directed translator.

Your translator-generator will use the underlying formalism of an LL(1) parser with semantic actions. The semantic actions will be input as C++ source code.

As with all dragon trials, you may use whatever coding language you like, subject to the following constraints:

Your code must build and run on the cycle servers under a non-privileged user account
Your program must be self-contained (no code in other user directories)
Your work must be substantially your own (no cheating)

Deliverables

You should turn in a tarred directory (similar to previous submissions) containing the following:

The source files for your translator-generator
A test attribute grammar (i.e. a context-free grammar with syntax directed translation rules attached) on which to run the project, called input.ag
A test token stream on which to run the translator-generator, called test.tokens
A Makefile with the following three targets:
- build: This target will build the dragonsdt executable from source code (you may use a language that does not require compilation, but make build should still succeed as a no-op.)
- run: This target will run the dragonsdt executable on a file named input.ag containing a grammar spec in the format described below. If the grammar is not LL(1), dragonsdt should print "Bad grammar" and terminate with a non-zero exit code. If the grammar is LL(1), dragonsdt should generate a translator executable named translator capable of performing a syntax-directed translation according to the actions specified by input.ag.
  If your dragonsdt executable produces source code (like bison), then this target should also compile that source code into the translator executable.
- test: This target will run the translator executable on a file named test.tokens representing a token stream in the language. If the token stream is a valid string in the language, translator should perform the syntax directed translation. If the token stream is not a valid string in the language, translatorp should print "rejected" to the output stream and terminate with exit code 1.

Attribute Grammar Format

Your dragonsdt program should take an input specification containing three sections, separated by delimiter lines containing only two percent signs (%%). Thus, the specification will be of the form:

<grammar section> %% <initialization section> %% <action section>

Grammar Section

The grammar section contains 1 or more production rule takes the form:

<symbol> ::= <symbol string>

where

Symbol (both terminal and nonterminal) represents a sequences of one or more alphanumeric characters, delimited by the space character. The set of nonterminal symbols are differentiated by appearing on the LHS of the grammar - any symbol that does not appear on the LHS of some production rule is considered to be a terminal.
Symbol string represents zero or more space-delimited symbols and SDT triggers. If a symbol string has zero symbols, it specifies an ε-production rule. An ε-production rule may still have an SDT action.
- An SDT trigger is a number prefixed by a # character, corresponding to an SDT rule (described below).

Initialization Section

The initialization section may contain C++ source code that is accesible in the global scope of any SDT action code.

Action Section

The action section consists of 0 or more SDT actions, where an SDT action consists of an SDT trigger (i.e. the # symbol followed by a number), followed by an opening curly brace ({), followed by source code, followed by a closing curly brace (}), where source code is a sequence of 1 or more statements in C++ source code. For simplicity's sake, you may assume that the source code of an SDT action will always be on one line (though it may contain multiple statements). Any C++ statements are valid. Internally, the generated translator should maintain the special variable look, which should be a string containing the lexeme value of the current lookahead token.

Example

Here is an example input.ag spec that your translator-generator should be able to handle. This particular spec is for an LL(1) translator that prints out the sum of a stream of num token lexemes.

S ::= A #4
A ::= #1
A ::= B A #2
B ::= #3 num
%%
#include <iostream>
#include <stack>
std::stack<int> semStack;
int pop(){
	int res = semStack.front();
	semStack.pop();
	return res;
}
%%
#1 { semStack.push(0); }
#2 { int t2 = pop(); int t1 = pop(); semStack.push(t1 + t2); }
#3 { int t = atoi(look); semStack.push(t); }
#4 { int t = pop(); std::cout << "Sum is " << t << "\n"; }

The set of non-terminals of this grammar is {S,A,B} (since those are the only symbols to appear on the LHS of the production rules). The set of terminals of this grammar is {num}. The set of action numbers is {#1, #2, #3, #4}. The first production of the grammar matches the empty string and adds action 1 to the syntactic stack.

The above format is designed for simplicity. While your adherence to this spec is greatly appreciated in grading, parsing the input is not really the focus of this assisgnment. You should not focus on rejecting malformed input grammars or on gotchas in the grammar spec.

Token Stream Format

The syntax-directed translators generated by dragonsdt should take an input file formatted as a stream of terminal tokens. The intention is to match the output of a lexer generated by the code in Trial 1. The format is as follows:

Token Streams

The token stream consists of zero or more token lines. There is no EOF token line, so it is recommended that you insert an EOF pseudotoken after the last token line has been read.

Token Lines

Each line of the token stream has one of two forms:

A terminal symbol, followed by a space literal, followed by a position, followed by a newline character literal.
A terminal symbol, followed by a colon literal, followed by a lexeme, followed by a space literal, followed by a position, followed by a newline character literal.

Tokens

Tokens are alphanumeric sequences of characters, and correspond to the terminals of the grammar. Our test token streams will only include tokens that are terminals of the language.

Lexemes

Lexemes correspond to the actual characters matched by the token. Note that the lexeme may include a space character (most likely as part of a string).

Positions

Positions correspond to the line and column at which a lexeme was matched. Positions consist of a left bracket, followed by a line number, followed by a column number, followed by a right bracket.

Example

Assuming the above input spec in input.ag, and a test.tokens file containing the lines num:1 num:2 num:3

Given the above, the following behavior should be observed (where lines beginning with $ are commands entered by the user):
$ make build $ make run $ make test Sum is 6

Hints

While this dragon trial is not trivial, it does not need to be overcomplicated. In particular, when processing the input.ag file, you can simply move all of the code in the initialization section to the beginning of an output file, and then put the source code in each SDT action into it's own function, then put the actual selector table manipulation into a main function at the end of that output file. Thus, the output file for the example input.ag file above might look something like the following

//User-defined initialization code
#include <iostream>
#include <stack>
std::stack<int> semStack;
int pop(){
	int res = semStack.front();
	semStack.pop();
	return res;
}
//SDT Actions
void runAction(std::string actionNum){
switch(atoi(actionNum.c_str())){
	case 1:
		int t = pop(); std::cout << "Sum is " << t << "\n";
		return;
	case 2:
		int t2 = pop(); int t1 = pop(); semStack.push(t1 + t2);
		return;
	case 2:
		int t = atoi(look); semStack.push(t);
		return;
}

//Selector table definition (not shown)
//...
//LL1Parser class definition (not shown)
//...

//Selector table manipulation
int main(int argc , char * argv[]){
	LL1Parser p;

	std::ifstream tokenStream(argv[1]);
	while(Token t = /* read token*/) {
		StackSymbol s = p.syntaxStackTop();
		if (s.type == "action"){ runAction(s.number); }
		if (s.type == "nonterminal") { runNonterminal(p, s, t); }
		if (s.type == "terminal") { runTerminal(p, s, t); }
	}
}

This file can then be compiled as part of the make run command.