Due on
September 27 |
Due on
September 27 |
None yet!
For this trial, you will create a syntax-directed translator-generator
named dragonsdt
, which will
take an input specification including an enriched BNF-like grammar
(i.e. a context free grammar enhanced with embedded action numbers)
and produce a corresponding syntax-directed translator.
Your translator-generator will use the underlying formalism of an LL(1) parser with semantic actions. The semantic actions will be input as C++ source code.
As with all dragon trials, you may use whatever coding language you like, subject to the following constraints:
You should turn in a tarred directory (similar to previous submissions) containing the following:
make build
should still succeed as a no-op.)
run: This target will run the
dragonsdt executable on a file named
input.ag containing a grammar spec in the
format described below.
If the grammar is not LL(1), dragonsdt should print "Bad grammar" and terminate with a non-zero
exit code. If the grammar is LL(1),
dragonsdt should generate a translator
executable named translator
capable of performing a syntax-directed translation according
to the
actions specified by input.ag.
If your
dragonsdt executable produces source
code (like bison), then this target should also compile
that source code into the translator
executable.
test: This target will run the
translator
executable on a file named
test.tokens
representing a token stream in the
language. If the token stream is a valid string in the
language, translator
should perform the
syntax directed translation.
If the token stream is not a valid string in the language,
translatorp
should print "rejected" to the output
stream and terminate with exit code 1.
Your dragonsdt
program should take
an input specification containing three sections, separated
by delimiter lines containing only two percent signs (%%). Thus,
the specification will be of the form:
<grammar section>
%%
<initialization section>
%%
<action section>
The grammar section contains 1 or more production rule takes the form:
<symbol> ::= <symbol string>
where
Symbol
(both terminal and nonterminal) represents a
sequences of one or more alphanumeric characters, delimited by the
space character. The set of nonterminal symbols are differentiated
by appearing on the LHS of the grammar - any symbol that does not
appear on the LHS of some production rule is considered to be a
terminal.
Symbol string
represents zero or more space-delimited symbols
and SDT trigger
s.
If a symbol string has zero symbols, it specifies an
ε-production rule. An ε-production rule may still have
an SDT action.
SDT trigger
is a number prefixed by a # character,
corresponding to an SDT rule (described below).
The initialization section may contain C++ source code that is accesible in the global scope of any SDT action code.
The action section consists of 0 or more SDT actions, where an SDT action
consists of an SDT trigger
(i.e. the # symbol followed by a number),
followed by an opening curly brace ({
),
followed by source code, followed by a closing curly brace (}
),
where source code is a sequence of 1 or more statements in C++ source code.
For simplicity's sake, you may assume that the source code of an SDT action will always be on one line (though it may contain multiple statements).
Any C++ statements are valid.
Internally, the generated translator should maintain the special variable
look, which should be a string containing the lexeme value of the current lookahead token.
S ::= A #4 A ::= #1 A ::= B A #2 B ::= #3 num %% #include <iostream> #include <stack> std::stack<int> semStack; int pop(){ int res = semStack.front(); semStack.pop(); return res; } %% #1 { semStack.push(0); } #2 { int t2 = pop(); int t1 = pop(); semStack.push(t1 + t2); } #3 { int t = atoi(look); semStack.push(t); } #4 { int t = pop(); std::cout << "Sum is " << t << "\n"; }
The set of non-terminals of this grammar is {S,A,B} (since those are the only symbols to appear on the LHS of the production rules). The set of terminals of this grammar is {num}. The set of action numbers is {#1, #2, #3, #4}. The first production of the grammar matches the empty string and adds action 1 to the syntactic stack.
The above format is designed for simplicity. While your adherence to this spec is greatly appreciated in grading, parsing the input is not really the focus of this assisgnment. You should not focus on rejecting malformed input grammars or on gotchas in the grammar spec.
The syntax-directed translators generated by dragonsdt should take an input file formatted as a stream of terminal tokens. The intention is to match the output of a lexer generated by the code in Trial 1. The format is as follows:
The token stream consists of zero or more token lines. There is no EOF token line, so it is recommended that you insert an EOF pseudotoken after the last token line has been read.
Each line of the token stream has one of two forms:
Tokens are alphanumeric sequences of characters, and correspond to the terminals of the grammar. Our test token streams will only include tokens that are terminals of the language.
Lexemes correspond to the actual characters matched by the token. Note that the lexeme may include a space character (most likely as part of a string).
Positions correspond to the line and column at which a lexeme was matched. Positions consist of a left bracket, followed by a line number, followed by a column number, followed by a right bracket.
Assuming the above input spec in input.ag
, and a test.tokens
file containing the lines
num:1
num:2
num:3
Given the above, the following behavior should be observed (where lines beginning with $ are commands entered by the user):
$ make build
$ make run
$ make test
Sum is 6
While this dragon trial is not trivial, it does not need to be overcomplicated. In particular, when processing the input.ag file, you can simply move all of the code in the initialization section to the beginning of an output file, and then put the source code in each SDT action into it's own function, then put the actual selector table manipulation into a main function at the end of that output file. Thus, the output file for the example input.ag file above might look something like the following
//User-defined initialization code #include <iostream> #include <stack> std::stack<int> semStack; int pop(){ int res = semStack.front(); semStack.pop(); return res; } //SDT Actions void runAction(std::string actionNum){ switch(atoi(actionNum.c_str())){ case 1: int t = pop(); std::cout << "Sum is " << t << "\n"; return; case 2: int t2 = pop(); int t1 = pop(); semStack.push(t1 + t2); return; case 2: int t = atoi(look); semStack.push(t); return; } //Selector table definition (not shown) //... //LL1Parser class definition (not shown) //... //Selector table manipulation int main(int argc , char * argv[]){ LL1Parser p; std::ifstream tokenStream(argv[1]); while(Token t = /* read token*/) { StackSymbol s = p.syntaxStackTop(); if (s.type == "action"){ runAction(s.number); } if (s.type == "nonterminal") { runNonterminal(p, s, t); } if (s.type == "terminal") { runTerminal(p, s, t); } } }
This file can then be compiled as part of the make run command.