Language Specification
Lexical | Syntactic | Types | IR Semantics | Output Semantics

Lexical Details

This section defines lexical details of the Jeff language.

Reserved Words

The following words are considered reserved, and should be yield a distinct token (rather than be counted as an identifier).
bool  close  console  
else  false  file  fn  
if  int  open  return  
true  void  while  

Identifiers

Any sequence of one or more letters and/or digits, and/or underscores, starting with a letter or underscore, should be treated as an identifier token. Identifiers must not be reserved words, but may include a proper substring that is a reserved word, e.g. while1 is an identifier but while is not.

Integer Literals

Any sequence of of one or more digits yields an integer literal token as long as it is not part of an identifer or string.

String Literals

Any string literal (a sequence of zero or more string characters surrounded by double quotes) should yield a string literal token. A string character is either
  • an escaped character: a backslash followed by any one of the following characters:
    1. n
    2. t
    3. a double quote
    4. another backslash
    or
  • a single character other than newline or double quote or backslash.

Examples of legal string literals:

""
"&!88"
"use \n to denote a newline character"
"use \" to  for a quote and \\ for a backslash"
Examples of things that are not legal string literals:
"unterminated
"also unterminated \"
"backslash followed by space: \ is not allowed"
"bad escaped character: \a AND not terminated

Symbol Operators

Any of the following one- or two-character symbols constitute a distinct token:
&&    =    :    ,    +    
-    ==    >    >=    {    <    
<=    (    [    !    !=    ||    
--    ++    ?    ]    <<    }    
)    ;    /    *    >>    


Comments

Text starting with # up to the end of the line is a comment (except of course if those characters are inside a string literal). For example:
 
# this is a comment
# and so is # this
# and so is # this %$!#

The scanner should recognize and ignore comments (there is no COMMENT token).

Whitespace

Spaces, tabs, and newline characters are whitespace. Whitespace separates tokens and changes the character counter, but should otherwise be ignored (except inside a string literal).

Illegal Characters

Any character that is not whitespace and is not part of a token or comment is illegal.

Length Limits

No limit may be assumed on the lengths of identifiers, string literals, integer literals, comments, etc. other than those limits imposed by the underlying implementation of the compiler's host language.

Which Token to Produce

For the most part, the token to produce should be self-explanatory. For example, the + symbol should produce the CROSS token, the -  symbol should produce the DASH token, etc. The set of tokens can be found in frontend.hh or in the switch in tokens.cpp. The LCURLY token refers to a left curly brace, {. the RCURLY refers to a right curly brace, }.

The lexical structure of Jeff is very similar to C, with a couple of small alterations:

  • The words file, console, fn, open, and close are keywords of the language. They produce the FILE, CONSOLE, FN, OPEN, and CLOSE, tokens, respectively.
  • The character sequences << and >> correspond to operators in the language. They produce the WRITE and READ tokens, respectively.

Syntactic Details

This section described the syntax of the Jeff language.

Basics

The basic syntax of Jeff is designed to evoke a simplified variant C. Jeff is a block-structured language, with most blocks delimited by curly braces. Variables and functions may be declared in the global scope, most statements and declarations are delimited by semicolons.

Notable Differences from C

While the canonical reference for Jeff syntax is its context-free grammar, there are a couple of "standout" points which deserve special attention for their deviation from C:

  • Declarations are not allowed to have initializers. Thus, int answer = 42; is illegal in Jeff. Instead, initialization must be a separate statement, i.e. int answer; answer = 1;.

  • Function declarations look like
    fn : (int a, int b, int c) bool myFunction {}
    instead of
    bool myFunction(int a, int b, int c() { }
  • Statements besides declarations are not allowed in the global scope (i.e. outside of a function body). Thus
    int i;
    i = 4;
    

    is not a legal program, but

    int i;
    fn : () void function{
      i = 4;
    }
    

    is legal.

  • Jeff uses similar stream operators to C++, so a statement like ofile << "hello"; or ifile >> var; is legal.

  • Files are a primitive type in Jeff, so a variable can be declared of type file.

  • The open operator is used to open a file, and uses the stream operators to indicate if the file is opened in read mode or write mode:
    file f;
    open << f "/tmp/writeme.txt";
    Declares a file f and opens it in write mode.
    file g;
    open >> g "/tmp/readme.txt";
    Declares a file f and opens it in read mode.
  • The close operator closes a file.
    file h; open >> h "/tmp/myfile.txt"; close h; opens and then closes the file h
  • Context-Free Grammar for Jeff

    /*********************************************************************
     Grammar for Jeff programs
     ********************************************************************/
    program         ::= globals
    
    globals         ::= globals varDecl SEMICOL
                    |   globals FN COLON fnDecl
                    | /* epsilon */
    
    varDecl         ::= type id
    
    type            ::= primType
                    |   arrayType
    
    primType        ::= INT
                    |   BOOL
                    |   VOID
                    |   FILE
    
    arrayType       ::= primType LBRACE INTLITERAL RBRACE
    
    fnDecl          ::= LPAREN formals RPAREN type id LCURLY stmtList RCURLY
    		|   LPAREN RPAREN type id LCURLY stmtList RCURLY
    
    formals         ::= formalDecl
                    | formals COMMA formalDecl
    
    formalDecl	::= type id
    
    stmtList        ::= stmtList stmt SEMICOL
                    |   stmtList blockStmt
                    |   /* epsilon */
    
    blockStmt	::= WHILE LPAREN exp RPAREN LCURLY stmtList RCURLY
    		| IF LPAREN exp RPAREN LCURLY stmtList RCURLY
    		| IF LPAREN exp RPAREN LCURLY stmtList RCURLY ELSE LCURLY stmtList RCURLY
    
    stmt            ::= varDecl
                    | id ASSIGN exp
                    | OPEN READ loc STRINGLITERAL
                    | OPEN WRITE loc STRINGLITERAL
                    | CLOSE loc
                    | loc POSTDEC
                    | loc POSTINC
                    | loc READ loc
                    | loc WRITE exp
                    | RETURN exp
                    | RETURN
                    | callExp
    
    exp             ::= exp DASH exp
                    | exp CROSS exp
                    | exp STAR exp
                    | exp SLASH exp
                    | LPAREN exp QUESTION exp COLON exp RPAREN
                    | exp AND exp
                    | exp OR exp
                    | exp EQUALS exp
                    | exp NOTEQUALS exp
                    | exp GREATER exp
                    | exp GREATEREQ exp
                    | exp LESS exp
                    | exp LESSEQ exp
                    | NOT exp
                    | DASH term
                    | term
    
    callExp         ::=  id LPAREN RPAREN   // fn call with no args
                    | id LPAREN actualsList RPAREN  // with args
    
    actualsList     ::= exp
                    | actualsList COMMA exp
                    
    
    term            ::= loc
                    | INTLITERAL
                    | STRINGLITERAL
                    | TRUE
                    | FALSE
                    | LPAREN exp RPAREN
                    | callExp
    
    loc		::= id
    		| id LBRACE exp RBRACE
    
    id              ::= ID
    		| CONSOLE
    

    Type System Details

    This section defines details of the Jeff type system.

    Type Promotions

    Any type is promotable to itself.

    Operands

    The following shows the atomic operands and the types that they are assigned:

    • numeric literals that do not have a suffix (e.g. 7, 3) are of type int
    • bool literals (i.e. true, false) are of type bool
    • string literals are of string type
    • identifiers have the type of their declaration, which is determined during name analysis

    Operators

    The operators in the language are divided into the following categories:

    • logical: not, and, or
    • arithmetic: plus, minus, times, divide, negation, postincrement, postdecrement
    • equality: equals, not equals
    • relational: less than (<), greater than (>), less then or equals (<=), greater than or equals (>=)
    • indexing: index ([ ])
    • assignment: assign (=)

    The type rules of the language are as follows:

    • logical operators and conditions are legal if and only if:

      • All operands are bool

      The result type is bool in legal cases, ERROR otherwise.

    • arithmetic operations are legal if and only if:

      • Operands are both int - the result type is int

      In all illegal cases, the result type is ERROR.

    • relational operations are legal if and only if:

      • Both operands are int

      The result type is bool in legal cases, ERROR otherwise.

    • equality operations are legal if and only if:

      • Both operands are of the same primitive type
      • and
      • Neither operand is a function type
      • Neither operand is an array

      The result type is bool in legal cases, ERROR otherwise.

    • assignment operations are legal if and only if:

      • Both types are the same and the LHS is an lvalue - the result type is that of the LHS

      It is LEGAL to assign arrays of the same base type but different lengths regardless if the LHS or RHS is bigger

      The result type is that of the LHS operand in legal cases, ERROR otherwise.

    • output (<<) operations are legal if and only if:

      • The operand is of int or bool or array byte[η], for any length η. . .
    • read (>>) operations are legal if and only if:

      • The operand is of type int
      • The operand is of type bool
      • The operand is of type string
    • open and close operations are legal if and only if:

      • The operand is of type file

    • indexing operations are legal if and only if:

      • The operand is int

      The result type is the base type of the array in legal cases, and ERROR otherwise.

    • function calls are legal if and only if:

      • The identifier is of function type (i.e., the callee is actually a function).
      • and
      • The number of actuals must match the number of formals.
      • and
      • The type of each actual must match the type of the corresponding formal.

      If the identifier is not a function name, the result type is ERROR. Otherwise, it is the return type of the function even if the arguments are ill-typed.

    • function returns: The follow rules hold for function returns:

      • A return statement in a void function may not have a value.
      • A return statement in a non-void function must have a value.
      • The return expression must match the function's declared type.

      It is LEGAL for a non-void function to skip a return statement. For example, this code is ok:

      int f() {
          //I should return an int, but I don't
      }