This section defines lexical details of the A language.
Reserved Sequences (Keywords)
The following character sequences are considered
reserved, and each should yield a distinct
token (rather than being counted as an identifier).
and
bool
custom
else
eh?
false
fromconsole
if
immutable
int
or
otherwise
means
toconsole
return
true
void
while
Identifiers
Any sequence of one or more letters and/or digits, and/or underscores,
starting with a letter or underscore, should be treated as an identifier
token.
Identifiers must not be reserved sequences, but may include a proper substring that is a reserved word, e.g. while1 is an identifier but while is not.
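The identifier and keyword rules above can be sketched as a single scanning helper. This is a minimal illustration, not the project's scanner: the names scanWord and kReserved are hypothetical, and the handling of eh? (consuming the ? only when the preceding run is exactly eh) is an assumption, since the text does not say how the ? is matched.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical table: the reserved sequences listed above, minus "eh?",
// which is handled separately because '?' cannot appear in an identifier.
static const std::unordered_set<std::string> kReserved = {
    "and", "bool", "custom", "else", "false", "fromconsole", "if",
    "immutable", "int", "or", "otherwise", "means", "toconsole",
    "return", "true", "void", "while"};

static bool isIdStart(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
}

static bool isIdChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Scans the longest identifier-shaped run starting at `pos` and reports
// whether it is a reserved sequence. Returns "" if no such run starts there.
std::string scanWord(const std::string& src, std::size_t pos, bool& isKeyword) {
    isKeyword = false;
    if (pos >= src.size() || !isIdStart(src[pos])) return "";
    std::size_t end = pos + 1;
    while (end < src.size() && isIdChar(src[end])) ++end;
    std::string lexeme = src.substr(pos, end - pos);
    // Assumption: the '?' is consumed only when the run is exactly "eh".
    if (lexeme == "eh" && end < src.size() && src[end] == '?') {
        lexeme += '?';
        isKeyword = true;
    } else {
        isKeyword = kReserved.count(lexeme) > 0;
    }
    return lexeme;
}
```

Because the whole run is scanned before the table lookup, while1 is never mistaken for the keyword while followed by the digit 1.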
Integer Literals
Any sequence of one or more digits yields an integer literal token as long
as it is not part of an identifier or string.
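A minimal sketch of the digit-run rule, assuming the caller only invokes it at the start of a token (so digits inside an identifier such as a123 or inside a string have already been consumed). The name scanIntLiteral is hypothetical. The sketch returns only the length of the run and leaves conversion to a numeric value, including any overflow policy, to the caller, since the text above does not specify one.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns the length of the digit run starting at `pos` (0 if none).
std::size_t scanIntLiteral(const std::string& src, std::size_t pos) {
    std::size_t end = pos;
    while (end < src.size() &&
           std::isdigit(static_cast<unsigned char>(src[end]))) {
        ++end;
    }
    return end - pos;
}
```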
String Literals
Any string literal (a sequence of zero or more string characters
surrounded by double quotes) should yield a string literal token.
A string character is either
- an escaped character: a backslash followed by any one of the
following characters:
  - n
  - t
  - a double quote
  - another backslash
or
- a single character other than newline or double quote or backslash.
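The definition above can be checked mechanically. The following is a sketch under the assumption that the candidate text is handed over as one complete string, opening quote included; the function name isLegalStringLiteral is hypothetical and a real scanner would more likely consume characters incrementally.

```cpp
#include <cstddef>
#include <string>

// Returns true when `text` is exactly one legal string literal:
// an opening quote, zero or more string characters, a closing quote.
bool isLegalStringLiteral(const std::string& text) {
    if (text.size() < 2 || text.front() != '"') return false;
    std::size_t i = 1;
    while (i < text.size()) {
        char c = text[i];
        if (c == '"') return i + 1 == text.size();  // closing quote must be last
        if (c == '\n') return false;                // raw newline is not allowed
        if (c == '\\') {
            if (i + 1 >= text.size()) return false; // backslash at end of input
            char e = text[i + 1];
            if (e != 'n' && e != 't' && e != '"' && e != '\\') return false;
            i += 2;                                 // skip the escaped pair
        } else {
            ++i;
        }
    }
    return false;  // ran out of input before a closing quote
}
```

Note that an escaped quote is skipped as a pair, which is why a literal ending in a backslash followed by a quote counts as unterminated.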
Examples of legal string literals:
""
"&!88"
"use \n to denote a newline character"
"use \" to for a quote and \\ for a backslash"
Examples of things that are not legal string literals:
"unterminated
"also unterminated \"
"backslash followed by space: \ is not allowed"
"bad escaped character: \a AND not terminated
Symbol Operators
Each of the following one- or two-character
symbols constitutes a distinct token:
= : , + -
== > >= { < <=
( ! & != -- ++
} ) ; / * ->
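One way to recognize these symbols is to try the two-character forms before the one-character forms. This longest-match preference is the usual scanner convention and is assumed here; the text above does not state it explicitly. The name scanSymbol is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Returns the symbol lexeme starting at `pos`, preferring two-character
// symbols over one-character ones, or "" if no symbol starts there.
std::string scanSymbol(const std::string& src, std::size_t pos) {
    static const char* const two[] = {"==", ">=", "<=", "!=", "--", "++", "->"};
    static const std::string one = "=:,+->{<(!&});/*";
    if (pos + 1 < src.size()) {
        const std::string pair = src.substr(pos, 2);
        for (const char* sym : two) {
            if (pair == sym) return pair;
        }
    }
    if (pos < src.size() && one.find(src[pos]) != std::string::npos) {
        return std::string(1, src[pos]);
    }
    return "";
}
```

Under this assumption the input --- scans as -- followed by -, and -> is never split into - and >.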
Comments
Text starting with # up to the end of the line
is a comment (except of course if the # is
inside a string literal).
For example:
# this is a comment
# and so is # this
# and so is # this %$!#
The scanner should recognize and ignore comments (there is no
COMMENT token).
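The string-literal exception is the part that is easy to get wrong, so here is a small sketch of locating the start of a comment on one line. The name findCommentStart is hypothetical. For a line containing an unterminated string literal, this sketch simply treats the rest of the line as string content, which is an assumption rather than specified behavior.

```cpp
#include <cstddef>
#include <string>

// Returns the index of the '#' that starts a comment on this line, or
// std::string::npos if there is none. A '#' inside a string literal
// does not start a comment.
std::size_t findCommentStart(const std::string& line) {
    bool inString = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (inString) {
            if (c == '\\' && i + 1 < line.size()) {
                ++i;  // skip the escaped character so \" does not close the string
            } else if (c == '"') {
                inString = false;
            }
        } else if (c == '"') {
            inString = true;
        } else if (c == '#') {
            return i;
        }
    }
    return std::string::npos;
}
```

Everything from the returned index to the end of the line can then be discarded without producing a token.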
Whitespace
Spaces, tabs, and newline characters are whitespace.
Whitespace separates tokens and changes the character counter,
but should otherwise be ignored (except inside
a string literal).
Illegal Characters
Any character that is not whitespace and is not part of a token or
comment is illegal.
Length Limits
No limit may be assumed on the lengths of identifiers, string literals,
integer literals, comments, etc. other than those limits imposed by the
underlying implementation of the compiler's host language.
Which Token to Produce
For the most part, the token to produce should be self-explanatory. For
example, the + symbol should produce the CROSS token, the - symbol should
produce the DASH token, etc. The set of tokens can be found in frontend.hh
or in the switch in tokens.cpp. The LCURLY token refers to a left curly
brace, {. The RCURLY token refers to a right curly brace, }.
The lexical structure of A is very
similar to C, with a few small alterations:
- The string -> produces the ARROW token.
- The sequences custom, means, otherwise, and eh? are keywords of the language. They produce the CUSTOM, MEANS, OTHERWISE, and EH tokens, respectively.
- The strings && and || are NOT in the language. Instead, "logical and" is represented by the string and, and "logical or" is represented by the string or.
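The lexeme-to-token pairings named in this section can be collected into a small lookup. This is only an illustration: the function tokenNameFor is hypothetical, token names are represented as plain strings rather than the project's actual token kinds, and the table contains only the pairings stated above, not the full set from frontend.hh.

```cpp
#include <string>
#include <unordered_map>

// Returns the token name for a lexeme, or "" when this section does not
// name one. Only pairings stated in the text are included.
std::string tokenNameFor(const std::string& lexeme) {
    static const std::unordered_map<std::string, std::string> names = {
        {"+", "CROSS"},     {"-", "DASH"},
        {"{", "LCURLY"},    {"}", "RCURLY"},
        {"->", "ARROW"},    {"custom", "CUSTOM"},
        {"means", "MEANS"}, {"otherwise", "OTHERWISE"},
        {"eh?", "EH"}};
    auto it = names.find(lexeme);
    return it == names.end() ? std::string() : it->second;
}
```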
Program Behavior
Additional details (such as the behavior and syntax of
the tokens unique to A) will be specified as future projects approach.
This section describes the syntax of the A language.
Basics
The basic syntax of A is designed to evoke a simplified variant of C. A is a block-structured language, with most blocks delimited by curly braces. Variables and functions may be declared in the global scope, and most statements and declarations are delimited by semicolons.
Notable Differences from C
While the canonical reference for A syntax is
its context-free grammar, there are a couple of "standout" points
which deserve special attention for their deviation from C:
Function declarations look like
myfn : (a:int, b:int, c:int) -> bool { }
instead of
bool myFunction(int a, int b, int c) { }
Statements besides declarations are not allowed in the global scope (i.e. outside of a function body). Thus a program that places a statement such as i = 4; directly in the global scope
is not a legal program, but
i:int;
fn : () -> void {
i = 4;
}
is legal.
This section defines details of the A type system.
Type Promotions
Any type is promotable to itself.
Operands
The following shows the atomic operands and the types
that they are assigned:
- numeric literals (e.g. 7, 3) are of type int
- bool literals (i.e. true, false) are of type bool
- string literals are of string type
- identifiers have the type of their declaration, which
is determined during name analysis
Operators
The operators in the language
are divided into the following categories:
- logical: not, and, or
- arithmetic: plus, minus, times, divide, negation, postincrement, postdecrement
- equality: equals, not equals
- relational: less than (<), greater than (>), less than or equals (<=), greater than or equals (>=)
- field access: dereference (--, when not used for postdecrement)
- assignment: assign (=)
The type rules of the language are as follows:
logical operators and conditions are legal if and only if:
The result type is bool in legal cases, ERROR otherwise.
arithmetic operations are legal if and only if:
- All operands are int, in which case the result type is int
In all illegal cases, the result type is ERROR.
relational operations are legal if and only if:
The result type is bool in legal cases, ERROR otherwise.
equality operations are legal if and only if:
- Both operands are of the same primitive type
and
- Neither operand is of a class type
and
- Neither operand is a function type
The result type is bool in legal cases, ERROR otherwise.
assignment operations are legal if and only if:
- Both types are the same and the LHS is an lvalue
It is ILLEGAL for an operand of an assignment to be a class
name or an instance of a class.
The result type is that of the LHS operand in legal cases, ERROR otherwise.
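The arithmetic, equality, and assignment rules can be sketched as small functions over a type tag. Everything named here (the Type enum and the three functions) is hypothetical. Two assumptions are made loudly: "primitive" is taken to mean int or bool, which the text does not spell out, and all class types are collapsed into one Class tag for brevity. The logical and relational rules are left out because their conditions are not listed above.

```cpp
// Hypothetical type tags for this sketch.
enum class Type { Int, Bool, String, Void, Class, Function, Error };

// Assumption: "primitive" means int or bool; the text does not list them.
static bool isPrimitive(Type t) {
    return t == Type::Int || t == Type::Bool;
}

// Binary arithmetic: both operands must be int.
Type arithmeticType(Type lhs, Type rhs) {
    return (lhs == Type::Int && rhs == Type::Int) ? Type::Int : Type::Error;
}

// Equality: same primitive type; class and function operands are
// rejected because isPrimitive excludes them.
Type equalityType(Type lhs, Type rhs) {
    return (lhs == rhs && isPrimitive(lhs)) ? Type::Bool : Type::Error;
}

// Assignment: same type on both sides, LHS is an lvalue, no class operand.
Type assignmentType(Type lhs, Type rhs, bool lhsIsLvalue) {
    if (lhs == Type::Class || rhs == Type::Class) return Type::Error;
    return (lhs == rhs && lhsIsLvalue) ? lhs : Type::Error;
}
```

Returning Error as an ordinary type value lets an enclosing expression keep checking, since an Error operand fails every rule above without special handling.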
consolein operations are legal if and only if:
- The operand is of type int or bool.
consoleout operations are legal if and only if one of the following holds:
- The operand is of type int
- The operand is of type bool
- The operand is of type string
member access operations are legal if and only if:
- The index operand is a field name of the base class type
The result type is the type of the field in legal cases, and ERROR otherwise.
function calls are legal if and only if:
- The identifier is of function type (i.e., the callee is actually a function).
and
- The number of actuals must match the number of formals.
and
- The type of each actual must match the type of the corresponding formal.
If the identifier is not a function name, the result type is ERROR. Otherwise, it is the return type of the function even if the arguments are ill-typed.
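The function call rule separates two questions: whether the call is legal, and what type the call expression has. A minimal sketch of that separation follows, with hypothetical names throughout (Type, FnSig, CallCheck, checkCall) and a null pointer standing in for "the identifier is not of function type".

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type tags for this sketch.
enum class Type { Int, Bool, String, Void, Error };

// Hypothetical description of a callee.
struct FnSig {
    std::vector<Type> formals;
    Type returnType;
};

struct CallCheck {
    bool legal;       // did the call satisfy all three conditions?
    Type resultType;  // type given to the call expression
};

// `callee` is nullptr when the identifier is not of function type.
CallCheck checkCall(const FnSig* callee, const std::vector<Type>& actuals) {
    if (callee == nullptr) return {false, Type::Error};
    bool legal = actuals.size() == callee->formals.size();
    for (std::size_t i = 0; legal && i < actuals.size(); ++i) {
        legal = actuals[i] == callee->formals[i];
    }
    // The result is the declared return type even for ill-typed arguments.
    return {legal, callee->returnType};
}
```

Keeping the return type on an ill-typed call means a single bad argument is reported once instead of cascading into errors in the surrounding expression.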
The following rules hold for function returns:
- A return statement in a void function may not have a value.
- A return statement in a non-void function must have a value.
- The return expression must match the function's declared type.
It is LEGAL for a non-void function to skip a return statement.
For example, this code is ok:
f : () -> int {
# I should return an int, but I don't
}
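The return rules apply to each return statement individually, which is why a missing return goes unreported. A sketch of the per-statement check, with the hypothetical names Type and returnIsLegal:

```cpp
// Hypothetical type tags for this sketch.
enum class Type { Int, Bool, String, Void, Error };

// Checks one return statement against the enclosing function's declared
// return type. `hasValue` is false for a bare return; `valueType` is
// ignored in that case.
bool returnIsLegal(Type declared, bool hasValue, Type valueType) {
    if (declared == Type::Void) return !hasValue;
    return hasValue && valueType == declared;
}
```

Since the check only runs when a return statement is encountered, a non-void function with no return statement at all never triggers it, matching the example above.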