EECS665 – Compiler Construction – Drew Davidson $\newcommand{\bigsqcap}{\mathop ⨅}$

    Code Generation


    Contents
    • Overview
    • Spim
    • Auxiliary Fields and Methods
    • Code Generation for Global Variable Declarations
    • Code Generation for Functions
    • Code Generation for Statements
    • Code Generation for Expressions
    • Control-Flow Code

      Overview

      Code can be generated by a syntax-directed translation while parsing or by traversing the abstract syntax tree after the parse (i.e., by writing a codeGen method for the appropriate kinds of AST nodes). We will assume the latter approach, and will discuss code generation for a subset of the C language. In particular, we will discuss generating MIPS assembly code suitable for input to the Spim interpreter. Some information on Spim is provided in the next section; the following sections discuss code generation for:

      • global variables
      • functions (entry and exit)
      • statements
      • expressions
      Spim

      Documentation on Spim is available on-line:

      To run the (plain) Spim interpreter, type:

        spim -file <name>
      where <name> is the name of a file that contains MIPS assembly code (the file produced by your compiler). This will cause Spim to process the code in the file; if there are syntax errors, they will be reported, and the code will not execute. Otherwise, the code will execute; output will be printed to your terminal, and you will get error messages for any run-time errors that result.

      To run Spim with an X-windows interface (on a Linux machine), type just: xspim. This will cause a window to open. Click on the load button in that window, then type in the name of your assembly-code file, then press return. If there are syntax errors, you will see error messages. If there are no errors, you can run your program by clicking on run, then (in the small window that will be opened) on OK. If your program generates any output, a new window will be opened to display that output.

      Spim uses the following special registers (there are others; you can see the on-line Spim documentation for those, but you should not need them for the class project):

      Register Purpose
      $sp stack pointer
      $fp frame pointer
      $ra return address
      $v0 used for system calls and to return int values from function calls, including the syscall that reads an int
      $f0 used to return double values from function calls, including the syscall that reads a double
      $a0 used for output of int and string values
      $f12 used for output of double values
      $t0 - $t7 temporaries for ints
      $f0 - $f30 registers for doubles (used in pairs; i.e., use $f0 for the pair $f0, $f1)

      Auxiliary Constants and Methods

      To simplify the task of code generation, it is convenient to have a set of constants (final static fields) that define the string representations of the registers that will be used in the generated code and the values used to represent true and false, as well as a set of methods for actually writing the generated code to a file. We will assume that we have the following register constants: SP, FP, RA, V0, A0, T0, T1, F0, F12, as well as the constants TRUE and FALSE (and that TRUE is represented as 1, and false as 0). We will also assume that we have the following methods:

      Method Purpose
      generate write the given op code and arguments, nicely formatted, to the output file
      generateIndexed the arguments are: an op code, a register R1, another register R2, and an offset; generate code of the form: op R1, offset(R2)
      genPush(String reg, int bytes) generate code to push the value in the given register onto the stack;
      parameter bytes is 4 for an int and 8 for a double
      genPop(String reg, int bytes) generate code to pop the top-of-stack value into the given register
      nextLabel return a string to be used as a label (more on this later)
      genLabel given a label L, generate: L:
      Code Generation for Global Variable Declarations

      For each global variable v, generate:

      	.data
      	.align 2  # align on a word boundary
          _v: .space N
      
      where N is the size of the variable in bytes. (Scalar integer variables require 4 bytes; double variables require 8 bytes.) This code tells the assembler to set aside N bytes in the static data area, in a location labeled with the name _v.

      Example: Given this source code:

      	int x;
      	double y;
      
      you should generate this code:
      	.data
      	.align 2
          _x: .space 4
      	.data
      	.align 2
          _y: .space 8
      
      It is not actually necessary to generate .data if the previous generated code was also for a global variable declaration; however, since function declarations can be intermixed with global variable declarations (and cause code to be generated in the text area, not the static data area), this may not be the case; it is safe (and easier) just to generate those directives for every global variable.

      Code Generation for Functions

      For every function you will generate code for:

      • the function "preamble"
      • the function entry (to set up the function's Activation Record)
      • the function body (its statements)
      • function exit (restoring the stack, and returning to the caller).

      Function Preamble

      For the main function, generate:

      	.text
      	.globl main
         main:
      

      For all other functions, generate:

              .text
              _<functionName>:
      
      using the actual name in place of <functionName>. This tells the assembler to store the following instructions in the text area, labeled with the given name.

      After generating this "preamble" code, you will generate code for (1) function entry, (2) function body, and (3) function exit.

      Function Entry

      We assume that when a function starts executing, the stack looks like this:

      image/svg+xml caller's AR parameters FP SP
      Before starting to execute the statements in the function body, we want it to look like this:
      image/svg+xml caller's AR parameters FP SP new AR return address control link(saved FP) local variables
      The parameters will already be on the stack (pushed by calling function). So the code for function entry must do the following:
      1. push the return address
      2. push the control link
      3. set the FP
      4. push space for local variables
      Here's the code you need to generate:
        # (1) push return addr
          sw	 $ra, 0($sp)
          subu $sp, $sp, 4
        # (2) push control link
          sw   $fp, 0($sp)
          subu $sp, $sp, 4
        # (3) set the FP
        # note: the following sets the FP to point to the "bottom"
        #       of the new AR; the reason for "+ 8" is:
        #       4 bytes each for the control link and the return addr
          addu $fp, $sp, <size of params in bytes + 8>
        # (4) push space for locals
          subu $sp, $sp, <size of locals in bytes>
      
      Note: <size of params> and <size of locals> will need to be available to the code generator. The symbol-table entry for the function name will have information about the parameters (because that will have been used for type checking). For example, it might have a list of the symbol-table entries for the parameters. You could also store the total size of the parameters in the function name's symbol-table entry, or you could write a method that takes the list of parameters as its argument and computes the total size. It is not so easy to compute the total size of the local variables at code-generation time; it is probably a better idea to do that during name analysis. The name-analysis phase will be computing the offsets for the parameters and local variables anyway; it should not be difficult to extend that code to also compute the total size of the locals (and to store that information in the function name's symbol-table entry).

      Function Body

      Note: we are talking about the codeGen method for the FnBodyNode, whose subtree will look like this:

      image/svg+xml FnBodyNode StmtListNode DeclListNode

      There is no need to generate any code for the declarations. So to generate code for the function body, just call the codeGen method of the StmtListNode, which will in turn call the codeGen method of each statement in the list. What those methods will do is discussed below in the section on Code Generation for Statements.

      Function Exit

      Just before a function returns, the stack looks like this:

      We need to generate code to pop off this function's AR, then to jump to the address in the "return address" field. Popping off the AR means restoring the SP and FP. Note that we want to move the SP to where the FP is currently pointing, but if there may be an interrupt that could use the stack, we don't want to change the SP until we're finished with all of the values in the current AR (in particular, the control link, which is used to restore the FP). Therefore, we use a temporary register (t0) to save the address that is initially in the FP

      Here is the code that needs to be generated:

        lw   $ra, -<param size>($fp)    # load return address
        move $t0, $fp                   # save control link
        lw   $fp, -<paramsize+4>($fp)   # restore FP
        move $sp, $t0                   # restore SP
        jr   $ra                        # return
      
      Note that there are two things that cause a function to return:
      1. A return statement is executed, or
      2. The last statement in the function is executed (i.e., execution "falls off the end" of the function).
      You could generate the "return" code given above for each return statement as well as after the last statement in the function body. A more space-efficient approach would be:
      • Generate the "return" code just once after generating the code for the function body. Label that code with a unique label (e.g., the result of calling nextLabel).
      • For each return statement, generate a jump to the label you used (the op code for an unconditional jump is just b).

      What about a return statement that returns a value? As discussed below, the codeGen method for the returned expression will generate code to evaluate that expression, leaving the value on the stack. The MIPS convention is to use register V0 to return a int value from a function and to use register F0 to return a double value. So the codeGen method for the return statement should generate code to pop the value from the stack into the appropriate register (before generating the "return" code or the jump to the return code discussed above).

      Code Generation for Statements

      You will write a different codeGen method for each kind of StmtNode. You are strongly advised to write this method for the WriteIntStmtNode, WriteDblStmtNode, and WriteStrStmtNode first. Then you can test code generation for the other kinds of statements and the expressions by writing a program that computes and prints a value. It will be much easier to find errors in your code this way (by looking at the output produced when a program is run) than by looking at the assembly code you generate.

      Write Statement

      To generate code for a write statement whose expression is of type int you must:

      1. Call the codeGen method of the expression being printed. That method will generate code to evaluate the expression, leaving that value on the top of the stack.
      2. Generate code to pop the top-of-stack value into register A0 (a special register used for output of strings and ints)
      3. Generate code to set register V0 to 1.
      4. Generate a syscall instruction.

      Below is the code you would write for the codeGen method of the WriteIntStmtNode.

          // step (1)
          myExp.codeGen();
          
          // step (2)
          genPop(A0, 4);
          
          // step (3)
          generate("li", V0, 1);
          
          // step (4)
          generate("syscall");
      
      The code for the WriteStrStmtNode and the WriteDblStmtNode is similar, except for the following:
      • For a string, the codeGen method of the expression being printed will leave the address of the string on the stack.
      • For a double, the value to be written must be popped into register F12 instead of A0.
      • For a string, register V0 must be set to 4.
      • For a double, register V0 must be set to 3.

      If-Then Statement The AST for an if-then statement looks like:

                          ------------
                         | IfStmtNode |
                          ------------
                       /       |        \
             ---------   --------------   -------------
            | ExpNode | | DeclListNode | | StmtListNode|
             ---------   --------------   -------------
      
      There are two different approaches to generating code for statements that involve conditions (e.g., for if statements and while loops):
      1. The numeric method, and
      2. the control-flow method.
      We will discuss code generation for if-then statements assuming the numeric method here; the control-flow method will be discussed later. The code generated by the IfStmtNode's codeGen method will have the following form:
      1. Evaluate the condition, leaving the value on the stack.
      2. Pop the top-of-stack value into register T0.
      3. Jump to FalseLabel if T0 == FALSE.
      4. Code for the statement list.
      5. FalseLabel:

      Labels

      Note that the code generated for an if-then statement will need to include a label. Each label in the generated code must have a unique name (although we will refer to labels in these notes using names like "FalseLabel" as above). As discussed above, we will assume that there is a method called nextLabel that returns (as a String) a new label every time it is called, and we will assume that there is a method called genLabel that prints the given label to the assembly-code file.


      TEST YOURSELF #1

      Question 1: What is the form of the code generated by an IfElseStmtNode's codeGen method?

      Question 2: What is the actual code that needs to be written for the IfStmtNode's codeGen method?

      Question 3: What is the form of the code generated by a WhileStmtNode's codeGen method?


      Return Statement

      The AST for a return statement is either:

                          ----------------
                         | ReturnStmtNode |
                          ----------------
      
      or:
                          ----------------
                         | ReturnStmtNode |
                          ----------------
                                 |
                            -----------
                            | ExpNode |
                            -----------
      
      As discussed above, if a value is being returned, the ReturnStmtNode's codeGen method should call its ExpNode's codeGen method (to generate code to evaluate the returned expression, leaving the value on the stack), then should generate code to pop that value into register V0 or register F0 (depending on its type).

      To generate the code that actually does the return, use one of the following approaches:

      1. For each return statement in the program, generate a copy of the code that pops the AR off the stack then jumps to the return address (that code was discussed above under Function Exit), or
      2. For each return statement in the program, generate a jump to the "return" code that is generated at the end of the function. Note that in this case you will need to label that return code, and you will need to know what that label is when generating code for a return statement.

      Read Statement

      The AST for a read statement is one of the following:

                  -------------------                 -------------------
                  | ReadIntStmtNode |                 | ReadDblStmtNode |
                  -------------------                 -------------------
                         |                                     |
                    ----------                           ----------
                    | IdNode |                           | IdNode |
                    ----------                           ----------
      

      To read an integer value into register V0, you must generate this code:

            li    $v0,  5
        syscall
        
      The code loads the special value 5 into register V0, then does a syscall. The fact that V0 contains the value 5 tells the syscall to read an integer value from standard input, storing the value back into register V0. So we must start by generating the above code, then we must generate code to store the value from V0 to the address of the IdNode.

      To read a double value, load the value 7 into V0, do a syscall, and expect the input value to be in register F0 (not V0).

    Digression: Code Generation for IdNodes

    Before considering other kinds of statements, let's think about the role identifiers will play in code generation. Names show up in the following contexts:

    • function calls (the name of the called function)
    • expressions (an expression can be just a name, or a name can be the operand of any operator)
    • assignment statements (on the left-hand side)
    The code that needs to be generated for the name will be different in each context:
    1. For a function call, we will need to generate a jump-and-link instruction using the name of the function (the same name that was generated as a label in the function's "preamble" code).
    2. For an expression, we will need to generate code to fetch the current value either from the static data area or from the current Activation Record, and to push that value onto the stack.
    3. For an assignment, we will need to generate code to push the address of the variable (either the address in the static data area, or in the current Activation Record) onto the stack. Then we will generate code to store the value of the right-hand-side expression into that address.
    Therefore, it seems reasonable to write three different code-generation methods for the IdNode class; for example:
    • genJumpAndLink,
    • codeGen,
    • genAddr.

    We use "codeGen" for the second case (fetching the value and pushing it onto the stack) since that is what the codeGen methods of all ExpNodes must do.

    genJumpAndLink

    The genJumpAndLink method will simply generate a jump-and-link instruction (with opcode jal) using the appropriate label as the target of the jump. If the called function is "main", the label is just "main". For all other functions, the label is of the form:
      _<functionName>

    codeGen

    The codeGen method must copy the value of the global / local variable into a register (e.g., T0 for an int variable and F0 for a double), then push the value onto the stack. Different code will be generated for a global and a local variable. Below are four examples.
    lw $t0 _g // load the value of int global g into T0
    lw $t0 0($fp) // load the value of the int local stored at offset 0 into T0
    l.d $f0 _d // load the value of dbl global d into F0
    l.d $f0 -4($fp) // load the value of the dbl local stored at offset -4 into F0

    Note that this means there must be a way to tell whether an IdNode represents a global or a local variable. There are several possible ways to accomplish this:

    • The symbol-table entry includes a "kind" field (which distinguishes between globals and locals).
    • Different sub-classes of the Sym class are used for globals and for local variables (so you can tell whether you have a global or a local using "instanceof", or using an IsGlobal method that you write for each sub-class of Sym).
    • The symbol-table entry includes an "offset" field; for local variables, that field has a value less than or equal to zero, while for globals, the value is greater than zero.

    genAddr

    The genAddr method must load the address of the identifier into a register (e.g., T0), then push it onto the stack. The code is very similar to the code to load the value of the identifier; we just use the la (load address) opcode instead of the lw (load word) or l.d (load double) opcode.

    Here is the code you need to generate to load the address of a global variable g into register T0 (this works whether g is int or double, since an address always takes 4 bytes):

      la $t0, _g
    and here is the code for a local, stored at offset -8:
      la $t0, -8($fp)

    Assignment Statement

    The AST for an assignment statement looks like this:
    		 ----------------
    		| AssignStmtNode |
    		 ----------------
    		         |
    		    ------------
    	 	   | AssignNode |
    		    ------------
    		 /               \
               --------          ---------
              | IdNode |        | ExpNode |
               --------          ---------	
    
    The AssignStmtNode's codeGen method can call the AssignNode's method to do the assignment, but be careful: an AssignNode is a subclass of ExpNode, so like all nodes that represent expressions, it must leave the value of the expression on the stack. Therefore, the AssignStmtNode must generate code to pop (and ignore) that value.

    Code Generation for Expressions

    The codeGen method for the subclasses of ExpNode must all generate code that evaluates the expression and leaves the value on top of the stack. We have already talked about how to do this for IdNodes; in the subsections below we discuss code generation for other kinds of expressions.

    Assign

    The codeGen method for an assignment expression must generate code to:

    1. Evaluate the right-hand-side expression, leaving the value on the stack.
    2. Push the address of the left-hand-side Id onto the stack.
    3. Store the value into the address.
    4. Leave a copy of the value on the stack.
    Most of the work is done by calling the AssignNode's children's methods: the codeGen method of the right-hand-side ExpNode, and the genAddr method of the left-hand-side IdNode. How to accomplish the rest of the assignment is left to you to figure out (it isn't too difficult).

    Literals

    The codeGen methods for IntLitNodes must simply generate code to push the literal value onto the stack. The generated code will look like this:

               li    $t0, <value>        # load value into T0
               sw    $t0, ($sp)          # push onto stack
               subu  $sp, $sp, 4
    

    The code for a DblLitNode is similar, except that the value must be loaded into a double register using the appropriate opcode, and the "push" code is different for a double value. For example:

               li.d  $f0, <value>    # load value into F0
               s.d   $f0, -4($sp)    # push onto stack
               subu  $sp, $sp, 8
    

    For a StringLitNode, the string literal itself must be stored in the static data area, and its address must be pushed. The code to store a string literal in the static data area looks like this:

               .data
      <label>: .asciiz <string value>
    
    Note:
    1. <label> needs to be a new label; e.g., returned by a call to nextLabel.
    2. The <string value> needs to be a string in quotes. You should be storing string literals that way, so just write out the value of the string literal, quotes and all.
    To avoid storing the same string literal value more than once, keep a hashtable in which the keys are the string literals, and the associated information is the static-data-area label. When you process a string literal, look it up in the hashtable: if it is there, use its associated label; otherwise, generate code to store it in the static data area, and add it to the hashtable.

    The code you need to generate to push the address of a string literal onto the stack looks like this:

               .text
               la   $t0, <label>       # load addr into $t0
               sw   $t0, ($sp)         # push onto stack
               subu $sp, $sp, 4
    

    Function Call

    The AST for a function call looks like:

    		 --------------
    		| CallExpNode |
    		 --------------
    		 /             \
                --------         -------------
               | IdNode |       | ExpListNode |
                --------         -------------
    
    We need to generate code to:
    1. Evaluate each actual parameter, pushing the values onto the stack;
    2. Jump and link (jump to the called function, leaving the return address in the RA register).
    3. Push the returned value (which will be in register V0 or F0 depending on the return type) onto the stack.
    Since the codeGen method for an expression generates code to evaluate the expression, leaving the value on the stack, all we need to do for step 1 is call the codeGen method of the ExpListNode (which will in turn call the codeGen methods of each ExpNode in the list). For step 2, we just call the genJumpAndLink method of the IdNode. For step 3, we just call genPush(V0, 4) or genPush(F0, 8).

    Note that there is also a call statement:

    		 --------------
    		| CallStmtNode |
    		 --------------
                           |
    		 --------------
    		| CallExpNode |
    		 --------------
    		 /             \
                --------         -------------
               | IdNode |       | ExpListNode |
                --------         -------------
    
    In this case, the called function may not actually return a value (i.e., may have return type void). It doesn't hurt to have the CallExpNode's codeGen method push the value in V0 (or F0) after the call (it will just be pushing some random garbage), but it is important for the CallStmtNode's codeGen method to pop that value.

    Non Short-Circuited Operators

    In general, the codeGen methods for the non short-circuited operators (PlusNode, MinusNode, ..., NotNode, LessNode, ..., EqualsNode, etc.) must all do the same basic sequence of tasks:

    1. Call each child's codeGen method to generate code that will evaluate the operand(s), leaving the value(s) on the stack.
    2. Generate code to pop the operand value(s) off the stack into register(s) (e.g., T0 and T1 for ints; F0 and F2 for doubles). Remember that if there are two operands, the right one will be on the top of the stack.
    3. Generate code to perform the operation (see Spim documentation for a list of opcodes).
    4. Generate code to push the result onto the stack.
    The relational and equality operators are more complicated if they are applied to double operands (because there are no opcodes that "directly" compute the desired result). However, implementing those cases is not required for the class project (unless you want extra credit), so we will not discuss the details here.

    To minimize the amount of code you must write, you might want to define subclasses of ExpNode that have very similar code-generation methods. For example, you could define a BinaryExpNode class whose codeGen method does all of the steps above, calling an opCode method (that would be implemented in each BinaryExpNode subclass) to get the appropriate opcode for use in step 3.

    Note that the comparison operators (described in Section 2.6 of the Spim Reference Manual) produce one when the result of the comparison is true. However, the not operator does a bitwise logical negation, so it will not work correctly if you use 0 and 1 to represent true and false. Thus, it is better to use the seq operator instead of the not operator.

    Example: Recall that the AST for an addition looks like this:

    		 ----------
    		| PlusNode |
    		 ----------
    		 /        \
              ---------      ---------
             | ExpNode |    | ExpNode |
              ---------      ---------
    
    Here is the codeGen method for the PlusNode, assuming that both operands are ints.
      public void codeGen() {
          // step 1: evaluate both operands
          myExp1.codeGen();
          myExp2.codeGen();
      
          // step 2: pop values in T0 and T1
          genPop(T1, 4);
          genPop(T0, 4);
          
          // step 3: do the addition (T0 = T0 + T1)
          generate("add", T0, T0, T1);
          
          // step 4: push result
          genPush(T0, 4)
      }
      

    To illustrate how code is generated for an expression involving several operators, consider generating code for the expression: b + c * d. Here is the AST for the expression:

                    PlusNode
                    /       \
    	     IdNode   TimesNode
                   b       /     \
                         IdNode  IdNode
                            c      d
    
    
    Below is the sequence of calls that would be made at compile time to generate code for this expression, and a description of what the generated code does.
            Sequence of calls          What the generated code does
    	-----------------	   ----------------------------
    
      +--- PlusNode.codeGen()
      |      IdNode.codeGen()  --------->       push b's value
      | +-   TimesNode.codeGen()
      | |      IdNode.codeGen() --------->      push c's value
      | |      IdNode.codeGen() --------->      push d's value
      | |
      | +------------------------------->   pop d's value into T1
      |                                     pop c's value into T0
      |                                     T0 = T0 * T1
      |                                     push T0's value
      |
      +---------------------------------> pop result of * into T1
                                          pop b's value into T0
                                          T0 = T0 + T1
                                          push T0's value
    

    Short-Circuited Operators

    The short-circuited operators are represented by AndNodes and OrNodes. "Short-circuiting" means that the right operand is evaluated only if necessary. For example, for the expression (j != 0) && (k/j > epsilon), the sub-expression (k/j > epsilon) is evaluated only if variable j is not zero. Therefore, the code generated for an AndNode must work as follows:

      evaluate the left operand
      if the value is true then
         evaluate the right operand;
         that value is the value of the whole expression
      else
         don't bother to evaluate the right operand
         the value of the whole expression is false
    
    Similarly, for an OrNode:
      evaluate the left operand
      if the value is false then
         evaluate the right operand;
         that value is the value of the whole expression
      else
        don't bother to evaluate the right operand
        the value of the whole expression is true
    
    This means that the code generated for the logical operators will need to involve some jumps depending on the values of some expressions.


    TEST YOURSELF #2

    Expand the outlines given above for the code generated for AndNodes and OrNodes, giving a lower-level picture of the generated code. Use the outline of the code generated for an if-then statement as a model of what to write.


    Control-Flow Code

    As mentioned above in the section on If-Then Statements, there are actually two different approaches to generating code for statements with conditions (like if statements and while loops):

    1. The numeric approach: This is the approach that we have assumed so far. Using the numeric approach, the codeGen method for a statement with a condition generates code to evaluate the condition, leaving the value on the stack. That value is then popped, and a jump is executed if it has a particular value.
    2. The control-flow or jump-code approach: In this case, the code-generation method for the condition has two parameters, both of which are labels (i.e., Strings) named TrueLabel and FalseLabel. Instead of leaving the value of the condition on the stack, the code generated to evaluate the condition jumps to the TrueLabel if the condition is true, and jumps to the FalseLabel if the condition is false.
    We will assume that the new code-generation method for the condition is called genJumpCode. As we will see, the reason to prefer the control-flow approach over the numeric approach is that, using the control-flow approach, we will generate fewer instructions for statements that involve conditions (so the generated code will be smaller and will run faster).

    First, let's reconsider code generation for an if-then statement; this time we'll use the control-flow method instead of the numeric method for evaluating the condition part of the statement. Under this new assumption, the code generated by the IfStmtNode's codeGen method will have the following form:

               -- code to evaluate the condition, jumping to TrueLab if it is true,
    	      and to DoneLab if it is false --
      TrueLab:
               -- code for the statement list --
      DoneLab:
    
    The actual code written for the IfStmtNode's codeGen method will be:
      public void codeGen() {
         String trueLab = nextLabel();
         String doneLab = nextLabel();
         myExp.genJumpCode(trueLab, doneLab);
         genLabel(trueLab);
         myStmtList.codeGen();
         genLabel(doneLab);
      }
      

    To implement the control-flow approach, in addition to changing the codeGen method for the statements that have conditions, we must also write a new genJumpCode method for each AST node that could represent a condition. Below, we look at two representative cases, for IdNode and LessNode. We give the code generated for the (old) numeric approach, and for the (new) control-flow approach so that we can see which code is better (in terms of the number of instructions). (The code given below assumes that all operands are int, and it is not quite assembly code -- for example, for clarity, we use "push" instead of the actual two instructions that implement a push operation.)

      IdNode:  numeric                            control-flow
               -------                            ------------
    
                lw  $t0, <var's addr>             lw  $t0, <var's addr>
                push $t0                          beq $t0, FALSE, falseLab
                                                  b   trueLab
    
    Note that in both approaches to generating code for an IdNode, 3 instructions are generated (because "push" is actually 2 instructions). However, while all 3 of the numeric-approach instructions will always execute, the last instruction generated by the control-flow approach will execute only if the value of the Id is true.
      LessNode:  numeric                                     control-flow
                 -------                                     ------------
    
                 -- code to evaluate both operands           -- ditto
                 -- pop values into T1, T0                   -- ditto
                 slt  $t2, $t0, $t1                          blt $t0, $t1, trueLab
                 push $t2                                    b   falseLab
    
    The operands of a LessNode are integer expressions, not boolean expressions, so both approaches start by generating (the same) code to evaluate the operands and to pop their values into registers T0 and T1. After that, however, the numeric approach will generate 3 instructions (to set $t2 to TRUE or FALSE as appropriate, then to push that value onto the stack -- remember that "push" is really two instructions), while the control-flow code will generate only 2 instructions. Furthermore, as was the case for the IdNode, all three of the numeric-approach instructions will always execute, while the last instruction generated by the control-flow approach will only execute if the comparison evaluates to FALSE.

    Now let's consider how to write the new genJumpCode method for the short-circuited operators (AndNodes and OrNodes).

    Recall that the AST for an && expression looks like this:

    		 ---------
    		| AndNode |
    		 ---------
    		 /        \
               ---------       ---------
              | ExpNode |     | ExpNode |
               ---------       ---------	
    
    Here's how the genJumpCode method of the AndNode works:
    • Start by calling the genJumpCode method of the left child. That call will generate code to evaluate the left operand, jumping to the "TrueLabel" that we pass if its value is true, and jumping to the "FalseLabel" that we pass if its value is false.
    • So what labels should be passed?
      • If the left operand is false, then the value of the whole expression is false, so pass the given "FalseLabel" as the "FalseLabel" in the recursive call.
      • If the left operand is true, then we must evaluate the right operand, so pass a new label as the "TrueLabel" in the recursive call.
    • After the call to the left child's genJumpCode method, call genLabel to generate the new label.
    • Finally, call the genJumpCode method of the right child.
    • What labels should be passed as the arguments to the right child's genJumpCode method?
      • This expression will only be evaluated if the left operand evaluated to true; in that case, the value of this expression (the right operand) is the value of the whole expression. Therefore, for this recursive call, pass the original "TrueLabel" and "FalseLabel".

    The AndNode's genJumpCode method would be:

      public void genJumpCode(String trueLab, String falseLab) {
          String newLab = nextLabel();
          
          myExp1.genJumpCode(newLab, falseLab);
          genLabel(newLab);
          myExp2;genJumpCode(trueLab, falseLab);
      }
      
    Example: Consider the code that would be generated for the statement: if (a && b>0) { ... }, represented by the AST:
                          ------------
                         | IfStmtNode |
                          ------------
                       /       |        \
             ---------   --------------   -------------
            | AndNode | | DeclListNode | | StmtListNode|
             ---------   --------------   -------------
             /        \
        --------       ----------
       | IdNode |     | LessNode |
        --------       ----------    
                      /          \
                    ...         ...
      
    The IfStmtNode's codeGen method would create two labels, TrueLab and DoneLab, and would call the AndNode's genJumpCode method, passing those labels as the arguments. The AndNode's genJumpCode method would create one new label (NewLab) and then would call the genJumpCode method of its left child (the IdNode), passing NewLab and DoneLab as the arguments. It would then generate the NewLab label, and then would call its right child's genJumpCode method, passing TrueLab and DoneLab. The code generated for the whole condition would look like this:
               Generated Code                                  Generated By
      	 --------------					 ------------
      
                  -- code to load the value of a into T0          IdNode
                  jump to DoneLab if T0 == FALSE                  IdNode
                  jump to NewLab                                  IdNode
      NewLab:                                                     AndNode
                  -- code to push the value of b                  LessNode's child
                  -- code to push the iteral 0                    LessNode's child
                  pop into T1                                     LessNode
                  pop into T0                                     LessNode
                  jump to TrueLab if T0 < T1                      LessNode
                  jump to DoneLab                                 LessNode
      
    (Of course, the actual label would be something like L3, not NewLab.) After calling the AndNode's genJumpCode method, the IfStmtNode's codeGen method would call genLabel to print TrueLab, then would call its StmtList child's codeGen method to generate code for the list of statements. Finally, it would call genLabel to print DoneLab. So the code generated for this if statement would be like this:
                  -- code to load the value of a into T0
                  jump to DoneLab if T0 == FALSE
                  jump to NewLab
      NewLab:
                  -- code to push the value of b
                  -- code to push the iteral 0
                  pop into T1
                  pop into T0
                  jump to TrueLab if T0 < T1
                  jump to DoneLab
      TrueLab:
      	    -- code for the list of statements
      DoneLab:
      

    TEST YOURSELF #3

    Question 1: What is the form of the code generated by an OrNode's genJumpCode method?

    Question 2: What is the form of the code generated by a NotNode's genJumpCode method?


    How does the code generated for an AndNode using the control-flow method compare to the code generated using the numeric method? Here are outlines of the code generated in each case:

         Numeric Code                                  Control-Flow Code
         ------------                                  -----------------
      
         -- code to evaluate left                         -- code to evaluate left
         -- operand, leaving the                          -- operand, including jumps
         -- value on the stack                            -- to NewLab and FalseLab
         pop into T0                                  
         goto TrueLab if T0 == TRUE                 newLab:
         push FALSE
         goto DoneLab
      TrueLab:
         -- code to evaluate right                        -- code to evaluate right
         -- operand, leaving the                          -- operand, including
         -- value on the stack                            -- jumps to TrueLab and
                                                          -- FalseLab
      DoneLab:
      
    Note that the numeric code includes 6 instructions (shown in bold) in addition to the ones generated by the codeGen methods of the two children, while in the control-flow case, no instructions are generated by the AndNode itself (just a label); the instructions are all generated by the genJumpCode methods of its two children. Those children could represent names, boolean literals, comparisons or logical expressions. We have already seen that in the first three cases, better code is generated using the control-flow approach. Now we see that for logical expressions (at least for AndNodes) fewer instructions are generated using the control-flow approach than using the numeric approach.

    TEST YOURSELF #4

    Compare the code generated by the two approaches for an OrNode and for a NotNode.


    Finally, let's compare the code generated by the numeric and control-flow approaches for an if-then statement. Here are outlines of the two different versions of the code that would be generated:

         Numeric Code                                  Control-Flow Code
         ------------                                  -----------------
      
         -- code for condition, leaving                -- code for condition,
         -- value on the stack                         -- including jumps to
                                                       -- trueLab and falseLab
         pop into T0
         goto falseLab if T0 == FALSE               trueLab: 
         -- code for "then" stmts                      -- code for "then" stmts
         goto doneLab                                  goto doneLab
      falseLab:                                     falseLab:          
         -- code for "else" stmts                      -- code for "else" stmts
      doneLab:                                      doneLab:
      
    Note that much of the code is the same for the two methods; the code generated to evaluate the condition will be different, and the numeric method has three extra instructions: a pop, followed by a conditional goto (those instructions are shown in bold font). So as long as the code generated for the condition is no worse for the control-flow method than for the numeric method, the control-flow method is better (both in terms of the number of instructions generated, and in terms of the number of instructions that will be executed).

    But we have already looked at the code generated for all the different kinds of conditions, for each of the two approaches, and in fact the control-flow method was never worse, and was sometimes better. Thus, the control-flow method is the winner, in terms of generating less and more efficient code!