Department of Computer Science and Engineering
CS 784 Spring 2012        Prasad
Assignment 3 (Due: May 30) (15 pts)

Compiling and Evaluating Simple Expressions

        The primary goal of this assignment is to gain familiarity with language processing concepts and techniques using the infrastructure developed for the EOPL text.
        An EBNF (Extended Backus-Naur Form) grammar for arithmetic expressions containing variables (i, j, k) and binary operators (+, *) is given below.

    <expr>   ->  <term>  { + <term> }

    <term>   ->  <factor>  { * <factor> }

    <factor> ->  <var>  |  ( <expr> )

    <var>    ->  i | j | k

{<expr>, <term>, <factor>, <var>} are the non-terminals. {i,j,k,+,*,(,)} are the terminals. <expr> is the start symbol. The meta-symbol "->" separates the lhs and the rhs of a production rule, the meta-symbol "|" represents alternatives on the rhs, and the paired curly braces "{...}" stand for the Kleene-star operator (that is, 0 or more iterations of the enclosed regular expression).

        Some example arithmetic expressions derivable in the grammar are "i", "( k + k ) * i", "(i * k) + (j + i)", etc. (Note that the double quotes are not part of the expression and whitespace characters are insignificant.)
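
        The way the EBNF rules map onto code can be sketched with a small recursive-descent recognizer in Java. This is only an illustration, not part of the assignment: the class name ExprRecognizer and the method accepts are hypothetical, and the hand-written parser stands in for the SLLGEN-generated one used later. Each non-terminal becomes a method, and each "{...}" becomes a while loop.

```java
// Hypothetical recursive-descent recognizer for the grammar above.
// Every token is a single character, so whitespace is simply stripped.
final class ExprRecognizer {
    private final String src; // input with whitespace removed
    private int pos = 0;      // index of the next unread character

    private ExprRecognizer(String text) {
        this.src = text.replaceAll("\\s+", "");
    }

    /** Returns true if the whole string is derivable from <expr>. */
    static boolean accepts(String text) {
        ExprRecognizer r = new ExprRecognizer(text);
        return r.expr() && r.pos == r.src.length();
    }

    private boolean peekIs(char c) {
        return pos < src.length() && src.charAt(pos) == c;
    }

    // <expr> -> <term> { + <term> }
    private boolean expr() {
        if (!term()) return false;
        while (peekIs('+')) {      // the {...} part: zero or more iterations
            pos++;
            if (!term()) return false;
        }
        return true;
    }

    // <term> -> <factor> { * <factor> }
    private boolean term() {
        if (!factor()) return false;
        while (peekIs('*')) {
            pos++;
            if (!factor()) return false;
        }
        return true;
    }

    // <factor> -> <var> | ( <expr> )
    private boolean factor() {
        if (peekIs('i') || peekIs('j') || peekIs('k')) { // <var> -> i | j | k
            pos++;
            return true;
        }
        if (peekIs('(')) {
            pos++;
            if (!expr() || !peekIs(')')) return false;
            pos++;
            return true;
        }
        return false;
    }
}
```

        Under these assumptions, ExprRecognizer.accepts("( k + k ) * i") returns true, while an incomplete string such as "i +" is rejected.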

        Now consider the following template for a collection of Java programs. 

class Test {
    static int f(int i, int j, int k) {
        return <expr>;
    }

    public static void main(String[] args) {
    }
}


      To obtain a valid Java program (that is, a valid function body), replace <expr> with an expression derived from the above grammar.

      A Java compiler takes the source code and generates Java bytecodes, which resemble assembly language instructions for an abstract stack machine. The translation of the return-expression "((i + k) * j)" into bytecodes can be given symbolically as:

        iload_0
        iload_2
        iadd
        iload_1
        imul

        The formal arguments i, j, and k are encoded as variables in registers 0, 1, and 2. A typical instruction encodes information about the type and the location of operands, and the nature of the operation. For example, iload_2 stands for pushing the value of the integer variable k on top of the stack; istore_2 stands for moving the value from the top of the stack to the location of the integer variable k; iadd (resp. imul) stands for popping the top two integer values from the stack, adding (resp. multiplying) them, and pushing the result on top of the stack. (For other details about the JVM per se, refer to The Java Virtual Machine Specification, specifically the chapter on the Instruction Set.)
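
        The stack discipline described above can be sketched as a tiny simulator in Java. This is an illustrative model only, not the JVM itself: the class name StackMachineSketch and the method execute are hypothetical, and only iload_0, iload_1, iload_2, iadd, and imul are handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical simulator for the handful of instructions discussed above.
final class StackMachineSketch {
    /** Runs the instructions in order and returns the value left on top of the stack. */
    static int execute(List<String> code, int i, int j, int k) {
        int[] locals = {i, j, k};                  // registers 0, 1, 2 hold i, j, k
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack
        for (String op : code) {
            switch (op) {
                case "iload_0":
                case "iload_1":
                case "iload_2":
                    // The trailing digit names the register to push.
                    stack.push(locals[op.charAt(op.length() - 1) - '0']);
                    break;
                case "iadd":
                    // Pop two values, push their sum.
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "imul":
                    // Pop two values, push their product.
                    stack.push(stack.pop() * stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + op);
            }
        }
        return stack.pop();
    }
}
```

        For instance, with i = 2, j = 4, and k = 8, executing the sequence iload_0, iload_2, iadd, iload_1, imul (the code for "((i + k) * j)") pushes 2 and 8, replaces them with 10, pushes 4, and leaves 40 on the stack.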

On the right is an applet that performs the required translation of the expression into Java bytecodes for your reference. To generate more examples of such translations, instantiate the above template by replacing <expr> with a legal arithmetic expression in "Test.java", compile it using "% javac Test.java", and disassemble the class file using "% javap -c Test", focusing on "Method int f(int, int, int)".

Note also that the solution written in Java exemplifies the object-oriented style of programming, while the Scheme solution to be developed below exemplifies the functional style of programming.

PART I: Write a function run in Scheme that takes an expression string derivable from the above grammar and outputs the value of the expression, assuming that i, j, and k are 2, 4, and 8 respectively. Use the SLLGEN scanner and parser generator to convert the expression string into an abstract syntax tree prior to evaluation. (Observe that the value reflects the precedence of "*" over "+".)

>(run "((i * i) + j * (k) + j)") 

40

PART II: Now write a function compile in Scheme that takes an expression string derivable from the above grammar and outputs a list representation of the bytecode fragment corresponding to the expression as explained above. Reuse the code for creating the abstract syntax tree from above for this part. (Observe that the generated code reflects the left associativity of "+" and "*".)

>(compile "((i * i) + j * (k) + j)") 

(iload_0 iload_0 imul iload_1 iload_2 imul iadd iload_1 iadd)
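
The relationship between the abstract syntax tree, direct evaluation, and code generation can be sketched in Java as follows. This is a hedged analogue, not the Scheme solution the assignment asks for: the class CodegenSketch and its nested types are hypothetical, and a hand-written recursive-descent parser takes the place of SLLGEN. Evaluation and code generation are both post-order walks over the same tree, and the while loops build left-nested trees, which is where left associativity comes from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java analogue of run and compile, sharing one AST.
final class CodegenSketch {
    /** AST node: either a variable or a binary operation. */
    interface Node {
        int eval(int[] env);         // direct evaluation (as in Part I)
        void emit(List<String> out); // post-order code generation (as in Part II)
    }

    static final class Var implements Node {
        final int slot; // 0 for i, 1 for j, 2 for k
        Var(int slot) { this.slot = slot; }
        public int eval(int[] env) { return env[slot]; }
        public void emit(List<String> out) { out.add("iload_" + slot); }
    }

    static final class BinOp implements Node {
        final char op;
        final Node left, right;
        BinOp(char op, Node left, Node right) {
            this.op = op; this.left = left; this.right = right;
        }
        public int eval(int[] env) {
            int a = left.eval(env), b = right.eval(env);
            return op == '+' ? a + b : a * b;
        }
        public void emit(List<String> out) {
            left.emit(out);                       // operands first ...
            right.emit(out);
            out.add(op == '+' ? "iadd" : "imul"); // ... then the operator
        }
    }

    private final String src; // input with whitespace removed
    private int pos = 0;

    private CodegenSketch(String text) { src = text.replaceAll("\\s+", ""); }

    private boolean eat(char c) {
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private Node expr() { // <expr> -> <term> { + <term> }
        Node n = term();
        while (eat('+')) n = new BinOp('+', n, term()); // left-nested tree
        return n;
    }

    private Node term() { // <term> -> <factor> { * <factor> }
        Node n = factor();
        while (eat('*')) n = new BinOp('*', n, factor());
        return n;
    }

    private Node factor() { // <factor> -> <var> | ( <expr> )
        if (eat('(')) {
            Node n = expr();
            if (!eat(')')) throw new IllegalArgumentException("expected ) at " + pos);
            return n;
        }
        if (pos < src.length()) {
            int slot = "ijk".indexOf(src.charAt(pos));
            if (slot >= 0) { pos++; return new Var(slot); }
        }
        throw new IllegalArgumentException("unexpected input at " + pos);
    }

    static Node parse(String text) {
        CodegenSketch p = new CodegenSketch(text);
        Node n = p.expr();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return n;
    }

    /** Evaluates with i, j, k bound to 2, 4, 8, as the assignment assumes. */
    static int run(String text) { return parse(text).eval(new int[] {2, 4, 8}); }

    /** Returns the instruction list for the expression. */
    static List<String> compile(String text) {
        List<String> out = new ArrayList<>();
        parse(text).emit(out);
        return out;
    }
}
```

In this sketch, CodegenSketch.compile("((i * i) + j * (k) + j)") yields the nine instructions listed above, and CodegenSketch.run on the same string gives 2*2 + 4*8 + 4 = 40.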

PART III: Now write a function interpret in Scheme that takes the output of compile and returns the same result as run when given the same expression string. That is, it should satisfy the identity:

                    (run  "EXPR") = (interpret (compile "EXPR"))

(Observe that the simulation of the runtime stack object requires Scheme's set! and let constructs.)

>(interpret (compile "((i * i) + j * (k) + j)")) 

40

What to hand in?    
        Submit your well-documented solution in a file named codegen.scm by running the following turn-in command on unixapps1.wright.edu