Wright State University
Department of Computer Science and Engineering
CS 480/680 Comparative Languages

     Summer 2012                                Assignment  2                                   Prasad



Compiling and Evaluating Simple Expressions  (15 pts)   (Due:  August 8)

          
        An EBNF (Extended Backus-Naur Form) grammar for arithmetic expressions containing variables (a, i) and binary operators (+, *) is given below.

    <expr>   ->  <term>  { + <term> }

    <term>   ->  <factor>  { * <factor> }

    <factor> ->  <var>  |  ( <expr> )

    <var>    ->  a | i
 

{<expr>, <term>, <factor>, <var>} are the non-terminals. {a,i,+,*,(,)} are the terminals. <expr> is the start symbol. The meta-symbol "->" separates the lhs and the rhs of a production rule, the meta-symbol "|" represents alternatives on the rhs, and the paired curly braces "{...}" stand for the Kleene-star operator (that is, 0 or more iterations of the enclosed regular expression).

        Some example arithmetic expressions derivable in the grammar are "i", "( a + a ) * i", "(i * a) + (a)", etc. Some illegal expressions are "(b)", "6", etc. (Note that the double quotes are not part of the expression.)
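The grammar above maps directly onto a recursive-descent recognizer, with one method per non-terminal and a loop for each "{...}". The following is only a minimal sketch of that idea, not the provided ExprcEg.java: the class name ExprRecognizer and the method isLegal are hypothetical, and the sketch assumes that whitespace between tokens is insignificant and can simply be dropped.

```java
// Hypothetical helper for illustration; not part of ExprcEg.java.
class ExprRecognizer {
    private final String src;  // input with all whitespace removed (assumption)
    private int pos = 0;       // index of the next unread character

    private ExprRecognizer(String s) {
        this.src = s.replaceAll("\\s+", "");
    }

    // True if the whole input is derivable from <expr>.
    static boolean isLegal(String s) {
        ExprRecognizer r = new ExprRecognizer(s);
        return r.expr() && r.pos == r.src.length();
    }

    private boolean peek(char c) {
        return pos < src.length() && src.charAt(pos) == c;
    }

    // <expr> -> <term> { + <term> }
    private boolean expr() {
        if (!term()) return false;
        while (peek('+')) {
            pos++;
            if (!term()) return false;
        }
        return true;
    }

    // <term> -> <factor> { * <factor> }
    private boolean term() {
        if (!factor()) return false;
        while (peek('*')) {
            pos++;
            if (!factor()) return false;
        }
        return true;
    }

    // <factor> -> <var> | ( <expr> )
    private boolean factor() {
        if (peek('a') || peek('i')) {   // <var> -> a | i
            pos++;
            return true;
        }
        if (peek('(')) {
            pos++;
            if (!expr() || !peek(')')) return false;
            pos++;
            return true;
        }
        return false;
    }
}
```

With this sketch, ExprRecognizer.isLegal("( a + a ) * i") accepts the expression, while ExprRecognizer.isLegal("(b)") and ExprRecognizer.isLegal("6") reject theirs.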


        Now consider the following template for a collection of Java programs. 

class Test {
    static double f(int i, double a) {

       return  <expr>;
    }
    public static void main(String[] args) {

       System.out.println( f(2,1.0));
    }
}

      To obtain a valid Java program (that is, a valid function body), replace <expr> with an expression derived from the above grammar.
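As an illustration, here is the template instantiated with the expression "(i + a * i)". Only the return expression differs from the template; the choice of this particular expression is just an example.

```java
class Test {
    static double f(int i, double a) {
        // <expr> replaced by an expression derivable from the grammar
        return (i + a * i);
    }
    public static void main(String[] args) {
        // With i = 2 and a = 1.0 this evaluates 2 + 1.0 * 2 and prints 4.0
        System.out.println(f(2, 1.0));
    }
}
```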


      A Java compiler  takes the source code and generates Java bytecodes, which resemble assembly language instructions for a stack machine. The translation of  the return-expression  "(i + a * i)" into bytecodes can be given symbolically as:

        iload_0
        i2d
        dload_1
        iload_0
        i2d
        dmul
        dadd

        The formal arguments i and a are encoded as variables in registers 0 and 1. (The double value requires two registers; that is, registers 1 and 2 are needed for storing a.) A typical instruction encodes information about the type and the location of the operands, and the nature of the operation. For example, iload_0 (dload_1) stands for pushing the value of the integer (double) variable i (a) on top of the stack; istore_0 (dstore_1) stands for moving the value from the top of the stack to the location of the integer (double) variable i (a); dadd (imul) stands for popping the top two double (integer) values from the stack, adding (multiplying) them, and pushing the result on top of the stack; and i2d (d2i) stands for coercing an integer (a double) value to a double (an integer) value. (For other details about the JVM, refer to The Java Virtual Machine Specification, specifically the chapter on the Instruction Set.)

On the right is an applet that performs the required translation of the expression into Java bytecodes, for your reference. (If you have difficulty viewing the Java applet in Internet Explorer, try Firefox.) Ideally, the translation of "i" will be "iload_0" followed by "i2d", but for this assignment the final coercion code is optional. To generate more examples of such translations, instantiate the above template by replacing <expr> with a legal arithmetic expression in "Test.java", compile it using "%javac Test.java", and reverse engineer the class file using "%javap -c Test", focusing on "Method static double f(int, double)".
 


PART I: Write a Java program Exprc.java that (1) reads in a sequence of expressions, one expression per line, from a file called expr.dat, (2) determines, for each legal expression, its equivalent bytecodes, and (3) outputs this compiled form, for each expression on line <lineno> of the file expr.dat, in the output file called <lineno>.jbc, with one bytecode instruction per line. Note that you are expected to detect errors in the expressions, if any.

Asg2.ppt illustrates the basics of code generation, and ExprcEg.java gives an incomplete program that you need to understand and then modify to get a working solution. It already provides code illustrating file I/O, scanning, and abstract syntax tree construction. (Specifically, it uses java.io.StreamTokenizer for scanning. Feel free to change it, if necessary.)
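To make the idea of code generation concrete, the following is a rough sketch that emits stack-machine instructions while parsing, so that each operator is emitted after both of its operands. It is not the approach of ExprcEg.java (which builds an abstract syntax tree), and the names ExprCodeGen and compile are hypothetical. It makes two simplifying assumptions that your solution need not share: every occurrence of i is coerced to double immediately, so integer instructions such as imul are never emitted (javac would use integer arithmetic for all-integer subexpressions), and repeated operators are grouped to the left, which you should check against the applet's output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch for illustration; names are not from ExprcEg.java.
class ExprCodeGen {
    private final String src;  // input with whitespace removed (assumption)
    private int pos = 0;       // index of the next unread character
    private final List<String> code = new ArrayList<>();

    private ExprCodeGen(String s) {
        this.src = s.replaceAll("\\s+", "");
    }

    // Returns one instruction per list element; throws on illegal input.
    static List<String> compile(String s) {
        ExprCodeGen g = new ExprCodeGen(s);
        g.expr();
        if (g.pos != g.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + g.pos);
        }
        return g.code;
    }

    private boolean peek(char c) {
        return pos < src.length() && src.charAt(pos) == c;
    }

    // <expr> -> <term> { + <term> }: emit dadd after each right operand
    private void expr() {
        term();
        while (peek('+')) {
            pos++;
            term();
            code.add("dadd");
        }
    }

    // <term> -> <factor> { * <factor> }: emit dmul after each right operand
    private void term() {
        factor();
        while (peek('*')) {
            pos++;
            factor();
            code.add("dmul");
        }
    }

    // <factor> -> <var> | ( <expr> )
    private void factor() {
        if (peek('i')) {
            pos++;
            code.add("iload_0");
            code.add("i2d");      // simplification: coerce i right away
        } else if (peek('a')) {
            pos++;
            code.add("dload_1");
        } else if (peek('(')) {
            pos++;
            expr();
            if (!peek(')')) {
                throw new IllegalArgumentException("missing ) at position " + pos);
            }
            pos++;
        } else {
            throw new IllegalArgumentException("unexpected input at position " + pos);
        }
    }
}
```

For the expression "(i + a * i)", this sketch produces the same seven instructions as the symbolic translation given earlier.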

Determine the associativity of "+" and "*" from the code generated by the "compiler" applet, and document it in your code.

Although the file name expr.dat has been fixed for uniformity, for generality make the file name an optional command-line argument that defaults to expr.dat when no input file name is given on the command line.
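One simple way to handle the optional argument is sketched below. The class name InputFileArg and the method inputFileName are hypothetical; in your solution the same check would live in Exprc.java.

```java
// Hypothetical helper; the class and method names are illustrative only.
class InputFileArg {
    static final String DEFAULT_INPUT = "expr.dat";

    // Returns the first command-line argument if present, else the default.
    static String inputFileName(String[] args) {
        return args.length > 0 ? args[0] : DEFAULT_INPUT;
    }

    public static void main(String[] args) {
        System.out.println("Reading expressions from " + inputFileName(args));
    }
}
```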


PART II: Write a Java program Exprv.java that simulates a stack machine in order to evaluate a bytecode file output in Part I. In particular, the program should take the initial values for the variables (that is, i and a) as command-line arguments and output the value returned by the corresponding function call. For example, if the file "1.jbc" contains the following bytecodes

        iload_0
        i2d
        dload_1
        iload_0
        i2d
        dmul
        dadd

then the result of executing the command   

   %java Exprv 1.jbc  2  1.0

should be 4.0.
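The evaluation of the example above can be sketched as a loop over the instructions that pushes and pops values on a stack. This is only a rough illustration under stated assumptions, not a complete Exprv.java: the names StackMachineSketch and run are hypothetical, the instructions are passed in as a list rather than read from a .jbc file, every stack value is held as a double (so i2d has nothing left to do), and only the five instructions appearing in the example are handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch for illustration; not a complete Exprv.java.
class StackMachineSketch {
    static double run(List<String> code, int i, double a) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String line : code) {
            String op = line.trim();
            if (op.isEmpty()) continue;          // skip blank lines
            switch (op) {
                case "iload_0":
                    stack.push((double) i);      // simplification: store i as a double
                    break;
                case "i2d":
                    break;                       // value is already a double in this sketch
                case "dload_1":
                    stack.push(a);
                    break;
                case "dadd": {
                    double right = stack.pop();
                    double left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                case "dmul": {
                    double right = stack.pop();
                    double left = stack.pop();
                    stack.push(left * right);
                    break;
                }
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + op);
            }
        }
        if (stack.size() != 1) {
            throw new IllegalStateException("expected exactly one value on the stack");
        }
        return stack.pop();
    }
}
```

Running this sketch on the seven instructions of "1.jbc" with i = 2 and a = 1.0 leaves 4.0 on the stack, in agreement with the expected result above.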


CS680 Students Only: Repeat Part I and Part II in one of C++, C#, Scala, Scheme, or Python, naming the submitted files Exprc.* and Exprv.* respectively.


What to hand in?    
     
        Submit your solution files Exprc.java and Exprv.java by running the following turn-in command on unixapps1.wright.edu:

/common/public/tkprasad/cs480/turnin-pa2  Exprc.java Exprv.java README.txt

CS680 students can modify the command as follows:

/common/public/tkprasad/cs480/turnin-pa2  Exprc.java Exprv.java Exprc.* Exprv.* README.txt

        Prior to submission, make sure that your code compiles and runs using the following commands:


javac  Exprc.java 

java   Exprc   expr.dat   

javac  Exprv.java

java   Exprv   1.jbc   2   1.0