Summer 2012 Assignment 2 Prasad
Compiling and Evaluating Simple Expressions
(15 pts) (Due: August 8)
An EBNF (Extended Backus-Naur Form) grammar for arithmetic expressions containing variables (a, i) and binary operators (+, *) is given below.
<expr> -> <term> { + <term> }
<term> -> <factor> { * <factor> }
<factor> -> <var> | ( <expr> )
<var> -> a | i
{<expr>, <term>, <factor>, <var>} are the non-terminals, {a, i, +, *, (, )} are the terminals, and <expr> is the start symbol. The meta-symbol "->" separates the lhs and the rhs of a production rule, the meta-symbol "|" separates alternatives on the rhs, and the paired curly braces "{...}" stand for the Kleene star operator (that is, zero or more repetitions of the enclosed regular expression).
Some example arithmetic expressions derivable from the grammar are "i", "( a + a ) * i", " (i * a) + (a)", etc. Some illegal expressions are "(b)", "6", etc. (Note that the double quotes are not part of the expression.)
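A recursive-descent recognizer follows the grammar directly: one method per non-terminal, with each {...} becoming a while loop. The sketch below is only an illustration and is not taken from ExprcEg.java; the class name ExprRecognizer and the method isLegal are hypothetical. It scans characters directly instead of using a tokenizer and treats blanks as separators.

```java
// Hypothetical recognizer for the expression grammar (a sketch, not the provided starter code).
final class ExprRecognizer {
    private final String src;
    private int pos = 0;

    private ExprRecognizer(String src) { this.src = src; }

    /** Returns true iff the whole line derives from <expr>. */
    static boolean isLegal(String line) {
        ExprRecognizer r = new ExprRecognizer(line);
        try {
            r.expr();
            r.skipBlanks();
            return r.pos == r.src.length(); // trailing junk such as the "i" in "ai" is an error
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    private void skipBlanks() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    /** Next non-blank character without consuming it, or '\0' at end of input. */
    private char peek() {
        skipBlanks();
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        pos++;
    }

    // <expr> -> <term> { + <term> }
    private void expr() {
        term();
        while (peek() == '+') { pos++; term(); }
    }

    // <term> -> <factor> { * <factor> }
    private void term() {
        factor();
        while (peek() == '*') { pos++; factor(); }
    }

    // <factor> -> <var> | ( <expr> ),  <var> -> a | i
    private void factor() {
        char c = peek();
        if (c == 'a' || c == 'i') {
            pos++;
        } else if (c == '(') {
            pos++;
            expr();
            expect(')');
        } else {
            throw new IllegalArgumentException("unexpected '" + c + "' at " + pos);
        }
    }
}
```

For example, isLegal("( a + a ) * i") returns true, while isLegal("(b)") and isLegal("6") return false.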
Now consider the following template for a collection of Java programs.
class Test {
    static double f(int i, double a) {
        return <expr>;
    }
    public static void main(String[] args) {
        System.out.println(f(2, 1.0));
    }
}
To obtain a valid Java program (that is, a valid function body), replace <expr> with an expression derived from the above grammar.
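For instance, substituting the return-expression (i + a * i) that is used in the bytecode example gives the following complete program. Since f(2, 1.0) computes 2 + 1.0 * 2, it should print 4.0.

```java
// Test.java: the template with <expr> replaced by "(i + a * i)".
class Test {
    static double f(int i, double a) {
        // i is promoted to double in both the multiplication and the addition.
        return (i + a * i);
    }

    public static void main(String[] args) {
        System.out.println(f(2, 1.0)); // prints 4.0
    }
}
```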
A Java compiler takes the source code and generates Java bytecodes, which resemble assembly language instructions for a stack machine. The translation of the return-expression "(i + a * i)" into bytecodes can be given symbolically as:
iload_0
i2d
dload_1
iload_0
i2d
dmul
dadd
The formal arguments i and a are encoded as variables in registers 0 and 1. (A double value requires two registers, so registers 1 and 2 are both needed for storing a.) A typical instruction encodes information about the type and the location of the operands, and the nature of the operation. For example, iload_0 (dload_1) stands for pushing the value of the integer (double) variable i (a) onto the top of the stack; istore_0 (dstore_1) stands for popping the value from the top of the stack and storing it in the location of the integer (double) variable i (a); dadd (imul) stands for popping the top two double (integer) values from the stack, adding (multiplying) them, and pushing the result onto the top of the stack; and i2d (d2i) stands for coercing an integer (a double) value to a double (an integer) value. (For other details about the JVM, refer to The Java Virtual Machine Specification, specifically the chapter on the Instruction Set.)
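Because the instructions are typed, a code generator has to know whether each subexpression is int or double. The sketch below assumes a hypothetical abstract syntax tree (the Var and Bin records and the class CodeGen are illustrative names, not the ones in ExprcEg.java) and follows Java's binary numeric promotion: two int operands use iadd/imul, while in mixed arithmetic each int operand is followed by i2d and dadd/dmul is used. It needs Java 16 or later (records and pattern matching), and it always emits the final i2d that this assignment treats as optional.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AST: a variable, or a binary operation with op '+' or '*'.
interface Node {}
record Var(char name) implements Node {}
record Bin(char op, Node left, Node right) implements Node {}

final class CodeGen {
    /** Static type of a subexpression: true if double, false if int. */
    static boolean isDouble(Node n) {
        if (n instanceof Var v) return v.name() == 'a';   // a is double, i is int
        Bin b = (Bin) n;
        return isDouble(b.left()) || isDouble(b.right());  // binary numeric promotion
    }

    /** Appends code for n to out; if toDouble and n is int-valued, appends i2d after it. */
    static void emit(Node n, boolean toDouble, List<String> out) {
        if (n instanceof Var v) {
            out.add(v.name() == 'i' ? "iload_0" : "dload_1");
        } else {
            Bin b = (Bin) n;
            boolean d = isDouble(b);
            emit(b.left(), d, out);   // left operand, coerced right away if the operation is double
            emit(b.right(), d, out);  // right operand, likewise
            out.add((d ? "d" : "i") + (b.op() == '+' ? "add" : "mul"));
        }
        if (toDouble && !isDouble(n)) out.add("i2d");
    }

    /** f returns double, so an int-valued body gets a final i2d. */
    static List<String> compile(Node root) {
        List<String> out = new ArrayList<>();
        emit(root, true, out);
        return out;
    }
}
```

For "(i + a * i)" this produces iload_0, i2d, dload_1, iload_0, i2d, dmul, dadd, matching the listing above; for "i * i" it produces iload_0, iload_0, imul, i2d.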
On the right is an applet that performs the required translation of the expression into Java bytecodes, for your reference. (If you have difficulty viewing the Java applet in Internet Explorer, try Firefox.) Ideally, the translation of "i" will be "iload_0" followed by "i2d", but for this assignment the final coercion code is optional. To generate more examples of such translations, instantiate the above template by replacing <expr> with a legal arithmetic expression in "Test.java", compile it using "%javac Test.java", and disassemble the class file using "%javap -c Test", focusing on "Method static double f(int, double)".
PART I: Write a Java program Exprc.java that (1) reads in a sequence of expressions, one expression per line, from a file called expr.dat, (2) determines the equivalent bytecodes for each legal expression, and (3) writes this compiled form, for the expression on line <lineno> of expr.dat, to an output file called <lineno>.jbc, one bytecode instruction per line. Note that you are expected to detect errors in the expressions, if any.
Asg2.ppt illustrates the basics of code generation, and ExprcEg.java gives an incomplete program that you need to understand and then modify to get a working solution. It already provides code illustrating file I/O, scanning, and abstract syntax tree construction. (Specifically, it uses java.io.StreamTokenizer for scanning. Feel free to change it, if necessary.)
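With its default syntax table, java.io.StreamTokenizer treats letters as word characters and "+", "*", "(" and ")" as ordinary characters, so it already splits an expression line into the tokens a parser needs. A minimal sketch; the class Scan and the method tokens are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class Scan {
    /** Splits one expression line into token strings using StreamTokenizer's default syntax table. */
    static List<String> tokens(String line) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD -> out.add(st.sval);                  // runs of letters: a, i, b, ai
                    case StreamTokenizer.TT_NUMBER -> out.add(String.valueOf(st.nval)); // numbers: 6 becomes 6.0
                    default -> out.add(String.valueOf((char) st.ttype));              // ordinary chars: + * ( )
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a StringReader
        }
        return out;
    }
}
```

For "( a + a ) * i" this yields (, a, +, a, ), *, i. Because parseNumbers is on by default, "6" comes back as the number token 6.0, and a run of letters such as "ai" comes back as one word, so the parser must still check that each word is exactly a or i.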
Determine the associativity of "+" and "*" from the code
generated by the "compiler" applet, and document it in your code.
Even though the file name expr.dat has been fixed for uniformity, for generality make the input file name an optional command-line argument that defaults to expr.dat when no file name is given on the command line.
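The file handling of Part I, including the optional input-file argument, might be organized as below. This is a sketch under stated assumptions: ExprcDriver and run are hypothetical names, the compiler parameter stands in for your parser and code generator and is assumed to throw IllegalArgumentException on an illegal expression, and the outDir parameter exists only to make the sketch testable (Exprc itself would write into the current directory). The assignment does not say how illegal lines should be reported; printing a message and skipping the .jbc file is one possible choice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Function;

final class ExprcDriver {
    /** Compiles each line of the input file and writes line n's bytecodes to "n.jbc" in outDir. */
    static void run(String[] args, Path outDir, Function<String, List<String>> compiler) {
        String inputName = args.length > 0 ? args[0] : "expr.dat"; // optional argument, default expr.dat
        try {
            List<String> lines = Files.readAllLines(Path.of(inputName));
            for (int lineno = 1; lineno <= lines.size(); lineno++) {
                String line = lines.get(lineno - 1);
                try {
                    List<String> code = compiler.apply(line);
                    // Files.write puts each list element on its own line: one instruction per line.
                    Files.write(outDir.resolve(lineno + ".jbc"), code);
                } catch (IllegalArgumentException e) {
                    // Assumed policy: report the illegal expression and produce no .jbc file for it.
                    System.err.println("line " + lineno + ": illegal expression: " + line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```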
PART II: Write a Java program Exprv.java that simulates a stack machine in order to evaluate a bytecode file output by Part I. In particular, the program should take the initial values of the variables (that is, i and a) as command-line arguments and output the value returned by the corresponding function call. For example, if the file "1.jbc" contains the following bytecodes
iload_0
i2d
dload_1
iload_0
i2d
dmul
dadd
then the result of executing the command
%java Exprv 1.jbc 2 1.0
should be 4.0.
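A minimal sketch of such an evaluator, assuming a .jbc file contains only the instructions described in this handout (plus an optional trailing dreturn, as in javap listings). The operand stack holds Integer and Double objects so that each typed instruction can check its operands. The class name StackMachine is hypothetical (the submitted class would be Exprv), and the sketch needs Java 16 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class StackMachine {
    private static int popInt(Deque<Object> s) { return (Integer) s.pop(); }
    private static double popDouble(Deque<Object> s) { return (Double) s.pop(); }

    /** Executes the bytecodes with register 0 = i (int) and registers 1-2 = a (double). */
    static double run(List<String> code, int i, double a) {
        Deque<Object> stack = new ArrayDeque<>(); // holds Integer and Double values
        for (String raw : code) {
            String insn = raw.trim();
            if (insn.isEmpty()) continue; // tolerate blank lines
            switch (insn) {
                case "iload_0" -> stack.push(i);
                case "dload_1" -> stack.push(a);
                case "i2d" -> stack.push((double) popInt(stack));
                case "d2i" -> stack.push((int) popDouble(stack));
                case "iadd" -> { int y = popInt(stack), x = popInt(stack); stack.push(x + y); }
                case "imul" -> { int y = popInt(stack), x = popInt(stack); stack.push(x * y); }
                case "dadd" -> { double y = popDouble(stack), x = popDouble(stack); stack.push(x + y); }
                case "dmul" -> { double y = popDouble(stack), x = popDouble(stack); stack.push(x * y); }
                case "dreturn" -> { } // javap listings end here; the result is the top of the stack
                default -> throw new IllegalArgumentException("unknown instruction: " + insn);
            }
        }
        if (stack.size() != 1) throw new IllegalStateException("expected one result, found " + stack.size());
        Object top = stack.pop();
        // The final i2d is optional in this assignment, so an int result is widened here.
        return (top instanceof Integer n) ? (double) n : (Double) top;
    }

    // Usage: java StackMachine 1.jbc 2 1.0
    public static void main(String[] args) throws IOException {
        List<String> code = Files.readAllLines(Path.of(args[0]));
        System.out.println(run(code, Integer.parseInt(args[1]), Double.parseDouble(args[2])));
    }
}
```

Running it on the 1.jbc listing above with i = 2 and a = 1.0 leaves 4.0 on the stack. If the optional final i2d was omitted, an int result is widened when it is returned.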
CS680 Students Only: Repeat Part I and Part II in C++, C#, Scala, Scheme, or Python, naming the turned-in files appropriately: Exprc.* and Exprv.*, respectively.
What to hand in?
Submit your solution files Exprc.java and Exprv.java, along with README.txt, by running the following turn-in command on unixapps1.wright.edu.
/common/public/tkprasad/cs480/turnin-pa2 Exprc.java Exprv.java README.txt
CS680 students can modify the command as follows:
/common/public/tkprasad/cs480/turnin-pa2 Exprc.java Exprv.java Exprc.* Exprv.* README.txt
Prior to submission, make sure that your code compiles and runs using the following commands:
javac Exprc.java
java Exprc expr.dat
javac Exprv.java
java Exprv 1.jbc 2 1.0