Compiling and Evaluating Simple Expressions
The primary goal of this assignment is to gain familiarity with language
processing concepts and techniques using the infrastructure developed for the
EOPL text.
An EBNF (Extended Backus-Naur Form) grammar for arithmetic expressions containing
variables (i, j, k) and binary operators (+, *) is given below.
<expr> -> <term> { + <term> }
<term> -> <factor> { * <factor> }
<factor> -> <var> | ( <expr> )
<var> -> i | j | k
{<expr>, <term>, <factor>, <var>} are the non-terminals. {i,j,k,+,*,(,)} are the terminals. <expr> is the start symbol. The meta-symbol "->" separates the lhs and the rhs of a production rule, the meta-symbol "|" separates alternatives on the rhs, and the paired curly braces "{...}" stand for the Kleene-star operator (that is, 0 or more iterations of the enclosed regular expression).
Some example arithmetic expressions derivable in the grammar are "i", "( k + k ) * i", " (i * k) + (j + i)", etc. (Note that the double quotes are not part of the expression and whitespace characters are insignificant.)
Now consider the following template for a collection of Java programs.
class Test {
    static int f(int i, int j, int k) {
        return <expr>;
    }
    public static void main(String[] args) {
        System.out.println(f(2,4,8));
    }
}
To obtain a valid Java program (that is, a valid function body), replace <expr>
with an expression derived from the above grammar.
A Java compiler takes the source code and generates Java bytecodes, which resemble assembly language instructions for an abstract stack machine. The translation of the return-expression "((i + k) * j)" into bytecodes can be given symbolically as:
iload_0
iload_2
iadd
iload_1
imul
The formal arguments i, j, and k are encoded as variables in registers 0, 1, and 2, respectively. A typical instruction encodes information about the type and the location of its operands, and the nature of the operation. For example, iload_2 stands for pushing the value of the integer variable k (register 2) on top of the stack; istore_2 stands for moving the value from the top of the stack to the location of the integer variable k; iadd (resp. imul) stands for popping the top two integer values from the stack, adding (resp. multiplying) them, and pushing the result on top of the stack. (For other details about the JVM per se, refer to The Java Virtual Machine Specification, specifically the chapter on the Instruction Set.)
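The instruction semantics described above can be sketched as a tiny stack machine in Java. This is only an illustration under stated assumptions: the class name StackMachineSketch and the method interpret are hypothetical, instructions are represented as plain strings, and only the handful of instructions mentioned in this assignment are handled. A real JVM method would also end with ireturn, which is omitted here.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: simulates the few bytecodes discussed in the text.
class StackMachineSketch {
    // Runs the instruction list with i, j, k stored in registers 0, 1, 2
    // and returns the value left on top of the operand stack.
    static int interpret(List<String> code, int i, int j, int k) {
        int[] locals = {i, j, k};
        Deque<Integer> stack = new ArrayDeque<>();
        for (String insn : code) {
            switch (insn) {
                case "iload_0": stack.push(locals[0]); break;
                case "iload_1": stack.push(locals[1]); break;
                case "iload_2": stack.push(locals[2]); break;
                case "istore_2": locals[2] = stack.pop(); break;
                // Pop two operands, combine them, push the result.
                case "iadd": stack.push(stack.pop() + stack.pop()); break;
                case "imul": stack.push(stack.pop() * stack.pop()); break;
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + insn);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // Bytecode for ((i + k) * j) with i=2, j=4, k=8.
        List<String> code = Arrays.asList("iload_0", "iload_2", "iadd", "iload_1", "imul");
        System.out.println(interpret(code, 2, 4, 8)); // prints 40
    }
}
```

Because + and * are commutative, the order in which the two operands are popped does not affect the result in this sketch.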
On the right is an applet that performs the required translation of the
expression into Java bytecodes for your reference. To generate more examples of
such translations, instantiate the above template by replacing <expr> with a
legal arithmetic expression in "Test.java", compile it using "%javac Test.java",
and disassemble the class file using "%javap -c Test", focusing on
"Method int f(int, int, int)".
Note also that the solution written in Java exemplifies the object-oriented style of programming, while the Scheme solution to be developed below exemplifies the functional style of programming.
PART I: Write a function run in Scheme that takes an expression string derivable from the above grammar and outputs the value of the expression, assuming that i, j, and k are 2, 4, and 8 respectively. Use the SLLGEN scanner and parser generator to convert the expression string into an abstract syntax tree prior to evaluation. (Observe that the value reflects the precedence of "*" over "+".)
>(run "((i * i) + j * (k) + j)")
40
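For comparison with the object-oriented style, the parse-then-evaluate idea of Part I can be sketched in Java. This is not the required Scheme/SLLGEN solution: it is a hand-written recursive-descent parser, and the names AstEvalSketch, Node, Var, BinOp, parse, and run are all hypothetical. Each "{...}" in the grammar becomes a while loop, and building the tree left to right inside that loop yields left-associative operators, with "*" binding tighter than "+" because <term> sits below <expr>.

```java
// Hypothetical sketch: build an abstract syntax tree, then evaluate it.
class AstEvalSketch {
    // An AST node is either a variable or a binary operation.
    interface Node { int eval(int i, int j, int k); }

    static final class Var implements Node {
        final char name;
        Var(char name) { this.name = name; }
        public int eval(int i, int j, int k) {
            return name == 'i' ? i : name == 'j' ? j : k;
        }
    }

    static final class BinOp implements Node {
        final char op;
        final Node left, right;
        BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
        public int eval(int i, int j, int k) {
            int l = left.eval(i, j, k), r = right.eval(i, j, k);
            return op == '+' ? l + r : l * r;
        }
    }

    private final String src; // input with whitespace removed
    private int pos = 0;      // index of the next unread character

    private AstEvalSketch(String text) { this.src = text.replaceAll("\\s+", ""); }

    static Node parse(String text) {
        AstEvalSketch p = new AstEvalSketch(text);
        Node tree = p.expr();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return tree;
    }

    // Evaluates with i, j, k bound to 2, 4, 8 as in the assignment.
    static int run(String text) { return parse(text).eval(2, 4, 8); }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    // <expr> -> <term> { + <term> }
    private Node expr() {
        Node n = term();
        while (peek() == '+') { pos++; n = new BinOp('+', n, term()); }
        return n;
    }

    // <term> -> <factor> { * <factor> }
    private Node term() {
        Node n = factor();
        while (peek() == '*') { pos++; n = new BinOp('*', n, factor()); }
        return n;
    }

    // <factor> -> <var> | ( <expr> )
    private Node factor() {
        char c = peek();
        pos++;
        if (c == 'i' || c == 'j' || c == 'k') return new Var(c);
        if (c == '(') {
            Node n = expr();
            if (peek() != ')') throw new IllegalArgumentException("expected ) at " + pos);
            pos++;
            return n;
        }
        throw new IllegalArgumentException("unexpected input at " + (pos - 1));
    }

    public static void main(String[] args) {
        System.out.println(run("((i * i) + j * (k) + j)")); // prints 40
    }
}
```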
PART II: Now write a function compile in Scheme that takes an expression string derivable from the above grammar and outputs a list representation of the bytecode fragment corresponding to the expression as explained above. Reuse the code for creating the abstract syntax tree from above for this part. (Observe that the generated code reflects left associativity of "+" and "*".)
>(compile "((i * i) + j * (k) + j)")
(iload_0 iload_0 imul iload_1 iload_2 imul iadd iload_1 iadd)
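The translation in Part II amounts to emitting the operands of each operator first and the operator's instruction afterwards (a postorder walk). A minimal Java sketch of that idea follows. As a simplifying assumption it emits instructions directly while parsing instead of walking a separately built abstract syntax tree, and the names ExprCompilerSketch and compile are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: syntax-directed translation of an expression to bytecode names.
class ExprCompilerSketch {
    private final String src;                            // input with whitespace removed
    private final List<String> code = new ArrayList<>(); // instructions emitted so far
    private int pos = 0;                                 // index of the next unread character

    private ExprCompilerSketch(String text) { this.src = text.replaceAll("\\s+", ""); }

    static List<String> compile(String text) {
        ExprCompilerSketch c = new ExprCompilerSketch(text);
        c.expr();
        if (c.pos != c.src.length()) throw new IllegalArgumentException("unexpected input at " + c.pos);
        return c.code;
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    // <expr> -> <term> { + <term> } : emit iadd after each right operand.
    private void expr() {
        term();
        while (peek() == '+') { pos++; term(); code.add("iadd"); }
    }

    // <term> -> <factor> { * <factor> } : emit imul after each right operand.
    private void term() {
        factor();
        while (peek() == '*') { pos++; factor(); code.add("imul"); }
    }

    // <factor> -> <var> | ( <expr> ) : variables i, j, k live in registers 0, 1, 2.
    private void factor() {
        char c = peek();
        pos++;
        switch (c) {
            case 'i': code.add("iload_0"); break;
            case 'j': code.add("iload_1"); break;
            case 'k': code.add("iload_2"); break;
            case '(':
                expr();
                if (peek() != ')') throw new IllegalArgumentException("expected ) at " + pos);
                pos++;
                break;
            default:
                throw new IllegalArgumentException("unexpected input at " + (pos - 1));
        }
    }

    public static void main(String[] args) {
        // Prints [iload_0, iload_2, iadd, iload_1, imul]
        System.out.println(compile("((i + k) * j)"));
    }
}
```

Emitting the operator immediately after its right operand inside the loop is what makes the generated code left-associative: for "i + j + k" the first iadd is emitted before k is loaded.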
PART III: Now write a function interpret in Scheme that takes the output of compile and returns the same result as run when given the same expression strings. That is, it should satisfy the identity:
(run "EXPR") = (interpret (compile "EXPR"))
(Observe that simulating the runtime stack object requires Scheme's set! and let constructs.)
>(interpret (compile "((i * i) + j * (k) + j)"))
40
What to hand in?
Submit your well-documented solution in a file named codegen.scm by running the
following turn-in command on unixapps1.wright.edu.
%/common/public/tkprasad/cs784/turnin-pa3 codegen.scm