L3.

Different from L2:
  - an expression-based language
  - the function-call convention is hidden
  - numbers are not encoded, i.e., calling (print 1) prints out "1\n"
  - no direct memory references (you have to use aref, aset, etc.)

Like L2:
  - every intermediate result has a name

============================================================

The grammar.

p ::= (e (label arglist e) ...)

arglist ::= () | (var) | (var var) | (var var var)
  ;; note that functions have arguments now,
  ;; but still at most three arguments

e ::= (let ([var d]) e)
    | (if v e e)
    | d

d ::= (biop v v)
    | (pred v)
    | (v)                  ;; fn call with 0 args
    | (v v)                ;; fn call with 1 arg
    | (v v v)              ;; fn call with 2 args
    | (v v v v)            ;; fn call with 3 args
    | (new-array v v)
    | (new-tuple v ...)
    | (aref v v)
    | (aset v v v)
    | (alen v)
    | (print v)
    | (make-closure label v)
    | (closure-proc v)
    | (closure-vars v)
    | v

v ::= var | label | num

biop ::= + | - | * | < | <= | =

pred ::= number? | a?

Other non-terminals (e.g., label, var) are given in lecture02 and lecture04.

============================================================

Programs in this language make the order of evaluation explicit via lets. So instead of writing something like this:

  (+ (+ 1 2) (+ 3 4))

you have to write something like this:

  (let ([x (+ 1 2)])
    (let ([y (+ 3 4)])
      (+ x y)))

showing that the (+ 1 2) happens first and then the (+ 3 4). This also means that every sub-expression has its own name.

Here's our old friend fib:

((:fib 18)
 (:fib (x)
   (let ([xlt2 (< x 2)])
     (if xlt2
         1
         (let ([x1 (- x 1)])
           (let ([f1 (:fib x1)])
             (let ([x2 (- x 2)])
               (let ([f2 (:fib x2)])
                 (+ f1 f2)))))))))

============================================================

Semantics:

Booleans are represented as numbers: everything that's not zero is true; zero is false. Primitives that produce booleans always produce them as either 1 or 0. So,

  (<= 1 2) is 1
  (= 1 2)  is 0

and

  (if 3 1 2) is 1.

new-array: the first argument is the size of the array and the second argument is the initial value for the positions in the array.
new-tuple: also creates an array. The size of the array is the number of arguments, and each argument goes into the corresponding spot in the array.

aref: the first argument is the array and the second argument is the position in the array. It returns the element at the corresponding position.

aset: the first argument is the array, the second is the position, and the third is the new value for that position in the array. aset always returns 0.

alen: accepts an array and returns its length.

a?: returns true when given an array (the result of new-tuple or new-array), a closure, or a procedure; false otherwise.

number?: returns true when given a number, false otherwise.

print: accepts any value and prints it out.

make-closure: just like new-tuple, but restricted to two arguments, where the first must be a label.

(closure-proc v): just like (aref v 0)

(closure-vars v): just like (aref v 1)

The closure operations have extra checking in the interpreter to make it easier to debug earlier transformations.

============================================================

So, compilation:
  - linearizes the expression,
  - explicates the calling convention (including tail calls vs. non-tail calls), and
  - handles the encoding of pointer values & integer values.

Be careful: at any call to allocate, no variable may hold an unencoded integer or a pointer that points somewhere other than the beginning of a block. The GC traverses the stack (the things that get spilled in the L2->L1 compiler) and looks at the registers, and it expects all of those to be valid values. That is, it treats each of them as either an immediate (encoded) integer or a pointer to live memory.

Three cases for compiling an 'e':

1) the e is a let: (let ([x d]) e)
   -> compile the d, store the result in x, and continue with the body.
      An application expression here is a non-tail call.

2) the e is an if: (if v e1 e2)
   -> generate a test for the v that goes to either then-label or else-label,
      generate the then-label,
      generate the code for e1,
      generate the else-label,
      generate the code for e2.

   Why don't we need a join here? The last thing inside an 'e' always produces the result of the function. If it is a call, we're fine: it is a tail call, so the callee returns directly to our caller. If it isn't a call, then we're going to insert a return.

3) the e is a d:
   -> if it is an application, make a tail call;
      otherwise, generate the code for the d, store the result in eax, and return.

Many cases for compiling a 'd'. When compiling a 'd', we always have a destination for it: from a let, the destination is the variable; for the 'd' at the end of the expression, the destination is eax, since that's the result of the function. Let's look at a couple.

------------------------------------------------------------

(let ([x (+ y z)]) ...)
->
`(,x <- ,y)
`(,x += ,z)
`(,x -= 1)

What if the 'y' or 'z' were constants? Do we have four cases here? Nah, we just encode any constants we see and let something else clean up.

(let ([x (+ v1 v2)]) ...)
->
`(,x <- ,(encode v1))
`(,x += ,(encode v2))
`(,x -= 1)

where encode turns a number into its encoded version and leaves variables alone.

Why is adding them together and then subtracting one the right thing? Well, if x is initialized with a number 2a+1, and then we increment that by 2b+1, we have 2a+2b+2 in x. The number we want is 2(a+b)+1, since that's the encoding of the sum. The difference between these is 1, so just subtract one.

Note that if L1 signalled errors on overflow, this would not be correct, since 2a+2b+2 might overflow when 2(a+b)+1 would not. But since we have modular arithmetic, this equivalence holds. (This is a place where Racket, although it uses the same encoding trick, has to have slower code.)

Also note that, in the middle of this sequence of instructions, the GC invariant doesn't hold. But that's okay because we don't call allocate there, and by the time we're done, it holds again.
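To make the encoding argument concrete, here is a small Python sketch (Python is our choice for illustration; the course compiler itself works over Racket s-expressions). It assumes that L1 arithmetic wraps modulo 2^32 in two's complement and represents L2 instructions as tuples. The helpers `wrap`, `encode`, `decode`, `add_encoded`, and `compile_plus` are hypothetical names, not part of the course code. The sketch replays the three-instruction sequence for `+` on concrete values and checks that it yields the encoding of the (wrapped) sum, even when the sum overflows.

```python
# Hedged sketch: L3 integer encoding on a 32-bit machine.
# Assumes L1 arithmetic wraps modulo 2**32 (two's complement).

def wrap(n: int) -> int:
    """Reduce n to a signed 32-bit value, as the hardware would."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n & 0x80000000 else n

def encode(n: int) -> int:
    """Encode the integer n as 2n+1; the low bit 1 marks it as a number."""
    return wrap(2 * n + 1)

def decode(e: int) -> int:
    """An arithmetic right shift by one recovers n from 2n+1."""
    return e >> 1

def add_encoded(e1: int, e2: int) -> int:
    """Run the L2 sequence x <- e1; x += e2; x -= 1 on concrete values."""
    x = e1
    x = wrap(x + e2)  # 2a+2b+2: even, so NOT a valid encoded integer yet
    x = wrap(x - 1)   # 2(a+b)+1: the encoding of the sum
    return x

def encode_operand(v):
    """Encode integer constants; leave variable names (strings) alone."""
    return encode(v) if isinstance(v, int) else v

def compile_plus(x: str, v1, v2):
    """L2 instructions (as tuples) for (let ([x (+ v1 v2)]) ...)."""
    return [(x, "<-", encode_operand(v1)),
            (x, "+=", encode_operand(v2)),
            (x, "-=", 1)]

print(compile_plus("x", 1, "y"))  # [('x', '<-', 3), ('x', '+=', 'y'), ('x', '-=', 1)]
print(decode(add_encoded(encode(3), encode(4))))  # 7
```

The intermediate value after `+=` is even, which is exactly why the GC invariant is broken for one instruction; no allocation happens before the `-= 1` restores it.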
------------------------------------------------------------

(let ([x (* v1 v2)]) ...)
->

In this case, we don't have a clever trick, since the product (2a+1) * (2b+1) = 4ab+2a+2b+1 is not so useful when trying to compute 2(a*b)+1. So instead we just decode the numbers and re-encode the result:

`(,tmp <- ,(encode v1))
`(,tmp >>= 1)
`(,x <- ,(encode v2))
`(,x >>= 1)
`(,x *= ,tmp)
`(,x *= 2)
`(,x += 1)

where 'tmp' is a new, fresh variable.

------------------------------------------------------------

(let ([x (<= y z)]) ...)
->
`(,x <- ,y <= ,z)
`(,x <<= 1)
`(,x += 1)

It is fine to compare the encoded values, since 2a+1 <= 2b+1 iff a <= b. But don't forget to encode the result! Also note that boolean values are still represented as integers (zero is false, everything else is true).

------------------------------------------------------------

(let ([x (a? v1)]) ...)
->
`(,x <- ,(encode v1))
`(,x &= 1)
`(,x *= -2)
`(,x += 3)

A pointer has a low bit of 0, so this produces 0 * -2 + 3 = 3 (the encoding of 1, i.e., true); an encoded number has a low bit of 1, so it produces -2 + 3 = 1 (the encoding of 0, i.e., false).

------------------------------------------------------------

(let ([x (alen v)]) ...)
->
`(,x <- (mem ,v 0))   ;; v can't be a constant here, or
                      ;; else the L3 program doesn't work anyway.
`(,x <<= 1)
`(,x += 1)

The size stored in the array is the decoded version of the size, so we need to encode it so that it cooperates with the rest of the program.

------------------------------------------------------------

(let ([x (aset v1 v2 v3)]) ...)
->
`(,x <- ,(encode v2))
`(,x >>= 1)
`(,x *= 4)
`(,x += ,v1)
`((mem ,x 4) <- ,(encode v3))
`(,x <- 1)   ;; put the final result for aset into x (always 0, encoded as 1).

The index is decoded and scaled by 4 (the word size); the extra offset of 4 in (mem ,x 4) skips the size word at the start of the array.

What's wrong with that? No bounds checking!

How do we do the bounds checking? Here we use the array-error L2 instruction:

  (eax <- (array-error s s))

It accepts an array and an (attempted) index, prints out an error message, and terminates the program.
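A similar hedged Python sketch can check the sequences for `*`, `<=`, `a?`, and the `aset` address computation on concrete values. It again assumes 32-bit wraparound, plus an array layout where word 0 holds the length and element i lives at byte offset 4 + 4i (which is what storing to `(mem ,x 4)` after computing x = 4i + base implies). Heap addresses are plain integers here, and `0x1000` is just an arbitrary aligned address; all function names are hypothetical.

```python
# Hedged sketch: the encoded sequences for *, <=, a?, and aset's address,
# assuming 32-bit wraparound and arrays laid out as [length, e0, e1, ...]
# with 4-byte words. Addresses are plain ints; 0x1000 is arbitrary.

def wrap(n: int) -> int:
    """Reduce n to a signed 32-bit value."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n & 0x80000000 else n

def encode(n: int) -> int:
    """Encode the integer n as 2n+1."""
    return wrap(2 * n + 1)

def times_encoded(e1: int, e2: int) -> int:
    """tmp <- e1; tmp >>= 1; x <- e2; x >>= 1; x *= tmp; x *= 2; x += 1."""
    tmp = e1 >> 1          # decode v1
    x = e2 >> 1            # decode v2
    x = wrap(x * tmp)      # plain product a*b
    x = wrap(x * 2)
    return wrap(x + 1)     # re-encode as 2(a*b)+1

def le_encoded(e1: int, e2: int) -> int:
    """x <- e1 <= e2 (yields 1 or 0); x <<= 1; x += 1."""
    x = 1 if e1 <= e2 else 0
    return (x << 1) + 1

def is_array_encoded(v: int) -> int:
    """x <- v; x &= 1; x *= -2; x += 3."""
    x = v & 1              # 1 for encoded numbers, 0 for aligned pointers
    x *= -2                # -2 or 0
    return x + 3           # 1 (encoded false) or 3 (encoded true)

def aset_address(base: int, e_index: int) -> int:
    """x <- e_index; x >>= 1; x *= 4; x += base; the store hits (mem x 4)."""
    x = e_index >> 1       # decode the index i
    x *= 4                 # byte offset of element i
    x += base
    return x + 4           # skip the length word at offset 0

print(times_encoded(encode(6), encode(7)) == encode(42))  # True
print(is_array_encoded(0x1000), is_array_encoded(encode(7)))  # 3 1
```

Comparing encoded values directly works because 2a+1 <= 2b+1 exactly when a <= b, so only the 0/1 result of the comparison needs encoding.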
Using that, we can do the bounds checking:

`(,x <- ,(encode v2))
`(,x >>= 1)
`(,tmp <- (mem ,v1 0))
`(cjump ,x < ,tmp ,bounds-pass-label ,bounds-fail-label)
bounds-fail-label
`(eax <- (array-error ,v1 ,(encode v2)))
bounds-pass-label
`(,x *= 4)
`(,x += ,v1)
`((mem ,x 4) <- ,(encode v3))
`(,x <- 1)   ;; put the final result for aset into x (always 0).

Note that tmp, bounds-fail-label, and bounds-pass-label all have to be freshly generated. Also note that this does not completely check the bounds, since the index may also be less than 0.

------------------------------------------------------------

To compile the closure primitives, treat:

  (make-closure a b)  as  (new-tuple a b)
  (closure-proc a)    as  (aref a 0)
  (closure-vars a)    as  (aref a 1)

------------------------------------------------------------

(let ([w (f x y z)]) ...)
->
`(ecx <- ,(encode x))
`(edx <- ,(encode y))
`(eax <- ,(encode z))
`(call ,f)   ;; note that 'f' might be a variable
             ;; that refers to a label, but not a constant
`(,w <- eax)

Function calls are straightforward when they aren't tail calls. But what if this was a tail call? Tail calls are the ones at the bottom of "e"s, right? (If the call is in a let, there is more to do, namely the body of the let.) In that case, we can just do this:

`(ecx <- ,(encode x))
`(edx <- ,(encode y))
`(eax <- ,(encode z))
`(tail-call ,f)

Since it is a tail call, we let 'f' put its result in eax and just leave it there as this function's result too.

------------------------------------------------------------

Also note that we need to deal with compiling functions. These cases handle compiling the body, but we need to do a little setup, namely moving the argument registers into the variables that name the function parameters. E.g.,

(:label (x y z) e)
-->
`(,x <- ecx)
`(,y <- edx)
`(,z <- eax)
... compilation of e goes here ...

And you also need to put those instructions into an L2 function with the right label.
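Putting the 'e' cases together, the following Python sketch compiles a small subset of L3 (lets, ifs, `+`, `-`, comparisons, and calls) into L2-like instructions. It distinguishes tail from non-tail calls and emits the parameter-moving prologue. The tuple representation of L3 and L2, the hypothetical `fresh` label generator, and the test `(cjump v = 1 else-label then-label)` (assuming L2's cjump jumps to its first label when the comparison holds, and that encoded false is 1) are assumptions of this sketch, not the course's reference compiler. Arrays, printing, closures, and `*` are left out. Subtraction follows the same reasoning as addition: (2a+1) - (2b+1) = 2(a-b), so one is added back.

```python
import itertools

# Hedged sketch of compiling a subset of L3 to L2-like instructions.
# Hypothetical representation (not the course's data structures):
#   e ::= ("let", x, d, e) | ("if", v, e, e) | d
#   d ::= (biop, v, v) | (f, v, ...)   ; any other head is a call
#   v ::= int | str                    ; labels are strings starting with ':'
# L2 instructions are tuples; a bare string is a label.

ARG_REGS = ("ecx", "edx", "eax")
COMPARE = {"<", "<=", "="}
BIOPS = {"+", "-"} | COMPARE
_ids = itertools.count()

def fresh(prefix: str) -> str:
    """Return a new, unique label name."""
    return f":{prefix}_{next(_ids)}"

def enc(v):
    """Encode integer constants as 2n+1; leave variables and labels alone."""
    return 2 * v + 1 if isinstance(v, int) else v

def is_call(d) -> bool:
    return isinstance(d, tuple) and d[0] not in BIOPS

def move_args(args):
    """Load up to three arguments into ecx, edx, eax (in that order)."""
    return [(reg, "<-", enc(a)) for reg, a in zip(ARG_REGS, args)]

def compile_d(dest, d):
    """Compile d in non-tail position, leaving its encoded value in dest."""
    if not isinstance(d, tuple):
        return [(dest, "<-", enc(d))]
    op, *args = d
    if op == "+":
        return [(dest, "<-", enc(args[0])), (dest, "+=", enc(args[1])), (dest, "-=", 1)]
    if op == "-":
        # (2a+1) - (2b+1) = 2(a-b), so add one back to re-encode.
        return [(dest, "<-", enc(args[0])), (dest, "-=", enc(args[1])), (dest, "+=", 1)]
    if op in COMPARE:
        return [(dest, "<-", enc(args[0]), op, enc(args[1])),
                (dest, "<<=", 1), (dest, "+=", 1)]
    # Non-tail call: set up arguments, call, then fetch the result from eax.
    return move_args(args) + [("call", op), (dest, "<-", "eax")]

def compile_e(e):
    if isinstance(e, tuple) and e[0] == "let":
        _, x, d, body = e
        return compile_d(x, d) + compile_e(body)
    if isinstance(e, tuple) and e[0] == "if":
        _, v, e1, e2 = e
        then_l, else_l = fresh("then"), fresh("else")
        # Encoded false is 1. Each branch ends in a return or a tail call,
        # so no join label is needed after the code for e1.
        return ([("cjump", enc(v), "=", 1, else_l, then_l), then_l]
                + compile_e(e1) + [else_l] + compile_e(e2))
    if is_call(e):
        return move_args(e[1:]) + [("tail-call", e[0])]
    return compile_d("eax", e) + [("return",)]

def compile_function(label, params, body):
    """Label, then move argument registers into the parameters, then the body."""
    prologue = [(p, "<-", reg) for p, reg in zip(params, ARG_REGS)]
    return [label] + prologue + compile_e(body)

FIB_BODY = ("let", "xlt2", ("<", "x", 2),
            ("if", "xlt2", 1,
             ("let", "x1", ("-", "x", 1),
              ("let", "f1", (":fib", "x1"),
               ("let", "x2", ("-", "x", 2),
                ("let", "f2", (":fib", "x2"),
                 ("+", "f1", "f2")))))))

fib_code = compile_function(":fib", ["x"], FIB_BODY)
for instr in fib_code:
    print(instr)
```

Compiling fib this way yields two `(return)`s (one per branch), two non-tail `(call :fib)`s, and no join label after the `if`, matching the argument above that every branch of an 'e' ends in a return or a tail call.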