L3.

Different from L2:
  - an expression based language
  - function call convention is hidden
  - numbers are not encoded, i.e. calling (print 1) prints out "1\n".
  - no direct memory references (have to use aref, aset, etc)

Like L2:
  - every intermediate result has a name

============================================================

The grammar.

p ::= (e
       (label (var ...) e)
       ...)

;; note that functions have arguments now(!)

e ::= (let ([var d]) e)
    | (if v e e)
    | d

d ::= (biop v v)
      (pred v)
      (v v ...)             ;; fn call with explicit args
      (new-array v v)
      (new-tuple v ...)
      (aref v v)
      (aset v v v)
      (alen v)
      (print v)
      (make-closure label v)
      (closure-proc v)
      (closure-vars v)
      v

v ::= var | label | num

biop ::= + | - | * | < | <= | =
pred ::= number? | a?

Other non-terminals (e.g., label, var) are given in lecture02
and lecture04.

============================================================

Programs in this language make the order of evaluation explicit
via lets. So instead of writing something like this:

  (+ (+ 1 2) (+ 3 4))

you have to write something like this:

  (let ([x (+ 1 2)])
    (let ([y (+ 3 4)])
      (+ x y)))

showing that the (+ 1 2) happens first and then the (+ 3 4).
This also means that every sub-expression has its own name.

Here's our old friend fib:

((:fib 18)
 (:fib (x)
       (let ([xlt2 (< x 2)])
         (if xlt2
             1
             (let ([x1 (- x 1)])
               (let ([f1 (:fib x1)])
                 (let ([x2 (- x 2)])
                   (let ([f2 (:fib x2)])
                     (+ f1 f2)))))))))

============================================================

Semantics:

Booleans are represented as numbers: everything that's not zero is
true; zero is false. Primitives that produce booleans always produce
them as either 1 or 0. So,

  (<= 1 2) is 1
  (= 1 2) is 0

and (if 3 1 2) is 1.

new-array: the first argument is the size of the array and the
  second argument is the initial value for the positions in the
  array.

new-tuple: also creates an array. The size of the array is the
  number of arguments and each argument goes into the corresponding
  spot in the array.

aref: first argument is the array and the second argument is the
  position in the array. It returns the element at the
  corresponding position.

aset: the first argument is the array, the second is the position,
  and the third is a new value for that position in the array.
  aset always returns 0.

alen: accepts an array and returns its length

a?: returns true when given an array (the result of new-tuple or
  new-array), a closure, or a procedure, false otherwise

number?: returns true when given a number, false otherwise

print: accepts any value and prints it out

make-closure: just like new-tuple, but restricted to two arguments
  where the first must be a label.

(closure-proc v): just like (aref v 0)
(closure-vars v): just like (aref v 1)

The closure operations have extra checking in the interpreter to
make it easier to debug earlier transformations.

============================================================

So, compilation:
  - linearizes the expression,
  - explicates the calling convention
    (including tail calls vs non-tail calls)
  - handles the encoding of pointer values & integer values

Be careful: at any call to allocate, no variable may hold an
unencoded integer or a pointer that points somewhere other than the
beginning of a block. The GC traverses the stack (things that get
spilled in the L2->L1 compiler) and looks at the registers; it
expects those to all be valid values. That is, it treats all of them
as either immediate integers or as pointers to live memory.

Three cases for compiling an 'e':

1) the e is a let:
     (let ([x d]) e)
   -> compile the d, store the result in x, and continue with the
      body. An application expression here is a non-tail call.

2) the e is an if:
     (if v e1 e2)
   -> generate a test for the v that goes to either then-label
      or else-label.
      generate the then-label,
      generate the code for e1
      generate the else-label
      generate the code for e2

   Why don't we need a join here?
   The last thing inside an 'e' is always the result of our program,
   so if it is a call, we're fine, the result went away (a tail
   call), or if it isn't then we're going to insert a return.

3) the e is a d:
   -> if it is an application, make a tail call
      otherwise, generate the code for the d,
      store the result in rax, and return.

Many cases for compiling a 'd'. When compiling a 'd', we always have
a destination for it; from a let, the destination is the variable.
From the 'd' at the end of the expression, the destination is rax,
since that's the result of the function. Let's look at a couple.

------------------------------------------------------------

(let ([x (+ y z)]) ...)
->
  `(,x <- ,y)
  `(,x += ,z)
  `(,x -= 1)

What if the 'y' or 'z' were constants? Do we have four cases here?
Nah, we just encode any constants we see and let something else
clean up.

(let ([x (+ v1 v2)]) ...)
->
  `(,x <- ,(encode v1))
  `(,x += ,(encode v2))
  `(,x -= 1)

where encode turns a number into the encoded version and leaves
variables alone.

Why is adding them together and then subtracting one the right
thing? Well, if x is initialized with a number 2a+1, and then we
increment that by 2b+1, we have 2a+2b+2 in x. The number we want is
2(a+b)+1, since that's the encoding of the sum. The difference
between these: 1. So just subtract one.

Note that if L1 signalled errors on overflow, this would not be
correct, since 2a+2b+2 might overflow when 2(a+b)+1 would not. But
since we have modular arithmetic, this equivalence holds. (This is a
place where Racket, although it uses the same encoding trick, has to
have slower code.)

Also note that, in the middle of this sequence of instructions, the
GC invariant doesn't hold. But that's okay because we don't call
allocate there and by the time we're done, it does hold again.

------------------------------------------------------------

(let ([x (* v1 v2)]) ...)
->
  In this case, we don't have some kind of a clever trick, since
  the product (2a+1) * (2b+1) is not so useful when trying to
  compute 2(a*b)+1. So instead we just decode the numbers and
  re-encode the result:

  `(,tmp <- ,(encode v1))
  `(,tmp >>= 1)
  `(,x <- ,(encode v2))
  `(,x >>= 1)
  `(,x *= ,tmp)
  `(,x *= 2)
  `(,x += 1)

  where 'tmp' is a new, fresh variable

------------------------------------------------------------

(let ([x (<= y z)]) ...)
->
  `(,x <- ,y <= ,z)
  `(,x <<= 1)
  `(,x += 1)

It is fine to compare the encoded values, since
2x+1 <= 2y+1 iff x <= y. But, don't forget to encode the result!
Also note that boolean values are still represented as integers
(zero is false, everything else is true).

------------------------------------------------------------

(let ([x (a? v1)]) ...)
->
  `(,x <- ,(encode v1))
  `(,x &= 1)
  `(,x *= -2)
  `(,x += 3)

The low bit is 1 for an encoded number and 0 for a pointer, so this
leaves 1 (the encoding of 0, false) in x for a number and 3 (the
encoding of 1, true) for an array.

------------------------------------------------------------

(let ([x (alen v)]) ...)
->
  `(,x <- (mem ,v 0))  ;; v can't be a constant here or
                       ;; else the L3 program doesn't work anyways.
  `(,x <<= 1)
  `(,x += 1)

The size stored in the array is the decoded version of the size, so
we need to encode it so it cooperates with the rest of the program.

------------------------------------------------------------

(let ([x (aset v1 v2 v3)]) ...)
->
  `(,x <- ,(encode v2))
  `(,x >>= 1)
  `(,x *= 8)
  `(,x += ,v1)
  `((mem ,x 8) <- ,(encode v3))
  `(,x <- 1) ;; put the final result for aset into x
             ;; (always 0, which is encoded as 1).

What's wrong with that? No bounds checking!

How do we do the bounds checking? Here we use the array-error
L2 instruction sequence:

  (rdi <- arr)
  (rsi <- idx)
  (call array-error 2)

It accepts an array and an (attempted) index, prints out an error
message and terminates the program.

Using that we can do the bounds checking:

  `(,x <- ,(encode v2))
  `(,x >>= 1)
  `(,tmp <- (mem ,v1 0))
  `(cjump ,x < ,tmp ,bounds-pass-label ,bounds-fail-label)
  bounds-fail-label
  `(rdi <- ,v1)
  `(rsi <- ,(encode v2))
  `(call array-error 2)
  bounds-pass-label
  `(,x *= 8)
  `(,x += ,v1)
  `((mem ,x 8) <- ,(encode v3))
  `(,x <- 1) ;; put the final result for aset into x
             ;; (always 0, which is encoded as 1).

Note that tmp, bounds-fail-label and bounds-pass-label all have to
be freshly generated.

Also note that this does not completely check the bounds, since the
index may also be less than 0. Your compiler must do both checks.

------------------------------------------------------------

To compile the closure primitives, treat:

  (make-closure a b) as (new-tuple a b)
  (closure-proc a)   as (aref a 0)
  (closure-vars a)   as (aref a 1)

------------------------------------------------------------

(let ([w (f x y z)]) ...)
->
  (define return-label (fresh-label))
  `((mem rsp -8) <- ,return-label) ;; Put return label on stack
  `(rdi <- ,(encode x))
  `(rsi <- ,(encode y))
  `(rdx <- ,(encode z))
  `(call ,f 3)  ;; note that 'f' might be a variable
                ;; that refers to a label, but not a constant
  return-label
  `(,w <- rax)

The return label must be put on the stack so that the callee can pop
it from the stack and jump to it. The same label must be the next
instruction after the function call so the call returns correctly.

Function calls must specify the arity of the function they call,
which should match the number of arguments passed.

Function calls with more than 6 arguments need to put argument 7
and up on the stack.

(let ([res (f a1 a2 a3 a4 a5 a6 a7 a8)]) ...)
->
  (define return-label (fresh-label))
  `((mem rsp -8) <- ,return-label) ;; Put return label on stack
  `(rdi <- ,(encode a1))
  `(rsi <- ,(encode a2))
  `(rdx <- ,(encode a3))
  `(rcx <- ,(encode a4))
  `(r8 <- ,(encode a5))
  `(r9 <- ,(encode a6))
  `((mem rsp -16) <- ,(encode a7))
  `((mem rsp -24) <- ,(encode a8))
  `(call ,f 8)
  return-label
  `(,res <- rax)

Function calls are straightforward when they are not tail calls.
But what if this was a tail call? Tail calls are the ones at the
bottom of "e"s, right? (If the call is in a let, there is more to
do, namely the body of the let.) In that case, we can just do this:

  `(rdi <- ,(encode x))
  `(rsi <- ,(encode y))
  `(rdx <- ,(encode z))
  `(tail-call ,f 3)

Since it is a tail call, we let 'f' update rax and just let that sit
there for this function too. We also do not need to put a return
address on the stack because the callee will use our return address
(which is already on the stack).

And when a function call is in tail position, but has more than 6
arguments, well we just generate the code to do the call and return
(because tail-call only works for functions that don't have stack
args).

------------------------------------------------------------

Also note that we need to deal with compiling functions. These cases
handle compiling the body but we need to do a little setup, namely
moving the argument registers into the variables that name the
function parameters. E.g.,

(:label (x y z) e)
-->
  `(,x <- rdi)
  `(,y <- rsi)
  `(,z <- rdx)
  ... compilation of e goes here ...

If a function has 7 or more arguments, it must retrieve arguments
from the stack like this:

(:label (a1 a2 a3 a4 a5 a6 a7 a8) e)
-->
  `(,a1 <- rdi)
  `(,a2 <- rsi)
  `(,a3 <- rdx)
  `(,a4 <- rcx)
  `(,a5 <- r8)
  `(,a6 <- r9)
  `(,a7 <- (stack-arg 8))
  `(,a8 <- (stack-arg 0))
  ... compilation of e goes here ...

Since the stack args are put on the stack in order, we address them
the same way.
So if there are n stack arguments, we start from offset 8*(n-1) and subtract 8 until we hit 0 for the last argument.
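
The offset rule above can be sketched in Python. This is only an
illustration, not part of the compiler: the helper names
callee_prologue and caller_stack_stores are made up here, and the
arguments are assumed to be variable names or already-encoded
constants. The register order and the offsets follow the examples in
these notes.

```python
# Argument registers, in the order used by the call sequences above.
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def callee_prologue(params):
    """Instructions that move incoming arguments into the parameters.

    Hypothetical helper: the first six parameters come from registers,
    the rest from (stack-arg k), where k starts at 8*(n-1) and drops
    by 8 down to 0 for the last of the n stack arguments.
    """
    in_regs = params[:len(ARG_REGISTERS)]
    on_stack = params[len(ARG_REGISTERS):]
    instrs = [f"({p} <- {r})" for p, r in zip(in_regs, ARG_REGISTERS)]
    n = len(on_stack)
    for i, p in enumerate(on_stack):
        offset = 8 * (n - 1 - i)
        instrs.append(f"({p} <- (stack-arg {offset}))")
    return instrs


def caller_stack_stores(args):
    """Stores a non-tail call emits for arguments 7 and up.

    Hypothetical helper: (mem rsp -8) holds the return label, so the
    first stack argument goes to -16, the next to -24, and so on.
    """
    on_stack = args[len(ARG_REGISTERS):]
    return [f"((mem rsp {-8 * (i + 2)}) <- {a})"
            for i, a in enumerate(on_stack)]


if __name__ == "__main__":
    names = [f"a{i}" for i in range(1, 9)]
    for instr in callee_prologue(names):
        print(instr)
    for instr in caller_stack_stores(names):
        print(instr)
```

For eight parameters the prologue ends with a7 read from
(stack-arg 8) and a8 from (stack-arg 0), and the caller stores a7 at
(mem rsp -16) and a8 at (mem rsp -24), matching the examples above.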