## Midterm Exam

ECE 361 – Computer Architecture

Instructor: Prof. Alok Choudhary

November 17, 2005, 11:00 – 12:20 (1 hour and 20 minutes)

| Name: |  |
|-------|--|
| ID:   |  |

Note: There are four problems and one bonus problem. Q1 through Q4 are required problems and account for 100 points. Q5 is an extra-credit problem and accounts for 10 points. Please show all your steps in arriving at the answers in addition to the answers themselves.

**Q1:** General Concepts (25 Points)

**Q2:** Cache Organization (25 Points)

**Q3:** Data Hazards and Performance (25 Points)

**Q4:** Datapath for a Modified Load Instruction (25 Points)

**Q5:** Bonus Problem (10 Points)

Q1. (25 Points) For each of the following statements, answer True or False. Then give a concise justification (fewer than 4 sentences or numerical expressions) for your answer.

A. (5 Points) Reducing the clock cycle time results in a reduction of the CPI for a pipelined processor.

B. (5 Points) An overall speedup by a factor of 2 can be achieved by improving 50% of the execution time of a typical application by a factor of 4.

C. (5 Points) Two different compilers can result in different observed CPIs for the same program on the same processor.

D. (5 Points) Pipelining improves the throughput of a processor and not the execution time of an individual instruction as compared to a multi-cycle processor.

E. (5 Points) A MIPS program's executable code occupies 4096 bytes of memory. The instruction count for the execution of this program can be determined by 4096/4 = 1024.
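The relation behind statement B is Amdahl's Law: if a fraction $f$ of the execution time is improved by a factor $s$, the overall speedup is $1 / ((1 - f) + f/s)$. The following is a minimal Python sketch for evaluating that expression; it is not part of the original exam, and the function name `amdahl_speedup` is an illustrative assumption.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the execution time is sped up by `factor`.

    Hypothetical helper (not from the exam). `fraction` is in [0, 1] and
    `factor` is > 0.
    """
    # The unimproved part keeps its original time; the improved part shrinks.
    new_time = (1.0 - fraction) + fraction / factor
    return 1.0 / new_time


if __name__ == "__main__":
    # Numbers from statement B: 50% of the execution improved by a factor of 4.
    print(amdahl_speedup(0.5, 4))  # prints 1.6
```

The same helper can be used to explore how large `factor` must become before the unimproved fraction dominates the result.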

Q2. (25 Points) Assume a 32-bit address space for a processor. Show the addressing schemes for caches of 16 Kbytes <u>and</u> 32 Kbytes for the following configurations. You must show the position and size of the cache index, cache tag and byte offset fields.

A. (5 Points) Block size = 64 bytes, direct-mapped cache

B. (5 Points) Block size = 64 bytes, 4-way set-associative cache

C. (5 Points) Block size = 128 bytes, direct-mapped cache

D. (5 Points) Block size = 128 bytes, fully associative cache

E. (5 Points) What are the benefits and disadvantages of increasing set associativity?
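The field widths asked for in parts A through D follow from the usual split of an address into tag, index and byte offset. The sketch below is one way to compute them in Python; it is not part of the original exam. The function name `cache_fields`, the convention that `ways=None` means fully associative, and the assumption that all sizes are powers of two are illustrative choices.

```python
def cache_fields(cache_bytes, block_bytes, ways=1, addr_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a cache configuration.

    Hypothetical helper. Assumes cache_bytes, block_bytes and ways are
    powers of two. ways=None is treated as fully associative.
    """
    num_blocks = cache_bytes // block_bytes
    if ways is None:
        ways = num_blocks              # fully associative: a single set
    num_sets = num_blocks // ways

    # For a power of two n, n.bit_length() - 1 equals log2(n).
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


if __name__ == "__main__":
    KB = 1024
    configs = [
        ("A: 64 B blocks, direct mapped", 64, 1),
        ("B: 64 B blocks, 4-way", 64, 4),
        ("C: 128 B blocks, direct mapped", 128, 1),
        ("D: 128 B blocks, fully associative", 128, None),
    ]
    for label, block, ways in configs:
        for size in (16 * KB, 32 * KB):
            tag, index, offset = cache_fields(size, block, ways)
            print(f"{label}, {size // KB} KB: tag={tag} index={index} offset={offset}")
```

In the address layout, the byte offset occupies the least significant bits, the index sits directly above it, and the tag takes the remaining most significant bits.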

Q3. (25 Points) Given the following code segment, assume the standard 5-stage pipeline consisting of the IF, ID, EX, MEM and WB stages:

A. (15 Points) Identify all the data dependencies in the code segment. Which dependencies are data hazards that can be resolved with forwarding? Which may need pipeline stalls?

B. (10 Points) Rewrite the code to <u>MAXIMIZE</u> performance on the pipelined datapath with forwarding, avoiding stalls on a use following a load. Reorder the instructions so that the sequence takes the FEWEST clock cycles to execute while still producing the same result. That is, reorganize the code for the best performance without adding or changing any instructions.

Q4. (25 Points) Recall the design of the single-cycle processor from class. Suppose we want to add an instruction, LD+, to the instruction set that automatically increments the value of register Rs during a load. The loaded data is stored in Rt, the specified destination register.

This means that register Rt will be loaded with data from the memory location specified by Rs and then Rs is incremented by the value Imm. That is,

$$Rt \leftarrow M[Rs]; \quad Rs \leftarrow Rs + Imm$$
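The register-transfer behavior above can be sketched as a small Python model. This is only an illustration of the intended semantics, not part of the original exam and not a datapath design. The function name `ld_plus`, the dictionary-based register file and memory, the 32-bit wraparound, and the treatment of `imm` as an already sign-extended integer are all assumptions. The case Rs = Rt is not specified by the problem; in this sketch the loaded value wins.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit registers


def ld_plus(regs, mem, rs, rt, imm):
    """Model of Rt <- M[Rs]; Rs <- Rs + Imm (hypothetical helper).

    regs maps register numbers to values, mem maps addresses to words.
    Both right-hand sides are evaluated with the OLD value of Rs, which
    mirrors a single-cycle datapath where both writes happen at the clock edge.
    """
    old_rs = regs[rs]
    loaded = mem[old_rs]                 # value that would appear on BusW2
    incremented = (old_rs + imm) & MASK32  # value that would appear on BusW1

    regs[rs] = incremented               # write enabled by RegWr1
    regs[rt] = loaded                    # write enabled by RegWr2 (wins if rs == rt)


if __name__ == "__main__":
    regs = {1: 100, 2: 0}
    mem = {100: 42}
    ld_plus(regs, mem, rs=1, rt=2, imm=4)
    print(regs)  # prints {1: 104, 2: 42}
```

Note that the memory address is the unmodified Rs, while the ALU is free to compute Rs + Imm in the same cycle.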

You can assume that the register file has the capabilities to <u>write to two registers simultaneously</u> whose values appear on buses BusW1 and BusW2, and correspondingly, there are two control signals to enable the writes, RegWr1 and RegWr2. Furthermore, if RegWr1 is asserted then BusW1 values are written into Rs. If RegWr2 is asserted, then register Rt or Rd is written with the values on BusW2 based on which register is selected as the destination.

Modify the datapath to permit the execution of the LD+ instruction. Describe all your changes and the control signals (and their values) needed to implement this instruction.

Explain your design below, and show the datapath connections and control signals (and their values) on the next page where schematics are provided.

The following figure illustrates the datapath and control signals for the single cycle processor designed in the class for your reference.



Use the schematic below to describe your changes. Show your connections and the values of the control signals (you may use add, subtract, etc. to describe ALUctr values rather than using binary numbers).



Q5. (10 Points) Bonus problem.

A. (5 Points) Give a possible pair of 32-bit values $\alpha$ and $\beta$ such that the operation $\alpha + \beta$ will overflow the carry look-ahead unit while having <u>all</u> propagate bits equal to "1" and a <u>single</u> generate bit equal to "1".

B. (5 Points) A single-cycle MIPS processor has a 10ns cycle time. You would like to improve its performance via pipelining. If the registers have a 1ns delay, what should be the minimum cycle time required if you use a 5-stage pipeline?
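For part B, a common first-order model divides the single-cycle logic delay evenly across the stages and adds the pipeline register delay to each stage. The Python sketch below evaluates that model; it is not part of the original exam. It assumes perfectly balanced stages and that the 10 ns figure excludes the register delay, and the function name `min_pipelined_cycle_ns` is illustrative.

```python
def min_pipelined_cycle_ns(single_cycle_ns, stages, register_delay_ns):
    """Idealized minimum cycle time of a pipelined version of a single-cycle design.

    Hypothetical helper. Assumes the logic splits into `stages` equal pieces
    and that every stage pays the full pipeline-register delay.
    """
    stage_logic_ns = single_cycle_ns / stages
    return stage_logic_ns + register_delay_ns


if __name__ == "__main__":
    # Numbers from part B: 10 ns single-cycle time, 5 stages, 1 ns registers.
    print(min_pipelined_cycle_ns(10, 5, 1))  # prints 3.0
```

If the stages cannot be balanced, the slowest stage plus the register delay sets the cycle time instead, so this value is a lower bound under the stated assumptions.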