# 361 Computer Architecture Lecture 12: Designing a Pipeline Processor


## Overview of a Multiple Cycle Implementation

- The root of the single cycle processor's problems:
  - The cycle time has to be long enough for the slowest instruction
- Solution:
  - Break the instruction into smaller steps
  - Execute each step (instead of the entire instruction) in one cycle
    - Cycle time: the time it takes to execute the longest step
    - Keep all the steps of similar length
  - This is the essence of the multiple cycle processor
- The advantages of the multiple cycle processor:
  - Cycle time is much shorter
  - Different instructions take different numbers of cycles to complete
    - Load takes five cycles
    - Jump takes only three cycles
  - Allows a functional unit to be used more than once per instruction



# **Outline of Today's Lecture**

- Recap and Introduction
- Introduction to the Concept of Pipelined Processor
- Pipelined Datapath and Pipelined Control
- How to Avoid Race Conditions in a Pipeline Design
- Pipeline Example: Instructions Interaction
- Summary









# Why Pipeline?

- Suppose we execute 100 instructions
- Single Cycle Machine
  - 45 ns/cycle × 1 CPI × 100 inst = 4500 ns
- Multicycle Machine
  - 10 ns/cycle × 4.6 CPI (due to inst mix) × 100 inst = 4600 ns
- Ideal pipelined machine
  - 10 ns/cycle × (1 CPI × 100 inst + 4 cycle drain) = 1040 ns
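The three totals above can be reproduced with a minimal Python sketch. The function names and default arguments are illustrative assumptions; only the numbers (45 ns, 10 ns, 4.6 CPI, five stages) come from the slide.

```python
def single_cycle_ns(n_inst, cycle_ns=45):
    # Every instruction takes exactly one (long) cycle: CPI = 1.
    return cycle_ns * 1 * n_inst


def multicycle_ns(n_inst, cycle_ns=10, cpi=4.6):
    # Short cycle, but each instruction needs several cycles on average.
    return cycle_ns * cpi * n_inst


def ideal_pipeline_ns(n_inst, cycle_ns=10, stages=5):
    # One instruction completes per cycle once the pipeline is full;
    # the last instruction needs (stages - 1) extra cycles to drain.
    return cycle_ns * (n_inst + stages - 1)


if __name__ == "__main__":
    n = 100
    for name, fn in [("single cycle", single_cycle_ns),
                     ("multicycle", multicycle_ns),
                     ("ideal pipeline", ideal_pipeline_ns)]:
        print(f"{name:>14}: {fn(n):.0f} ns")
```

Note that the drain term is a fixed cost: as the instruction count grows, the ideal pipeline's time per instruction approaches one 10 ns cycle.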



## The Five Stages of Load



- Ifetch: Instruction Fetch
  - Fetch the instruction from the Instruction Memory
- Reg/Dec: Registers Fetch and Instruction Decode
- Exec: Calculate the memory address
- Mem: Read the data from the Data Memory
- Wr: Write the data back to the register file


#### **Pipelining the Load Instruction**



- The five independent functional units in the pipeline datapath are:
  - Instruction Memory for the Ifetch stage
  - Register File's Read ports (busA and busB) for the Reg/Dec stage
  - ALU for the Exec stage
  - Data Memory for the Mem stage
  - Register File's Write port (busW) for the Wr stage
- One instruction enters the pipeline every cycle
  - One instruction comes out of the pipeline (completes) every cycle
  - The "Effective" Cycles per Instruction (CPI) is 1
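To see why back-to-back loads never compete for a functional unit, the sketch below lists which instruction occupies which stage in each cycle. It assumes one load is issued per cycle starting at Cycle 1; `pipeline_schedule` and `total_cycles` are hypothetical helper names.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def pipeline_schedule(n_inst):
    """Return {cycle: [(instruction index, stage), ...]} for back-to-back loads."""
    schedule = {}
    for i in range(n_inst):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1  # instruction i enters Ifetch in cycle i + 1
            schedule.setdefault(cycle, []).append((i, stage))
    return schedule


def total_cycles(n_inst):
    # Fill time for the first instruction, then one completion per cycle.
    return n_inst + len(STAGES) - 1


if __name__ == "__main__":
    sched = pipeline_schedule(3)
    for cycle in sorted(sched):
        print(cycle, sched[cycle])
```

In every cycle each stage name appears at most once, so each of the five functional units serves a different instruction.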







## Can pipelining get us into trouble?

- Yes: Pipeline Hazards
  - Structural hazards: attempt to use the same resource in two different ways at the same time
    - E.g., a combined washer/dryer would be a structural hazard, as would a folder busy doing something else (watching TV)
  - Data hazards: attempt to use an item before it is ready
    - E.g., one sock of a pair is in the dryer and the other in the washer; can't fold until the sock in the washer gets through the dryer
    - An instruction depends on the result of a prior instruction still in the pipeline
  - Control hazards: attempt to make a decision before the condition is evaluated
    - E.g., washing football uniforms and needing the proper detergent level; must see the result after the dryer before putting the next load in
    - Branch instructions
- Hazards can always be resolved by waiting
  - Pipeline control must detect the hazard
  - Take action (or delay action) to resolve hazards



# Structural Hazards limit performance

- Example: if there are 1.3 memory accesses per instruction and only one memory access per cycle, then
  - average CPI ≥ 1.3
  - otherwise the resource would be more than 100% utilized
- More on Hazards later
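The bound in this example follows from a utilization argument, sketched below. The helper names `min_cpi` and `utilization` are assumptions made for illustration; the only figures taken from the slide are 1.3 accesses per instruction and one access per cycle.

```python
def min_cpi(accesses_per_inst, accesses_per_cycle=1.0, ideal_cpi=1.0):
    # The shared resource serves at most accesses_per_cycle requests per cycle,
    # so each instruction needs at least this many cycles on average.
    return max(ideal_cpi, accesses_per_inst / accesses_per_cycle)


def utilization(accesses_per_inst, cpi, accesses_per_cycle=1.0):
    # Fraction of the resource's capacity that is demanded at a given CPI.
    return accesses_per_inst / (cpi * accesses_per_cycle)


if __name__ == "__main__":
    print("lower bound on CPI:", min_cpi(1.3))
    print("utilization at CPI = 1:", utilization(1.3, 1.0))
```

At CPI = 1 the memory would need to be 130% utilized, which is impossible; the CPI has to rise to at least 1.3 to bring utilization down to 100%.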



- We have a problem:
  - Two instructions try to write to the register file at the same time!


## The Four Stages of R-type



- Ifetch: Instruction Fetch
  - Fetch the instruction from the Instruction Memory
- Reg/Dec: Registers Fetch and Instruction Decode
- Exec: ALU operates on the two register operands
- Wr: Write the ALU output back to the register file

#### **Important Observation**

- Each functional unit can only be used once per instruction
- Each functional unit must be used at the same stage for all instructions:
  - Load uses Register File's Write Port during its 5th stage
  - R-type uses Register File's Write Port during its 4th stage
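The write-port collision can be checked mechanically. The sketch below assumes one instruction is issued per cycle starting at Cycle 1, and describes each instruction by its list of stages. `write_port_conflicts` is a hypothetical helper, and `RTYPE_PADDED` (an R-type given a pass-through Mem stage) is an assumption used here only to show one way of putting Wr in the same stage for all instructions.

```python
LOAD = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]
RTYPE = ["Ifetch", "Reg/Dec", "Exec", "Wr"]
# Assumed variant: R-type passes through Mem without using the Data Memory.
RTYPE_PADDED = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def write_port_conflicts(program):
    """Return {cycle: [instruction indices]} for cycles with more than one Wr."""
    writers = {}
    for i, stages in enumerate(program):
        wr_cycle = i + 1 + stages.index("Wr")  # instruction i issues in cycle i + 1
        writers.setdefault(wr_cycle, []).append(i)
    return {c: insts for c, insts in writers.items() if len(insts) > 1}


if __name__ == "__main__":
    print(write_port_conflicts([LOAD, RTYPE]))         # both write in Cycle 5
    print(write_port_conflicts([LOAD, RTYPE_PADDED]))  # no shared write cycle
```

A load issued in Cycle 1 writes in Cycle 5 (its 5th stage), and an R-type issued in Cycle 2 also writes in Cycle 5 (its 4th stage), which is the collision the observation describes.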






- Insert a "bubble" into the pipeline to prevent 2 writes in the same cycle
  - The control logic can be complex
- No instruction is completed during Cycle 5:
  - The "Effective" CPI for load is >1
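A simplified model of bubble insertion is sketched below: an instruction's issue is held back one cycle at a time until its Wr stage no longer collides with an earlier write. The stalling policy, the function name `issue_with_bubbles`, and the three-instruction sequence are all assumptions chosen for illustration, not the lecture's control logic.

```python
LOAD = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]
RTYPE = ["Ifetch", "Reg/Dec", "Exec", "Wr"]


def issue_with_bubbles(program):
    """Return (issue_cycles, completion_cycles) with stalls on write-port clashes."""
    next_issue = 1
    wr_busy = set()  # cycles in which the register file write port is taken
    issue_cycles, completion_cycles = [], []
    for stages in program:
        wr_offset = stages.index("Wr")
        while next_issue + wr_offset in wr_busy:
            next_issue += 1  # insert a bubble: delay this instruction one cycle
        wr_busy.add(next_issue + wr_offset)
        issue_cycles.append(next_issue)
        completion_cycles.append(next_issue + wr_offset)
        next_issue += 1
    return issue_cycles, completion_cycles


if __name__ == "__main__":
    issue, done = issue_with_bubbles([RTYPE, LOAD, RTYPE])
    print("issue cycles:     ", issue)
    print("completion cycles:", done)
```

For this hypothetical sequence the second R-type is delayed by one cycle, and no instruction completes in Cycle 5, so the sequence takes more cycles than it has instructions to retire.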





# The Four Stages of Beq



- Ifetch: Instruction Fetch
  - Fetch the instruction from the Instruction Memory
- Reg/Dec: Registers Fetch and Instruction Decode
- Exec: ALU compares the two register operands
  - Adder calculates the branch target address
- Mem: If the registers compared in the Exec stage are the same,
  - Write the branch target address into the PC



















## **Pipeline Control**

- The Main Control generates the control signals during Reg/Dec
  - Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
  - Control signals for Mem (MemWr, Branch) are used 2 cycles later
  - Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
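The 1, 2, and 3 cycle delays can be modeled by carrying each instruction's control word through the pipeline registers. In the minimal sketch below the control word is just the opcode string, and the register names `id_ex`, `ex_mem`, and `mem_wr` are hypothetical; it assumes instruction i is fetched in cycle i + 1 and decoded in cycle i + 2.

```python
def simulate_control(ops):
    """Trace when each instruction's control word is consumed.

    Returns a list of (cycle, stage, op) tuples.
    """
    id_ex = ex_mem = mem_wr = None  # hypothetical pipeline-register names
    log = []
    for cycle in range(1, len(ops) + 5):
        # Each stage uses the control word latched at the previous clock edge.
        for stage, word in (("Exec", id_ex), ("Mem", ex_mem), ("Wr", mem_wr)):
            if word is not None:
                log.append((cycle, stage, word))
        # Main Control decodes the instruction currently in Reg/Dec.
        i = cycle - 2
        decoded = ops[i] if 0 <= i < len(ops) else None
        # Clock edge: every control word moves one pipeline register forward.
        id_ex, ex_mem, mem_wr = decoded, id_ex, ex_mem
    return log


if __name__ == "__main__":
    for entry in simulate_control(["lw", "add"]):
        print(entry)
```

A `lw` decoded in Cycle 2 has its control word used by Exec in Cycle 3, by Mem in Cycle 4, and by Wr in Cycle 5, so each stage only ever sees the signals of the instruction currently in it.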





- We have a race condition between Address and Write Enable!







- End of Cycle 7: Store's Wr, Beq's Mem













- Although Load is fetched during Cycle 1:
  - The data is NOT written into the Reg File until the end of Cycle 5
  - We cannot read this value from the Reg File until Cycle 6
  - 3-instruction delay before the load takes effect
- This is referred to as a Data Hazard:
  - Clever design techniques can reduce the delay to ONE instruction
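The 3-instruction figure follows from simple cycle counting, sketched below. It assumes registers are read in the 2nd stage (Reg/Dec) and written at the end of the 5th stage (Wr), with one instruction fetched per cycle; `load_delay_slots` is a hypothetical helper name, and the sketch does not model the design techniques that shorten the delay.

```python
def load_delay_slots(load_fetch_cycle=1, wr_stage=5, read_stage=2):
    # The load's data reaches the Reg File at the end of its Wr stage.
    write_done_cycle = load_fetch_cycle + wr_stage - 1
    # The first cycle in which a register read returns the new value.
    first_read_cycle = write_done_cycle + 1
    # An instruction reading in that cycle was fetched (read_stage - 1) cycles earlier.
    first_ok_fetch_cycle = first_read_cycle - (read_stage - 1)
    # Instructions fetched strictly between the load and that one read too early.
    return first_ok_fetch_cycle - load_fetch_cycle - 1


if __name__ == "__main__":
    print("instructions that cannot see the loaded value:", load_delay_slots())
```

With the load fetched in Cycle 1, the instructions fetched in Cycles 2, 3, and 4 read the Reg File in Cycles 3, 4, and 5, all before the value is available in Cycle 6.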


#### Summary

- Disadvantages of the Single Cycle Processor
  - Long cycle time
  - Cycle time is too long for all instructions except the Load
- Multiple Clock Cycle Processor:
  - Divide the instructions into smaller steps
  - Execute each step (instead of the entire instruction) in one cycle
- Pipeline Processor:
  - Natural enhancement of the multiple clock cycle processor
  - Each functional unit can only be used once per instruction
  - If an instruction is going to use a functional unit:
    - it must use it at the same stage as all other instructions
  - Pipeline Control:
    - Each stage's control signal depends ONLY on the instruction that is currently in that stage