THE PROCESSOR: PIPELINING
Pipelining Analogy
- Pipelined laundry: overlapping execution
- Parallelism improves performance
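- Four loads: $\text{Speedup} = 8/3.5 \approx 2.3$
- Non-stop (n loads): $\text{Speedup} = 2n/(0.5n + 1.5) \approx 4$ = number of stages
  (each load goes through four 0.5-hour stages, so n loads take 2n hours one after another, but only 0.5n + 1.5 hours when overlapped)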
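The laundry arithmetic can be checked with a few lines of Python. This is a minimal sketch assuming, as in the analogy, four stages of 0.5 hour each; the function names are made up for illustration.

```python
def sequential_hours(loads: int, stages: int = 4, stage_hours: float = 0.5) -> float:
    """Each load passes through every stage before the next load starts."""
    return loads * stages * stage_hours


def pipelined_hours(loads: int, stages: int = 4, stage_hours: float = 0.5) -> float:
    """The first load takes stages * stage_hours; each later load finishes one stage time after the previous one."""
    return (stages + loads - 1) * stage_hours


def speedup(loads: int, stages: int = 4, stage_hours: float = 0.5) -> float:
    """Ratio of sequential time to overlapped (pipelined) time."""
    return sequential_hours(loads, stages, stage_hours) / pipelined_hours(loads, stages, stage_hours)


if __name__ == "__main__":
    print(f"4 loads: {speedup(4):.2f}")  # 8 / 3.5
    for n in (10, 100, 1_000_000):
        # As n grows, the ratio approaches the number of stages (4).
        print(f"{n} loads: {speedup(n):.4f}")
```

The pipelined formula, (stages + n - 1) × 0.5, is the same 0.5n + 1.5 used above; the fixed 1.5 hours of fill time is what keeps short runs below the ideal speedup.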
MIPS Pipeline
Five stages, one step per stage:
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
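To see how the five stages overlap, the sketch below prints which stage each instruction occupies in each clock cycle. It assumes an ideal pipeline (one instruction enters IF every cycle, no stalls or hazards); `pipeline_chart` is a hypothetical helper written for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(num_instructions: int, stages=STAGES) -> list[list[str]]:
    """Return one row per instruction; column c holds the stage that
    instruction occupies in clock cycle c ("" when it is not in the pipeline).

    Assumes an ideal pipeline: instruction i enters IF in cycle i, no stalls.
    """
    total_cycles = num_instructions + len(stages) - 1
    chart = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            step = cycle - i  # how many stages instruction i has advanced
            row.append(stages[step] if 0 <= step < len(stages) else "")
        chart.append(row)
    return chart


if __name__ == "__main__":
    for i, row in enumerate(pipeline_chart(3)):
        print(f"instr {i}: " + " ".join(f"{s:>3}" for s in row))
```

Reading down any column shows that, in a given cycle, each instruction is in a different stage, which is why one step per stage lets up to five instructions be in flight at once.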
The first pipeline register (a delay module), IF_ID, shown by the red line, separates the IF stage from the next stage (ID).
We are going to break two signals (wires): PC+4 and Instruction. We need an input for each of these, and an output that takes on the input values at the instant the clock changes from high to low (the clock event on the falling edge). The VHDL code for entity "Ifid" is created in the text file "IFID.VHD".
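The IF_ID behavior described above can be modeled in a few lines. The sketch below is a hypothetical Python behavioral model, not the VHDL in IFID.VHD: the class `IfIdRegister` and its field and method names are invented here, and clock levels are represented as 1 (high) and 0 (low). The outputs change only on a 1-to-0 transition, so inputs presented while the clock is high or steady are ignored.

```python
from dataclasses import dataclass


@dataclass
class IfIdRegister:
    """Toy model of the IF/ID pipeline register: latches its inputs on a falling clock edge."""
    pc_plus_4_out: int = 0
    instruction_out: int = 0
    _clk: int = 0  # last clock level seen

    def tick(self, clk: int, pc_plus_4_in: int, instruction_in: int) -> None:
        """Apply a new clock level; outputs change only when clk goes from 1 to 0."""
        if self._clk == 1 and clk == 0:
            self.pc_plus_4_out = pc_plus_4_in
            self.instruction_out = instruction_in
        self._clk = clk


if __name__ == "__main__":
    reg = IfIdRegister()
    reg.tick(1, 4, 0x8C220000)  # clock goes high: outputs unchanged
    print(reg.pc_plus_4_out)    # still 0
    reg.tick(0, 4, 0x8C220000)  # falling edge: both inputs captured
    print(reg.pc_plus_4_out, hex(reg.instruction_out))
```

The instruction word here is just an arbitrary 32-bit value; the point is that the ID stage sees a stable copy of PC+4 and the instruction for a whole cycle while IF moves on to the next fetch.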
Pipeline Performance
Assume time for stages is:
- 100 ps for register read or write
- 200 ps for other stages
Compare the pipelined datapath with the single-cycle datapath.
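Under these stage times, the single-cycle clock must fit the slowest instruction, while the pipelined clock only has to fit the slowest stage. The sketch below assumes, as in the usual textbook example, that load word (lw) is the slowest instruction and uses all five stages; the dictionary and function names are illustrative.

```python
# Stage latencies in picoseconds, from the assumptions above
# (ID is a register read, WB is a register write).
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Assumption: lw (load word) is the slowest instruction and uses every stage.
LW_STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def single_cycle_clock_ps() -> int:
    """The single-cycle clock period must cover the whole of the slowest instruction."""
    return sum(STAGE_PS[s] for s in LW_STAGES)


def pipelined_clock_ps() -> int:
    """The pipelined clock period must cover the slowest single stage."""
    return max(STAGE_PS.values())


if __name__ == "__main__":
    print(single_cycle_clock_ps(), pipelined_clock_ps())
```

The ratio of the two periods, 800 ps / 200 ps = 4, is the ideal "fourfold" improvement; it is 4 rather than 5 because the register stages (100 ps) still occupy a full 200 ps pipeline slot.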
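Referring to slide 5, the example does not show a fourfold improvement for three instructions, because filling the pipeline takes a large share of the time when so few instructions run:
- 2400 ps / 1400 ps ≈ 1.7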
Add 1,000,000 instructions; in the pipelined datapath, each one adds 200 ps to the total execution time:
- Pipelined total execution time = 1,000,000 × 200 ps + 1400 ps = 200,001,400 ps
- Nonpipelined total execution time = 1,000,000 × 800 ps + 2400 ps = 800,002,400 ps
- Speedup = 800,002,400 ps / 200,001,400 ps ≈ 4.00
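Both totals follow from two simple formulas. This Python sketch assumes the 200 ps pipelined clock and 800 ps single-cycle clock of this example, a five-stage pipeline with no stalls, and counts the original three instructions plus the added 1,000,000 (1,000,003 in total).

```python
STAGES = 5
PIPELINED_CLOCK_PS = 200     # slowest stage
SINGLE_CYCLE_CLOCK_PS = 800  # slowest instruction (lw)


def pipelined_time_ps(n: int) -> int:
    """The first instruction takes STAGES cycles; each later one adds one clock."""
    return (STAGES + n - 1) * PIPELINED_CLOCK_PS


def nonpipelined_time_ps(n: int) -> int:
    """Every instruction takes one full single-cycle clock."""
    return n * SINGLE_CYCLE_CLOCK_PS


if __name__ == "__main__":
    for n in (3, 1_000_003):  # the three-instruction example, then 1,000,000 more
        t_p, t_s = pipelined_time_ps(n), nonpipelined_time_ps(n)
        print(f"{n:>9} instr: {t_s} ps vs {t_p} ps, speedup {t_s / t_p:.2f}")
```

For three instructions the fixed fill time (4 extra cycles, 800 ps) is a large fraction of 1400 ps; for a million instructions it is negligible, so the speedup approaches the clock-period ratio of 4.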
- The Cortex-M3 processor has a three-stage pipeline. The pipeline stages are instruction fetch, instruction decode, and instruction execution (see Figure 6.1).
Figure 6.1: The Three-Stage Pipeline in the Cortex-M3
- Some people might argue that there are four stages because of the pipeline behavior in the bus interface when it accesses memory, but this stage is outside the processor, so the processor itself still has only three stages.
- When running programs with mostly 16-bit instructions, you will find that the processor might not fetch instructions in every cycle.
- This is because the processor fetches up to two 16-bit instructions (32 bits) at a time, so after one instruction is fetched, the next one is already inside the processor. In this case, the processor bus interface may try to fetch the instruction after next or, if the buffer is full, the bus interface could be idle.
- Some of the instructions take multiple cycles to execute; in this case, the pipeline will be stalled.
- When a branch instruction is executed, the pipeline is flushed.
- The processor will have to fetch instructions from the branch destination to fill up the pipeline again.
- However, the Cortex-M3 processor supports a number of ARMv7-M instructions, such as IT (If-Then), so some short-distance branches can be avoided by replacing them with conditionally executed code.[1]
- Due to the pipelined nature of the processor, and to ensure that programs remain compatible with Thumb code, when the program counter is read during instruction execution, the read value will...
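Two points from the list above, fewer bus fetches for 16-bit code and the cost of stalls and branch flushes, can be illustrated with toy Python models. Both are sketches under stated assumptions: the prefetch buffer capacity (`buffer_halfwords`) and the branch refill penalty (`refill_penalty`) are hypothetical parameters, not Cortex-M3 timing figures, and every instruction otherwise costs one cycle. The example compares skipping one instruction with a taken branch against executing it conditionally after an IT (If-Then) instruction.

```python
from dataclasses import dataclass


def simulate_fetch(num_instructions: int, buffer_halfwords: int = 4) -> tuple[int, int]:
    """Toy prefetch model for code made only of 16-bit instructions.

    Each cycle the bus may fetch one 32-bit word (two 16-bit instructions)
    if the buffer has room, and the core takes one 16-bit instruction.
    buffer_halfwords is a hypothetical capacity. Returns (fetch_cycles, idle_bus_cycles).
    """
    if buffer_halfwords < 2:
        raise ValueError("buffer must hold at least one 32-bit word")
    buffered = fetched = consumed = 0
    fetch_cycles = idle_cycles = 0
    while consumed < num_instructions:
        # Bus interface: fetch a word only if code remains and there is space.
        if fetched < num_instructions and buffered + 2 <= buffer_halfwords:
            buffered += 2
            fetched += 2
            fetch_cycles += 1
        else:
            idle_cycles += 1
        # Core: take the next 16-bit instruction if one is waiting.
        if buffered > 0:
            buffered -= 1
            consumed += 1
    return fetch_cycles, idle_cycles


@dataclass
class Instr:
    name: str
    extra_cycles: int = 0       # extra execute cycles (the pipeline stalls meanwhile)
    taken_branch: bool = False  # a taken branch flushes the pipeline


def cycle_count(program: list[Instr], stages: int = 3, refill_penalty: int = 2) -> int:
    """Rough cycle count for a simple in-order pipeline: stages - 1 cycles to fill,
    one cycle per instruction, plus stall cycles and a hypothetical refill
    penalty for every taken branch."""
    cycles = stages - 1
    for ins in program:
        cycles += 1 + ins.extra_cycles
        if ins.taken_branch:
            cycles += refill_penalty
    return cycles


if __name__ == "__main__":
    fetches, idle = simulate_fetch(8)
    print(f"8 x 16-bit instructions: bus busy {fetches} cycles, idle {idle} cycles")
    # Taken BEQ skips a MOV, then ADD runs.
    with_branch = [Instr("CMP"), Instr("BEQ", taken_branch=True), Instr("ADD")]
    # Same effect with IT NE: MOVNE executes only when the values differ.
    with_it = [Instr("CMP"), Instr("IT"), Instr("MOVNE"), Instr("ADD")]
    print(cycle_count(with_branch), cycle_count(with_it))
```

With these made-up costs the taken-branch version needs 7 cycles and the IT version 6, the kind of saving the conditional-execution remark refers to; real timings depend on the memory system and on the specific instructions.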
- http://www.youtube.com/watch?v=DxOGkwFQ8EU
By Nurul Izzati Farhanah