### **Pipelining**

- Reconsider the data path we just covered
- Each instruction takes from 3 to 5 clock cycles
- However, parts of the hardware are idle much of the time
- We can reorganize the operation
  - Make each hardware block independent
    1. Instruction Fetch Unit
    2. Register Read Unit
    3. ALU Unit
    4. Data Memory Read/Write Unit
    5. Register Write Unit
  - Units in 3 and 5 cannot be independent, but operations can be
- Let each unit just do its required job for each instruction
  - If a unit has nothing to do for some instruction, it can simply perform a no-op

### **Pipelining**

- What makes it easy?
  - all instructions are the same length
  - just a few instruction formats
  - memory operands appear only in loads and stores
- What makes it hard?
  - structural hazards: suppose we had only one memory
  - control hazards: need to worry about branch instructions
  - data hazards: an instruction depends on a previous instruction
- We'll study these issues using a simple pipeline
- Other complications:
  - exception handling
  - trying to improve performance with out-of-order execution, etc.

### Pipelined Data Path

Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?

### **Pipeline Operation**

- In a pipeline, one operation begins in every cycle
- Also, one operation completes in each cycle
- Each instruction takes 5 clock cycles (k cycles in general)
- When a stage is not used, no control needs to be applied
- In one clock cycle, several instructions are active
- Different stages are executing different instructions
- How to generate control signals for them is an issue

### **Graphically Representing Pipelines**

[Figure: pipeline diagram; vertical axis labeled "Program execution order"]

### Pipeline Control

We have 5 stages. What needs to be controlled in each stage?
- Instruction Fetch and PC Increment
- Instruction Decode / Register Fetch
- Execution
- Memory Stage
- Write Back

How would control be handled in an automobile plant?

- A fancy control center telling everyone what to do?
- Should we use a finite state machine?

### Solution: Software No-ops/Hardware Bubbles

- Have the compiler guarantee no hazards
- Where do we insert the "no-ops"?
  - sub \$2, \$1, \$3
  - and \$12, \$2, \$5
  - or \$13, \$6, \$2
  - add \$14, \$2, \$2
- Problem: this really slows us down!
  - Also, the program stays slow even if a technique like forwarding is added in a newer version of the hardware, because the no-ops are already compiled in
- Hardware can detect dependencies and insert the no-ops itself by not accepting a new instruction
  - This is a bubble in the pipeline; it wastes one cycle at every stage it passes through
  - Need two or three bubbles between the write and the read of a register

### Hardware detection and no-op insertion is called stalling

We stall the pipeline by keeping an instruction in the same stage.

[Figure: pipeline stall diagram; horizontal axis labeled "Time (in clock cycles)"]
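
The compiler-side no-op insertion discussed above can be sketched in Python. This is only an illustrative model, not the method of any particular compiler: the `Instr` tuple, the `insert_noops` function, and the `bubbles_needed` parameter are hypothetical names introduced here. The default of 2 bubbles assumes the register file can be written and then read within the same cycle; pass 3 if it cannot.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: opcode, destination, source registers."""
    op: str
    dest: Optional[int]
    srcs: Tuple[int, ...]


NOP = Instr("nop", None, ())


def insert_noops(program, bubbles_needed=2):
    """Return a copy of `program` with no-ops separating each register
    write from any later read that follows it too closely.

    `bubbles_needed` (an assumption, see the text) is the minimum number
    of instruction slots required between the writer and the reader.
    """
    out = []
    last_write = {}  # register number -> index in `out` of its latest writer
    for instr in program:
        missing = 0
        for reg in instr.srcs:
            if reg in last_write:
                # Slots already sitting between the writer and this instruction.
                between = len(out) - last_write[reg] - 1
                missing = max(missing, bubbles_needed - between)
        out.extend([NOP] * missing)
        out.append(instr)
        # Register $0 is hardwired to zero in MIPS, so writes to it create no hazard.
        if instr.dest is not None and instr.dest != 0:
            last_write[instr.dest] = len(out) - 1
    return out


# The four-instruction sequence from the slide.
program = [
    Instr("sub", 2, (1, 3)),
    Instr("and", 12, (2, 5)),
    Instr("or", 13, (6, 2)),
    Instr("add", 14, (2, 2)),
]

for instr in insert_noops(program):
    print(instr.op, instr.dest, instr.srcs)
```

Under the 2-bubble assumption, both no-ops land between `sub` and `and`; by the time `or` and `add` read \$2, enough slots have already passed. A hardware stall has the same effect on timing, but the bubbles are created at run time rather than stored in the program.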