  Intel's Pentium 4, a closer look 
  May 17, 2001, 12:00pm EDT 

Pipelining and Performance

By: Sander Sassen

A CPU running at 1 GHz or more runs into its own set of problems. In particular, the time available to execute an instruction shrinks to the point where it becomes too short to be feasible. The CPU still needs time to execute the instruction or, in the case of a pipelined CPU, to execute multiple instructions.
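To put a number on how little time that is, note that the length of one clock cycle is simply the reciprocal of the clock frequency. The following is a minimal sketch of that arithmetic; the function name and the list of frequencies are illustrative assumptions, not figures taken from any specific CPU.

```python
def clock_period_ns(frequency_hz: float) -> float:
    """Return the length of one clock cycle in nanoseconds."""
    # 1 second = 1e9 nanoseconds, so period (ns) = 1e9 / frequency (Hz).
    return 1e9 / frequency_hz


# Illustrative clock speeds in GHz (assumed values, for comparison only).
for freq_ghz in (0.5, 1.0, 1.5, 2.0):
    period = clock_period_ns(freq_ghz * 1e9)
    print(f"{freq_ghz:.1f} GHz -> {period:.3f} ns per cycle")
```

At 1 GHz a single cycle lasts one nanosecond, and every doubling of the clock speed halves the time in which the work of that cycle has to be finished.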

In essence a CPU is nothing more than an extremely fast calculator, capable of only simple arithmetic and simple logical decisions. For example, take the value of 'A', and add it to the value of 'B', or determine if 'A' is greater than 'B'. The processor must first know where the values are stored, and what specifically to do with the values (e.g., add, multiply). Further, once the instructions and data have been located, interpreted, and executed, the result must be stored in memory for later use. To process an instruction, the processor must:

  • Locate and retrieve the data from memory: Fetching
  • Interpret or translate the instruction from the software: Decoding
  • Perform the given instruction on the given data: Executing
  • Place the result back into a memory location: Storing

Of course, the above is an extremely simplified version of the process. Suffice it to say that each time an instruction is to be performed, the processor must fetch the data, decode the instruction, execute the instruction, and store the result. All of this has to be performed in one clock cycle; the time required is known as the execution latency.
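The four steps can be sketched as a toy interpreter. This is only an illustration of the fetch, decode, execute, store sequence and not a model of any real processor: the instruction format, the opcode names (`ADD`, `MUL`, `GT`), and the memory layout are all hypothetical. The operand reads are placed after decoding here, since this toy machine needs the decoded operand names to find the values.

```python
# Toy machine: the memory layout, instruction format, and opcode set are
# hypothetical and chosen only to illustrate the four steps.
memory = {"A": 7, "B": 5}

# Each instruction: (opcode, first source, second source, destination)
program = [
    ("ADD", "A", "B", "C"),  # C = A + B
    ("GT", "A", "B", "D"),   # D = (A > B)
]

# Simple arithmetic and simple logical decisions, as described in the text.
OPERATIONS = {
    "ADD": lambda x, y: x + y,
    "MUL": lambda x, y: x * y,
    "GT": lambda x, y: x > y,
}


def run(program, memory):
    """Process every instruction in order, one complete step at a time."""
    for pc in range(len(program)):
        # Fetch: retrieve the next instruction.
        instruction = program[pc]
        # Decode: work out what to do and where the values are stored.
        opcode, src1, src2, dest = instruction
        operation = OPERATIONS[opcode]
        # Fetch (operands): retrieve the values from memory.
        a, b = memory[src1], memory[src2]
        # Execute: perform the given instruction on the given data.
        result = operation(a, b)
        # Store: place the result back into a memory location.
        memory[dest] = result
    return memory


run(program, memory)
print(memory["C"], memory["D"])  # prints: 12 True
```

Note that each instruction runs through all four steps before the next one starts, which matches the simplified, non-overlapping picture given above.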





