Hardware Analysis



  Intel's Pentium 4, a closer look 
  May 17, 2001, 12:00pm EDT 

Branch Prediction

By: Sander Sassen

Unfortunately, pipelining cannot be used indiscriminately. The biggest problem facing, for example, a 100-stage pipeline is cost and complexity. Die area and control logic grow with every stage added: each stage needs its own latches to hold intermediate results, plus logic to detect and resolve conflicts between the instructions in flight, and that quickly becomes too expensive. But there's more to it; often one instruction depends on the result of a previous instruction. In that case, the second instruction cannot execute without the result of the first, and could end up waiting a few cycles for it to finish. That would waste clock cycles, driving efficiency down and execution latency up, which is unacceptable.

One possible solution, for the case where the pending result decides which way a conditional branch goes, is called Branch Prediction. Essentially, the CPU makes an educated guess as to the outcome of the branch, and keeps fetching and executing instructions under that assumption. Since most code consists largely of repetitive loops, whose branches go the same way on almost every pass, Branch Prediction isn't exceedingly difficult. In fact, today's prediction units are able to predict correctly over 90% of the time.
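To see why loops are easy to predict, here is a minimal sketch of a 2-bit saturating counter, a common textbook prediction scheme. It is an illustration only, not a description of the predictor in the Pentium 4 or any other specific CPU; the function names and the loop shape (20 iterations, run 50 times) are assumptions made for the example.

```python
def simulate_two_bit_predictor(outcomes, state=2):
    """Return the fraction of branches predicted correctly.

    outcomes: sequence of booleans, True meaning the branch was taken.
    state: counter in 0..3; values 2 and 3 predict "taken".
    (Hypothetical helper for illustration, not a real CPU's predictor.)
    """
    correct = 0
    total = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        total += 1
        # Nudge the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / total


def loop_branch_outcomes(iterations, repeats):
    """Outcomes of a loop-closing branch: taken on every pass except the last."""
    pattern = [True] * (iterations - 1) + [False]
    return pattern * repeats


accuracy = simulate_two_bit_predictor(loop_branch_outcomes(iterations=20, repeats=50))
print(f"accuracy: {accuracy:.1%}")  # prints "accuracy: 95.0%"
```

The only miss in each run of the loop is the final pass, where the loop exits, so the longer the loop, the higher the hit rate.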

Suppose a set of 20 instructions was executed on the strength of a prediction that turns out to be wrong. They must be discarded and the work repeated with the correct data. This is called flushing the pipeline. In effect, the processor must discard the contents of the pipeline, since they were calculated based on a false assumption. The clock cycles spent processing those instructions have been wasted.

Now, in a 2-stage pipeline, only half of one other instruction has been carried out incorrectly and must be flushed, so one cycle is wasted. In a 10-stage pipeline, the processor has been proceeding under a false assumption, the misprediction, for nine clock cycles, all of which are wasted. Likewise, with a 20-stage pipeline as implemented on the Pentium 4, a misprediction wastes 19 clock cycles.
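The pattern in these figures is simply "stages minus one". A small sketch of that rule, assuming (as the examples above do) that the branch is only resolved in the final stage; the function name is made up for this illustration:

```python
def misprediction_penalty(pipeline_stages):
    """Cycles wasted per misprediction under the simple model used here:
    everything behind the branch is flushed, which costs stages - 1 cycles."""
    if pipeline_stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    return pipeline_stages - 1


for stages in (2, 10, 20):
    print(stages, "stages:", misprediction_penalty(stages), "cycles wasted")
```

Real processors may resolve branches before the last stage, so treat the result as an upper bound rather than a measured penalty.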

To give an indication of the impact of such mispredictions, it has been estimated that mispredicting just under 10% of branches slows Intel's Pentium III by anywhere from 20-40%. Considering that only about 10-20% of instructions are branches to begin with, mispredictions occur only about once in every 50-100 instructions (10% of the 10-20% of instructions that are branches), on average. Restated, if those one in 50-100 instructions were predicted correctly, the processor would perform 20-40% faster. So you can appreciate the importance of good prediction algorithms: for a 100-stage pipeline, the performance hit from even relatively few branch mispredictions would become enormous.
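The arithmetic behind the "once in every 50-100 instructions" figure can be checked in a few lines. The second function is a deliberately crude cost model and not taken from the estimate quoted above: it assumes an ideal processor that otherwise completes one instruction per cycle (`base_cpi=1.0`) and a flush cost of stages minus one cycles. Both function names are hypothetical.

```python
def instructions_per_misprediction(branch_fraction, mispredict_rate):
    """Average number of instructions between two mispredicted branches."""
    return 1 / (branch_fraction * mispredict_rate)


def relative_cycles(branch_fraction, mispredict_rate, penalty_cycles, base_cpi=1.0):
    """Cycles per instruction with mispredictions, relative to perfect prediction.

    Assumption for illustration: every misprediction adds penalty_cycles
    on top of an otherwise constant base_cpi.
    """
    extra = branch_fraction * mispredict_rate * penalty_cycles
    return (base_cpi + extra) / base_cpi


# 10% mispredicted, out of 10% or 20% of instructions being branches.
print(round(instructions_per_misprediction(0.10, 0.10)))  # prints 100
print(round(instructions_per_misprediction(0.20, 0.10)))  # prints 50

# Same rates, with flush costs of 19 cycles (20 stages) and 99 cycles (100 stages).
for penalty in (19, 99):
    print(penalty, f"{relative_cycles(0.20, 0.10, penalty):.2f}")
```

Under these assumptions the 20-stage case needs about 1.38 times as many cycles as with perfect prediction, and the hypothetical 100-stage case about 2.98 times, which is the sense in which the hit becomes enormous as the pipeline deepens.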

1. Introduction
2. Clockspeed and Bandwidth
3. Pipelining and Performance
4. Pipelining and Performance Cont.
5. Branch Prediction
6. Branch Prediction Cont.
7. SSE2 and Misc. Features
8. Conclusion
