To further illustrate the effects of the cache and pipeline changes, we turned to ScienceMark 2.0's matrix multiplication tests.
Here we see an interesting result. When matrices of single-precision numbers are used, the Northwood core is actually faster, performing about 4% more floating point operations per cycle than Prescott (as this is measured in FLOPs/cycle, clock speed correction is not necessary). The longer pipeline slows Prescott enough here to put it behind a Northwood processor at the same clock speed. When we move to double-precision matrices, though, Prescott steps up and takes the lead. The likely explanation is that the double-precision data, at twice the size per element, no longer fits in Northwood's smaller cache, giving Prescott the edge. This is yet another beautiful example of the 'faster in some cases, slower in others' behavior of Prescott.
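To make the cache argument concrete, here is a rough back-of-the-envelope sketch of the working set for a matrix multiply at each precision. It assumes the commonly cited L2 sizes of 512 KB for Northwood and 1 MB for Prescott. The matrix dimension `n = 200` is purely hypothetical, chosen to illustrate the effect, and is not the size ScienceMark actually uses. The helper names are ours as well.

```python
# Rough working-set estimate for an n x n matrix multiply C = A * B.
# Assumptions (not taken from the benchmark): n is a hypothetical size;
# L2 sizes are the commonly cited 512 KB (Northwood) and 1 MB (Prescott).

L2_BYTES = {"Northwood": 512 * 1024, "Prescott": 1024 * 1024}
ELEMENT_BYTES = {"single": 4, "double": 8}


def working_set_bytes(n, element_bytes, matrices=3):
    """Bytes needed to hold A, B and C in cache at the same time."""
    return matrices * n * n * element_bytes


def fits_in_l2(n, precision):
    """Report, per core, whether the whole working set fits in L2."""
    ws = working_set_bytes(n, ELEMENT_BYTES[precision])
    return {core: ws <= size for core, size in L2_BYTES.items()}


def flops_per_cycle(flops_per_second, clock_hz):
    """Normalize throughput by clock speed, as the benchmark's metric does."""
    return flops_per_second / clock_hz


if __name__ == "__main__":
    n = 200  # hypothetical matrix dimension
    for precision in ("single", "double"):
        ws_kb = working_set_bytes(n, ELEMENT_BYTES[precision]) / 1024
        print(precision, f"{ws_kb:.1f} KB", fits_in_l2(n, precision))
```

With this hypothetical size, the single-precision working set (about 469 KB) fits in both L2 caches, while the double-precision working set (about 938 KB) fits only in Prescott's. Real results also depend on blocking, the L1 caches and prefetching, so treat this as an illustration of the mechanism rather than a model of the benchmark.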
Again, ScienceMark 2.0's core tests show a similar result. Here the deeper pipeline hurts Prescott, as the lower-clocked Northwood is able to defeat it in two of the three tests.