Let's now shift our attention to some performance metrics. We mentioned earlier that the Prescott core features larger L1 and L2 caches, but that those larger caches come at a cost. That cost is latency. We've used Cachemem to illustrate:
First consider the L1 region. Notice that for data sizes below 8kB (Northwood's L1 size), Northwood's cache latency is just two cycles, while Prescott's is bumped to three. If you recall, the quick 2-cycle L1 cache was a key feature of the P4 when it launched years ago. Intel has told us that the L1 latency had to be increased simply because a 2-cycle latency was not practical for a larger 16kB cache. This is a typical size-versus-speed tradeoff: Intel evidently believes that the larger cache will deliver better performance than a smaller, faster one. Still, going from two cycles to three is a 50% increase in L1 latency. Consider the table below.
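To put the tradeoff in numbers, the short sketch below compares the relative growth in L1 capacity with the relative growth in L1 latency, using only the figures quoted above (8kB and 2 cycles for Northwood, 16kB and 3 cycles for Prescott). The helper name `percent_increase` is purely illustrative.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage."""
    return (new - old) / old * 100.0

# Figures taken from the text above.
NORTHWOOD_L1_KB, PRESCOTT_L1_KB = 8, 16
NORTHWOOD_L1_CYCLES, PRESCOTT_L1_CYCLES = 2, 3

capacity_gain = percent_increase(NORTHWOOD_L1_KB, PRESCOTT_L1_KB)
latency_cost = percent_increase(NORTHWOOD_L1_CYCLES, PRESCOTT_L1_CYCLES)

print(f"L1 capacity: +{capacity_gain:.0f}%")  # prints "L1 capacity: +100%"
print(f"L1 latency:  +{latency_cost:.0f}%")   # prints "L1 latency:  +50%"
```

In other words, Prescott doubles its L1 capacity in exchange for paying half again as many cycles on every L1 hit.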
Notice that for data sizes between 8 and 16kB, Prescott is much quicker: Northwood has already moved into its L2 region, while Prescott can stay in its faster L1.
Moving now to the L2 region, we see a similar result. Prescott's L2 latency is much higher than Northwood's; again, a necessary evil in order to accommodate a larger 1MB L2 cache. The table below again summarizes the results, showing that Prescott waits almost 3ns longer than Northwood for data from its L2 cache when a cache hit occurs.
Again, the size-versus-speed tradeoff is a bet: although Prescott waits about 3ns longer in the best case, it should hit that best case (a cache hit) more often, resulting in fewer trips to main memory, which are extremely costly. The idea is that four waits of 8ns are better than three waits of 5ns plus one wait of 70ns.
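The arithmetic behind that example can be written out directly. This is a minimal sketch that follows the text's own simplification: each hit costs a fixed hit latency, and each miss costs a flat 70ns trip to main memory (real hardware behavior is more involved). The function name `total_wait_ns` is hypothetical.

```python
def total_wait_ns(hits: int, hit_ns: float, misses: int, miss_ns: float) -> float:
    """Total stall time when every hit costs hit_ns and every miss costs miss_ns."""
    return hits * hit_ns + misses * miss_ns

# Four accesses each, using the illustrative latencies from the text.
larger_slower = total_wait_ns(hits=4, hit_ns=8, misses=0, miss_ns=70)   # 4 * 8
smaller_faster = total_wait_ns(hits=3, hit_ns=5, misses=1, miss_ns=70)  # 3 * 5 + 70

print(larger_slower, smaller_faster)          # prints "32 85"
print(larger_slower / 4, smaller_faster / 4)  # prints "8.0 21.25" (average per access)
```

A single avoided trip to main memory more than pays for the slower hits: the larger cache spends 32ns in total versus 85ns for the smaller, faster one.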
Effectively, data sets between 8 and 16kB, and between 512kB and 1MB, should perform much better on the Prescott core, while data sets between 0 and 8kB, and between 16 and 512kB, should perform slightly better on the older Northwood core.
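The rule of thumb above can be expressed as a simple lookup. This sketch assumes a working set fits in a cache level whenever its size is no larger than that level's capacity (ignoring associativity, other data competing for the cache, and prefetching). It uses only the capacities quoted in this article. The names `level_for` and `favored`, and the choice to report "neither" once both cores spill to main memory, are illustrative assumptions.

```python
KB = 1024

# Cache capacities quoted in the article, in bytes.
CORES = {
    "Northwood": {"L1": 8 * KB, "L2": 512 * KB},
    "Prescott": {"L1": 16 * KB, "L2": 1024 * KB},
}

def level_for(core: str, working_set: int) -> str:
    """Return the closest cache level that can hold working_set bytes, else 'memory'."""
    caches = CORES[core]
    if working_set <= caches["L1"]:
        return "L1"
    if working_set <= caches["L2"]:
        return "L2"
    return "memory"

def favored(working_set: int) -> str:
    """Which core the latency/size tradeoff should favor for a given working set."""
    n = level_for("Northwood", working_set)
    p = level_for("Prescott", working_set)
    if n == p == "memory":
        return "neither"  # assumption: beyond 1MB, both cores wait on main memory
    if n == p:
        return "Northwood"  # same cache level, so Northwood's lower latency wins
    return "Prescott"  # Prescott's larger cache keeps the data one level closer

for size_kb in (4, 12, 256, 768):
    print(f"{size_kb:>4} kB -> {favored(size_kb * KB)}")
```

Running the loop prints Northwood for 4kB and 256kB, and Prescott for 12kB and 768kB, matching the ranges described above.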