Hardware Analysis
      
  The next Pentium 4 processor, Prescott arrives 
  Feb 02, 2004, 07:30am EST 
 

Performance - Cache Latency


By: Dan Mepham

Let's now shift our attention to some performance metrics. We mentioned earlier that the Prescott core features larger L1 and L2 caches, but that those larger caches came at a cost. That cost is latency. We've used Cachemem to illustrate:


First consider the L1 region. Notice that for data sizes below 8kB (Northwood's L1 size), Northwood's cache latency is just two cycles, while Prescott's is bumped to three. If you recall, the quick 2-cycle L1 cache was a key feature of the P4 when it launched years ago. Intel has told us that the L1 latency had to be increased simply because a 2-cycle latency was not practical for a larger 16kB cache. This is a typical size-versus-speed tradeoff: Intel evidently feels that the larger cache will result in better performance than a smaller, faster one. It is nevertheless a large percentage increase in the L1 latency. Consider the table below.


Notice that for data sizes between 8 and 16kB, Prescott is much quicker, as Northwood will have moved into its L2 area, while Prescott can stay in the quicker L1 area.
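To put a number on the L1 latency increase described above, here is a minimal Python sketch. The 2-cycle and 3-cycle figures come from the article; the helper names and the 3.2 GHz clock used for the cycles-to-nanoseconds conversion are assumptions chosen purely for illustration, not measured values.

```python
def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core cycles to nanoseconds at a given clock."""
    return cycles / clock_ghz


# L1 latencies quoted in the article.
NORTHWOOD_L1_CYCLES = 2
PRESCOTT_L1_CYCLES = 3

# Hypothetical clock speed, used only to illustrate the conversion.
ASSUMED_CLOCK_GHZ = 3.2

increase = percent_increase(NORTHWOOD_L1_CYCLES, PRESCOTT_L1_CYCLES)
print(f"L1 latency increase: {increase:.0f}%")  # prints "L1 latency increase: 50%"

for name, cycles in (("Northwood", NORTHWOOD_L1_CYCLES),
                     ("Prescott", PRESCOTT_L1_CYCLES)):
    ns = cycles_to_ns(cycles, ASSUMED_CLOCK_GHZ)
    print(f"{name}: {cycles} cycles = {ns:.4f} ns at {ASSUMED_CLOCK_GHZ} GHz")
```

Going from two cycles to three is a 50% increase in relative terms, even though the absolute difference is a single cycle.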

Moving now to the L2 area, we see a similar result. Prescott's L2 latency is much higher than Northwood's; again, a necessary evil in order to accommodate a large 1MB L2 cache. The table below again summarizes, indicating that Prescott waits almost 3ns longer for data from its L2 cache when a cache hit occurs.


Again, the speed-size tradeoff is a bet that, while Prescott waits 3ns longer in the best case, it also hits that best case (a cache hit) more frequently, resulting in fewer trips to main memory, which are extremely costly. The idea is that four waits of 8ns are better than three waits of 5ns, plus one wait of 70ns.
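The arithmetic behind that last sentence can be spelled out in a few lines of Python. The 8ns, 5ns and 70ns figures are the article's round illustrative numbers; the function and variable names are made up for this sketch.

```python
# Round figures from the article's example (not exact measurements).
PRESCOTT_L2_HIT_NS = 8
NORTHWOOD_L2_HIT_NS = 5
MAIN_MEMORY_NS = 70


def total_wait_ns(hits, hit_ns, misses, miss_ns):
    """Total time spent waiting across a series of accesses."""
    return hits * hit_ns + misses * miss_ns


# Prescott: all four accesses hit the larger, slower L2.
prescott_total = total_wait_ns(4, PRESCOTT_L2_HIT_NS, 0, MAIN_MEMORY_NS)

# Northwood: three hits in the faster L2, one trip to main memory.
northwood_total = total_wait_ns(3, NORTHWOOD_L2_HIT_NS, 1, MAIN_MEMORY_NS)

print(f"Prescott:  {prescott_total} ns")   # prints "Prescott:  32 ns"
print(f"Northwood: {northwood_total} ns")  # prints "Northwood: 85 ns"
```

In this example a single extra miss costs Northwood more than Prescott loses across all four of its slower hits, which is the whole case for the larger cache.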

Effectively, what we'll see with Prescott is that data sizes between 8-16kB, and 512-1024kB will perform much better on the Prescott core, while data sizes between 0-8kB and 16-512kB should perform slightly better on the old Northwood core.
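The ranges above can be summarized as a small lookup, sketched here in Python. The function name is hypothetical, and the handling of the exact boundary sizes (8, 16, 512 and 1024kB) is an assumption: each boundary is treated as still fitting in the cache of that size. Sizes above 1024kB are not covered by the text above, so the sketch returns None for them.

```python
def favored_core(size_kb):
    """Return the core expected to have lower cache latency for a data size.

    Boundaries are assumed inclusive on the upper end, i.e. a data set of
    exactly 8kB is assumed to still fit in an 8kB cache.
    """
    if size_kb <= 0:
        raise ValueError("data size must be positive")
    if size_kb <= 8:
        return "Northwood"  # both in L1; Northwood's L1 is 2 cycles vs 3
    if size_kb <= 16:
        return "Prescott"   # Prescott still in L1; Northwood already in L2
    if size_kb <= 512:
        return "Northwood"  # both in L2; Northwood's L2 latency is lower
    if size_kb <= 1024:
        return "Prescott"   # Prescott still in L2; Northwood in main memory
    return None             # beyond both L2 caches; not covered here


for size in (4, 12, 256, 768):
    print(f"{size}kB -> {favored_core(size)}")
```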



1. Introduction
2. Caching In
3. Branching Off
4. Round 3, SSE Gets a Refresh
5. Intel's 2004 Roadmap, Sock-et to Me!
6. Incremental Improvements
7. Something Rotten in Santa Clara
8. Performance - Cache Latency
9. Performance - Cache Bandwidth
10. Performance - Cache Throughput
11. Performance - ScienceMark 2.0
12. Performance - Sandra & PCMark
13. Performance - PCMark & AquaMark
14. Performance - SPECviewperf
15. Summary
16. Appendix A - Benchmark Configuration
