The move to 90nm transistors allows Intel to construct much larger processors, in terms of transistor count, while keeping the physical die small. When processors are manufactured, yield rates are directly related to how large the die physically is: a processor with twice the die area of another is essentially twice as likely to contain a manufacturing defect, and is therefore subject to much lower yields.
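To put a rough model behind that claim, here is a minimal sketch of the classic Poisson yield estimate. The defect density and die areas below are illustrative assumptions, not Intel's actual manufacturing figures.

```python
import math

def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of defect-free dies under a simple Poisson yield model."""
    return math.exp(-defects_per_cm2 * die_area_cm2)

# Illustrative numbers only -- not Intel's actual defect density or die areas.
defect_density = 0.5               # assumed defects per square centimeter
small_die_cm2 = 1.12               # ~112 mm^2, roughly a Prescott-sized die
large_die_cm2 = 2 * small_die_cm2  # a die twice as large

print(f"Small die yield: {poisson_yield(small_die_cm2, defect_density):.1%}")
print(f"Large die yield: {poisson_yield(large_die_cm2, defect_density):.1%}")
```

With these assumed numbers, the smaller die yields roughly 57% good parts while the doubled die yields only about 33%, which is why die area matters so much to manufacturing cost.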
We have already seen a simple illustration of this with Intel's server products. Later versions of the Pentium III Xeon incorporated huge on-die caches that pushed the transistor count past the hundred-million mark and resulted in a die two to three times the size of a typical desktop processor of the day. These huge Xeons were difficult to manufacture and carried a corresponding price premium.
The move to 90nm technology has allowed Intel to cram a comparatively huge amount of cache memory onto the Pentium 4 die. Prescott improves on the previous Northwood core by doubling the L2 cache to 1MB. Despite the larger cache, which helps drive Prescott's transistor count past 125 million, the die remains a manageable 112 square millimeters, roughly 50% smaller than Intel's original Willamette-based Pentium 4 with its 256KB L2 cache.
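As a quick sanity check on that comparison, the arithmetic works out as sketched below; the Willamette die area of roughly 217 mm² is an assumed public figure, not a number stated above.

```python
# Quick check of the die-size comparison.
prescott_area_mm2 = 112.0
willamette_area_mm2 = 217.0  # assumed figure for the original 180nm Willamette die

reduction = 1 - prescott_area_mm2 / willamette_area_mm2
print(f"Prescott's die is about {reduction:.0%} smaller than Willamette's.")
# -> about 48%, consistent with the "roughly 50% smaller" claim above
```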
Intel has also taken the opportunity to enlarge the Pentium 4's L1 cache. Prescott's L1 data cache doubles to 16KB, while the L1 instruction cache (the Execution Trace Cache) remains at 12K micro-ops. The Pentium 4 was originally designed with a small 8KB L1 data cache as a tradeoff to maximize the cache's speed. The set-associativity of the L1 data cache has also increased, from 4-way to 8-way.
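To make the associativity change more concrete, here is a minimal sketch of set-associative cache geometry. The 64-byte line size is an assumption for illustration; it is not stated above.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets in a set-associative cache of the given size and associativity."""
    return size_bytes // (ways * line_bytes)

# Northwood-style L1 data cache: 8 KB, 4-way (64-byte lines assumed)
print(cache_geometry(8 * 1024, 4))   # -> 32 sets
# Prescott-style L1 data cache: 16 KB, 8-way (64-byte lines assumed)
print(cache_geometry(16 * 1024, 8))  # -> 32 sets, but twice as many ways per set
```

Under these assumptions, doubling both the capacity and the associativity leaves the number of sets, and therefore the address bits used to index the cache, unchanged; the extra capacity comes entirely from additional ways in each set.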
Figs. 1 & 2 - Color-enhanced photos of Intel's Pentium 4 processor dies. On the left is the 130nm Northwood core; the 90nm Prescott core is on the right. Notice the larger L2 area on the Prescott die.
As you'll see later in the benchmarks, however, implementing such a large cache requires tradeoffs of its own.