If you think the advances made by Intel and AMD over the past few years are impressive, then I have news for you: graphics cards have far surpassed them in that respect. Although processor clockspeeds have climbed to touch upon 4GHz by now, the performance they offer has not kept pace. Graphics cards such as NVIDIA’s GeForce 6800 Ultra or ATI’s Radeon X800 XT have easily doubled the performance of the previous generation in just eight months’ time. However, I think graphics processor manufacturers such as NVIDIA and ATI are going to run into a wall sooner rather than later, for the very same reason Intel and AMD are now having trouble scaling their processors to higher clockspeeds and are moving to dual-core processors instead. Graphics processors are very similar to desktop processors in many respects, as both are built on the same underlying technology.
One of the things you see with graphics processors is that the transistor count keeps increasing; graphics processors already contain more transistors than desktop processors. These millions of transistors eat up valuable silicon real estate and are best kept small and closely packed together. This means that to keep graphics cards affordable, manufacturers are rapidly closing the gap in process technology between graphics processors and desktop processors. With that they inherit all of the problems modern process technology faces as well. Thermal management, for example, has already become a problem on graphics cards: NVIDIA moved to a powerful dual-slot cooling solution with its NV30/35 architecture, and now ATI has followed suit with the X850, based on the R480.
Power consumption has also become a big issue with modern graphics cards: a system consisting of an nForce 4 SLI motherboard, an Athlon 64 4000+ processor and two GeForce 6800 Ultras draws 400 watts. Not to be outdone by NVIDIA, ATI's X850 PE in a similar system manages to consume nearly 300 watts. From that it is obvious that graphics processors are quickly becoming the largest contributors to power drain and heat production in a PC. Manufacturers keep a close eye on clockspeed and IPC and optimize accordingly, balancing performance against power consumption in the process. Just look at ATI: they initially had a low-clockspeed, high-IPC architecture with the R350, but moved away from that with the R360 and newer graphics processors, whereas NVIDIA did exactly the opposite and moved back to a highly parallel architecture with a lower clockspeed but high IPC.
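The article's two system-level figures allow a crude back-of-the-envelope estimate of what a single high-end card adds to the total. This is only a sketch: it assumes the two base platforms are comparable, which the article implies but does not measure, and it ignores that the single-card system uses an X850 PE rather than a 6800 Ultra.

```python
# System-level power figures quoted in the article (total wall draw, watts).
sli_system_watts = 400     # nForce 4 SLI + Athlon 64 4000+ + two GeForce 6800 Ultras
single_system_watts = 300  # similar system with one X850 PE ("nearly 300 watts")

# Assumption: if the base platforms are roughly comparable, the difference
# gives a crude estimate of what one extra high-end graphics card adds.
extra_card_watts = sli_system_watts - single_system_watts
print(f"~{extra_card_watts} W attributable to the second graphics card")
```

Even this rough figure, around 100 W for one card, supports the article's point that graphics processors rival or exceed the CPU as the biggest power consumer in the box.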
The advantage of scaling the R350 architecture to higher clockspeeds by shrinking the die and optimizing the design accordingly is that it accomplishes two things: you get a higher-performance part, and it uses less die area, so it can be manufactured more economically, all without the added cost of designing a whole new architecture from the ground up. The downside is that the number of transistors per square millimeter increases, and they are clocked higher than before, so you create hotspots on the die in areas that are particularly active. More importantly, the heat output per square millimeter rises steeply: a linear shrink squares the number of transistors per square millimeter, and on top of that the clockspeed is increased. The design of the R480 is meant to counter just that, eliminating hotspots and keeping the core cooler overall by switching off sections of the chip that aren't in use, when running 2D for example.
Complexity at the hardware level is actually a big concern for graphics card manufacturers; for cost reasons you want to use as little silicon as possible. This means that an 8-layer metal design is preferable to a 6-layer design, as you can pack many more traces into the same space. This is closely related to the hotspots discussed in the paragraph above: denser designs run a greater risk that everything works fine in simulation while the actual silicon does not perform as expected. Most complex designs can shut off certain parts of the chip in case this happens, or carry spare transistors on the die that can be used to reroute around problematic areas. The idea of simply cutting and pasting SIMD units or ALUs onto a graphics processor as if they were building blocks doesn't really apply here. It does on the drawing board, for the logical layout of the chip, but at the hardware level, in silicon, we're talking about individual transistors that need to be routed and connected to each other, and that's a whole different story.
With the upcoming WGF 2.0 standard, which will be used for Windows Longhorn, we’ll only see more complex graphics processors using an even larger number of transistors. It’ll be interesting to see what technology the next generation of graphics processors will use; clearly the same approach as with desktop processors only gets you so far, as Intel and AMD have shown. Massive parallelism is definitely the way to go for graphics processors as scenes get more and more complex, though it does mean large amounts of die area will be devoted to identical execution engines. Rest assured, future graphics cards won’t get any less complex, nor run any cooler than today’s.