When NVIDIA announced its first DirectX 10 hardware, I could not have been more excited. I knew this new generation of graphics cards would be a huge leap in the development of the GPU. What makes the NVIDIA GeForce 8800 series special? First, we have to understand the architecture (and how a GPU works) before we can really talk about the benefits of the G80.
When word of the G80 and its basic specifications leaked, tech sites led the world to believe NVIDIA was going to turn its focus away from a unified architecture to a non-unified one. Let’s examine the difference between the two.
A non-unified architecture means each step of the process is done separately, in dedicated hardware. A traditional GPU is built on a stacked flow. It starts in the vertex engine, which breaks down the information into small units. Then, it’s sent to the triangle setup engine. Then, the data goes to the pixel pipes, which apply shading and texturing. Almost there. Next, the data goes to the raster operation partitions (ROPs), which cull the data and anti-alias it. Finally, it goes to the frame buffer, at which point the image appears on your screen. That sounds incredibly complicated – and it is! The GPU is doing this millions of times in a split second.
That pipeline layout was popular in the DirectX 9 era, but programming has become even more complex, and the layout was becoming pretty inefficient at handling the massive amount of data being processed. Engineers had to decide up front what was going to be more important in the future, vertex or pixel processing, and dedicate hardware accordingly.
A unified architecture does away with separate pixel and vertex pipelines in favor of a single pipeline that can be used for both. The old pixel pipes are converted into what the industry calls floating point units (FPUs). To make it more complicated, NVIDIA is instead calling them stream processing units (SPUs) – I’m going to opt for the latter since we’re discussing NVIDIA’s product.
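To make the trade-off concrete, here is a toy model of my own, not anything from NVIDIA. It assumes work splits perfectly across units with no scheduling overhead, and the 8 vertex / 24 pixel split and the workload numbers are made up for illustration.

```python
# Toy model: time to finish a frame with dedicated units vs. one shared pool.
# All unit counts and workloads are hypothetical illustration values.

def fixed_time(vertex_work, pixel_work, vertex_units, pixel_units):
    """Dedicated units: the frame waits for whichever side finishes last."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)

def unified_time(vertex_work, pixel_work, total_units):
    """Shared pool: any unit can take any work, so nothing sits idle."""
    return (vertex_work + pixel_work) / total_units

if __name__ == "__main__":
    # A vertex-heavy frame starves the fixed design's 8 vertex units.
    print(fixed_time(512, 512, 8, 24))   # 64.0
    print(unified_time(512, 512, 32))    # 32.0
    # A pixel-heavy frame leaves the vertex units mostly idle instead.
    print(fixed_time(64, 960, 8, 24))    # 40.0
    print(unified_time(64, 960, 32))     # 32.0
```

In both cases the same 32 units finish sooner as a shared pool, because no unit waits around for work of the "wrong" type.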
SPUs have much simpler shader functionality than past technologies, but they are much more efficient. SPUs allow the system to work on both pixel and vertex processing, as well as newer workloads like geometry and physics. The G80 is equipped with 128 SPUs, which are supported by 32 texture addressing units (TAUs, of course), which have access to 64 texture filtering units (TFUs), which in turn can deliver 64 pixels per clock. That may sound small but, when we are talking about texture filtering, that is a huge number. The SPUs then send the data to the ROP system, which outputs the image to our screen. With an SPU setup, data no longer has to be handed through five or more separate stages; it streams through one unified stage in a much more streamlined and efficient layout.
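The texture numbers are easy to sanity-check. This quick sketch assumes each TFU delivers one filtered pixel per clock, which is how I read the 64-per-clock figure. The 575 MHz core clock is my assumption for an 8800 GTX-class card and is not a figure from this article.

```python
# Back-of-the-envelope texture throughput for the G80 figures quoted above.
TAUS = 32            # texture addressing units
TFUS_PER_TAU = 2     # each TAU has access to two filtering units
TFUS = TAUS * TFUS_PER_TAU

def filtered_pixels_per_second(core_clock_mhz, tfus=TFUS):
    """Assumes one filtered pixel per TFU per clock (an assumption)."""
    return tfus * core_clock_mhz * 1_000_000

if __name__ == "__main__":
    print(TFUS)                             # 64
    # 575 MHz is an assumed core clock, used only for illustration.
    print(filtered_pixels_per_second(575))  # 36800000000
```

At that assumed clock, 64 per clock works out to roughly 36.8 billion filtered pixels per second, which is why the number is not as small as it sounds.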
Now that we know how data is being processed, let’s look at how NVIDIA is making a better graphics card. In the past, NVIDIA had massive issues with anisotropic filtering (AF): they chose to preserve performance in the previous architecture rather than make the AF look better. As a result, NVIDIA’s 7-series graphics cards were known to have some of the worst AF in the industry. With the new architecture of the G80, NVIDIA went from worst to best. Since each TAU accesses two TFUs, NVIDIA gets 64 pixels per clock of filtering, which allowed them to substantially enhance clarity.
Huge progress in the G80 also comes in anti-aliasing: Multi Sampling (MSAA), Super Sampling (SSAA) and Transparency Adaptive (TAAA) Anti-Aliasing. On top of these, NVIDIA was able to add a new technique with two simple modes in the 8x and 16x range: Coverage Sampling Anti-Aliasing (CSAA). This produces anti-aliased images that rival the quality of 8x or 16x MSAA, with only a minimal performance hit over standard MSAA. It works through a new sample type: one that represents coverage only. This differs from previous anti-aliasing techniques, where coverage was always inherently tied to another sample type. CSAA optimizes this process by separating coverage from color/z/stencil, thus reducing bandwidth and storage.
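A rough storage comparison shows why separating coverage matters. As I understand NVIDIA’s description, 16x CSAA keeps 4 full color/z/stencil samples plus 16 coverage samples. The byte sizes and the 4 bits per coverage sample below are placeholder assumptions of mine, not hardware specifications.

```python
# Rough per-pixel storage: full samples vs. full samples plus cheap coverage.
# Sizes below are illustrative assumptions, not published hardware figures.

def bytes_per_pixel(full_samples, coverage_samples=0,
                    color_bytes=4, depth_stencil_bytes=4, coverage_bits=4):
    """Each full sample stores color + z/stencil; each coverage sample
    stores only a few bits (coverage_bits is a hypothetical cost)."""
    full = full_samples * (color_bytes + depth_stencil_bytes)
    coverage = (coverage_samples * coverage_bits + 7) // 8  # round up to bytes
    return full + coverage

if __name__ == "__main__":
    print(bytes_per_pixel(16))      # 16x MSAA: 128
    print(bytes_per_pixel(4, 16))   # 16x CSAA: 40
    print(bytes_per_pixel(4))       # 4x MSAA:  32
```

Under these assumptions, 16x CSAA costs only a little more than 4x MSAA per pixel, while plain 16x MSAA costs four times as much.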
Another important improvement in the G80 is the addition of FP16 and FP32 render targets. With FP16 targets, high dynamic range (HDR) rendering and MSAA can run at the same time, which was not an option in NVIDIA’s 7 series. FP32 targets likewise allow HDR and anti-aliasing to run together. The lack of this was a huge issue in Oblivion for systems using the NVIDIA 7 series.
NVIDIA’s CUDA (Compute Unified Device Architecture) is a technology I find intriguing, though it isn’t something the average Joe is going to benefit from. Since this is new to me as well, I am going to paraphrase what’s on the official CUDA site: It’s a new architecture that lets the GPU solve complex computational problems in applications. It gives applications access to the power of the GPU through a new programming interface. By providing “orders of magnitude” more performance and simplifying development through the use of the C language, it enables developers to create solutions for data-intensive problems. It also includes a low-level assembly language layer and driver interface.
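To give a feel for the programming model, here is a plain Python imitation of the idea as I understand it: you write one small function (a "kernel") and it is run once per data element, each run identified by a thread index. Real CUDA kernels are written in C and run in parallel on the GPU. Nothing below uses the CUDA API, and the names `saxpy_kernel` and `launch` are my own.

```python
# Plain-Python imitation of a data-parallel kernel launch.
# This runs sequentially on the CPU; it only mimics the per-thread idea.

def saxpy_kernel(thread_id, a, x, y, out):
    """One logical thread computes one element: out[i] = a * x[i] + y[i]."""
    if thread_id < len(x):  # guard: more threads may be launched than elements
        out[thread_id] = a * x[thread_id] + y[thread_id]

def launch(kernel, num_threads, *args):
    """Hypothetical stand-in for a GPU launch: call the kernel per thread."""
    for thread_id in range(num_threads):
        kernel(thread_id, *args)

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [10.0, 20.0, 30.0]
    out = [0.0] * len(x)
    launch(saxpy_kernel, 4, 2.0, x, y, out)  # 4 threads, 3 elements
    print(out)  # [12.0, 24.0, 36.0]
```

On a GPU, the loop inside `launch` is what disappears: the hardware runs those per-element calls side by side across its many processors.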
So, why is CUDA important enough to talk about? Short-Media is a part of Stanford University’s Folding@Home project, which uses your idle CPU power to research protein folding, which we hope leads to cures for diseases. In recent months, the project has started to use GPUs because they can crunch numbers about three times faster than an average CPU. The main focus was on ATI’s GPUs, because their architecture was able to support it. Now, with CUDA being added to the NVIDIA architecture, the project can develop an application for the G80 series of GPUs.
At its most basic, NVIDIA’s G80 represents a huge leap in performance and in the efficiency of calculating data. With the introduction of the stream processing units (those SPUs) and the more comprehensive TAUs and TFUs, we will get more realistic images. While the more complex explanations are only fit for advanced engineers, you can walk away from this knowing that graphics are only going to get significantly better, and very soon.


