ASSIGNMENT 2 - BENCHMARKING GRAPHICS HARDWARE
REAL TIME RENDERING - CS 446   David Luebke
Ben Miller and Kim Dylla


Part I: The crossover point between geometry and rasterization


Graphics performance is generally limited by two factors: geometry and rasterization. The smaller the triangle, the less rasterization work the system has to do, so rendering speed is limited instead by the rate at which vertices can be processed. Reducing the number of vertices submitted per triangle helps here, for example by using glDrawElements, as the gfxbench program does, or a triangle fan instead of three glVertex calls per triangle. However, as the size of the triangles increases, or as the shading becomes more complex, rasterization becomes the bottleneck in the rendering pipeline. Our objective in this part of the assignment was to find the crossover point at which geometry ceases to be the limiting factor in rendering speed.
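As a rough illustration of why fans reduce vertex work, the sketch below counts the vertices submitted for a batch of triangles. The function names are ours and are not part of gfxbench; the counts assume one connected fan or strip.

```c
#include <stddef.h>

/* Vertices submitted when every triangle is sent on its own
   (three glVertex calls per triangle). */
size_t verts_independent(size_t n_triangles) {
    return 3 * n_triangles;
}

/* Vertices submitted for one connected fan or strip of n triangles:
   the first triangle costs three vertices, each later one adds one. */
size_t verts_fan_or_strip(size_t n_triangles) {
    return n_triangles == 0 ? 0 : n_triangles + 2;
}
```

For 100 triangles this gives 300 vertices sent independently against 102 in a single fan, so the geometry stage has roughly a third of the work.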
We first collected data on the generic graphics cards in the Small Hall computer lab, modifying the gfxbench skeleton code to render 1000 50x50 meshes of triangles ranging in size from 1 pixel to 15 pixels in increments of 0.25 pixels. Although this took an exorbitant amount of time, the triangle rate progressed very linearly until the triangles reached about 8.5 pixels, at which point the rate hovered around 0.53 Mtri/sec. This marked the limit of geometry-limited rendering on those cards. We then decided to take the remainder of our data on an ATI Radeon 9600 card. The crossover point for this card occurred at triangles of about 15-16 pixels, at a rate of 3.9 Mtri/sec.
At the point where fill rate became the limiting factor, we tested lighting and texture mapping to see which would slow rendering the most. For lit triangles, the data stayed essentially the same. This is due in part to the efficiency of the Radeon and in part to the simple nature of our lighting, which required only an extra interpolation. To create textured triangles, we generated a simple checkerboard texture in OpenGL. We used GL_REPLACE instead of GL_BLEND mode, so only a single lookup/calculation was required. Textured triangles actually improved performance slightly, probably because texturing shifts more of the work to the rasterization stage, which the card handles efficiently. Textured and lit triangles produced results in between the original test and the test of lit triangles, which is probably also due to the slight performance gain from texturing.
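The idea of a crossover point can be sketched with a simple two-stage bottleneck model. This model is our simplifying assumption, not something measured in the experiment: the triangle rate is taken to be the smaller of the geometry limit and the fill rate divided by the triangle's area. The function and parameter names are hypothetical.

```c
/* Simplified bottleneck model (an assumption, not a measurement):
   geom_rate - peak rate of the geometry stage, in triangles/sec
   fill_rate - peak rate of the rasterizer, in pixels/sec
   area_px   - screen area of one triangle, in pixels (must be > 0) */
double modeled_tri_rate(double geom_rate, double fill_rate, double area_px) {
    double raster_limit = fill_rate / area_px;
    return raster_limit < geom_rate ? raster_limit : geom_rate;
}

/* Triangle area at which the two limits are equal. Smaller triangles
   are geometry-limited; larger ones are fill-limited. */
double crossover_area_px(double geom_rate, double fill_rate) {
    return fill_rate / geom_rate;
}
```

With placeholder limits of 4e6 triangles/sec and 400e6 pixels/sec (illustrative values, not our data), the model puts the crossover at 100 pixels of area: a 50-pixel triangle still renders at the geometry limit, while a 200-pixel triangle drops to 2e6 triangles/sec.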


GRAPHS of RESULTING DATA




Part II: Moving pixel data

The second part of this experiment explored how fast data moves across the AGP bus. The transfer rate is affected by format (how many bits and channels an image contains), by data alignment (how the arrays holding RGB values are padded), and, similarly, by stride. Loosely packed data is also likely to hurt cache performance.
To see how various OpenGL modes affect performance, we varied the parameters of the pixel blitting operation DrawPixels to find the most efficient combination. On the ATI card, rendering a 64x64 checkerboard mesh 100 times in RGBA mode gave a rate of 529.51 Mtri/sec. RGB mode was much more efficient, yielding a rate of 798.67. BGR was nearly as slow as RGBA, with a rate of 549.83. Not using the often wasted fourth (alpha) channel increases cache performance due to better format/alignment/stride. STENCIL_INDEX had markedly lower performance, probably because the stencil buffer is accessed even before the depth buffer in the rendering pipeline. Using a signed byte instead of an unsigned one as the type parameter did not noticeably affect performance; using signed or unsigned ints, however, affected it significantly.
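To make the format/alignment/stride relationship concrete, here is a minimal sketch of how many bytes one row and one whole image occupy in client memory. It assumes rows are padded up to a multiple of the unpack alignment (1, 2, 4 or 8 bytes, as set through GL_UNPACK_ALIGNMENT, which defaults to 4). The helper names are ours, not OpenGL calls.

```c
#include <stddef.h>

/* Bytes in one row of pixel data, padded up to a multiple of `alignment`. */
size_t row_stride_bytes(size_t width, size_t components,
                        size_t bytes_per_component, size_t alignment) {
    size_t raw = width * components * bytes_per_component;
    return (raw + alignment - 1) / alignment * alignment;
}

/* Total bytes transferred for a width x height image. */
size_t image_bytes(size_t width, size_t height, size_t components,
                   size_t bytes_per_component, size_t alignment) {
    return height * row_stride_bytes(width, components,
                                     bytes_per_component, alignment);
}
```

For a 64x64 image of unsigned bytes with 4-byte alignment, RGB comes to 12,288 bytes and RGBA to 16,384 bytes, so RGBA moves a third more data for the same number of pixels. A 3-pixel-wide RGB row shows the padding effect: 9 bytes of data occupy a 12-byte stride.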

We then used the ReadPixels function to determine the impact of pixel format (RGB, RGBA, etc.) on the rate at which color pixels are read from the framebuffer to main memory. The results were in many cases the same as with DrawPixels. LUMINANCE, LUMINANCE_ALPHA, STENCIL_INDEX, and DEPTH_COMPONENT had remarkably low performance, which we attribute to the order in which their buffers are accessed as well as to format. These formats carry a lot of information that is irrelevant at this point, which reduces the number of triangles per second.

Using glTexImage2D() and glTexSubImage2D(), we experimented with moving data from host memory to texture memory. glTexSubImage2D() was about three times as efficient as glTexImage2D(), with a rate of 949.96 vs. 294.48 Mtri/sec. These data points were taken in RGBA mode; using signed or unsigned bytes did not make much difference. This makes sense, because glTexSubImage2D() only replaces part of an existing texture instead of respecifying the entire image. Thus, no memory accesses are wasted.

Through these tests, it can be shown that efficient data packing is extremely important to performance.

 





Part III: Detailed examination of rasterization in the graphics pipeline

As a detailed examination of the rasterization end of the graphics pipeline, we decided to explore the effect of differently shaped triangles on performance. We used skinny and fat triangles along a 45 degree axis, as well as vertically and horizontally skinny and fat triangles of various sizes. The positioning of the triangles did not affect the rate, which was around 1928.26 in all cases. The skinny triangles, being very small, rendered amazingly fast, at a rate of 12,439.02 Mtri/sec. The fatter triangles, taking up more of the screen, took almost six times longer to render, with only a slight increase to 3824.4 for the vertically fat triangles. The more of the screen a triangle takes up, the more time it spends in the rasterization end of the pipeline. This is the same result implied by our previous triangle benchmarking experiment, with the fill rate plateauing somewhere around triangles taking up 30% of the screen. The maximum rate that we achieved from the graphics card came from rendering triangles that took up only 10% of the screen, at a rate of 36,449.7 Mtri/sec. When graphed, the rasterization rate for differently shaped triangles follows a slight curve similar to that of the geometry end of the pipeline.
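Since the discussion above ties rate to how much of the screen a triangle covers, a small sketch of that calculation may help. It computes a triangle's screen-space area with the shoelace formula and the fraction of the screen it covers. The function names are ours and the coordinates are assumed to be in pixels.

```c
/* Area in pixels of a triangle with screen-space vertices
   (x0,y0), (x1,y1), (x2,y2), using the shoelace formula. */
double triangle_area_px(double x0, double y0, double x1, double y1,
                        double x2, double y2) {
    double twice_signed = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0);
    double twice_area = twice_signed < 0.0 ? -twice_signed : twice_signed;
    return twice_area / 2.0;
}

/* Fraction of a screen_w x screen_h window covered by area_px pixels. */
double screen_fraction(double area_px, double screen_w, double screen_h) {
    return area_px / (screen_w * screen_h);
}
```

A skinny triangle with vertices (0,0), (100,0), (0,1) and a fat one with vertices (0,0), (10,0), (0,10) both cover 50 pixels. Comparing pairs like this would be one way to separate the effect of shape from the effect of area.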





SOURCE CODE (gfxbench.c)