Triangle Processor


Motivation

The whole point of real-time 3D shaded computer graphics is to render interesting scenes in real time. For complex CAD databases in the mid nineteen eighties the depth complexity (average number of layers of surfaces covering each pixel) was 5 to 8; for immersive scenes the complexity was 3 to 5. For megapixel and larger displays, this dictates minimum fill rates. Example: a 1 million pixel display, a depth complexity of 5, and a frame update rate of 20 Hz require 5 × 20 × 1M = 100 million pixel updates per second. Existing shipping shaded graphics hardware at that time (all based on the traditional read-modify-write Z-buffer) was nearly two orders of magnitude slower.

Solution to Fill Bottleneck

While the depth complexity was high, the total number of triangles wasn’t: the average triangle was fairly large (50 to 200 pixels in size). Software renderers from the nineteen seventies, where there wasn’t enough room to keep the entire frame buffer in memory, used scan-line rendering techniques in which only the triangles active on the current scan line were kept on an active rendering list. I took this (fairly common software) idea and inverted it into hardware. Rather than rendering each triangle into a frame buffer, I passed the frame buffer, one scan line at a time, past the triangles. A “triangle processor” was a piece of hardware that contained all the screen area coverage and interpolated parameter values (normal, color, material, etc.) for a triangle. As sequential pixels from the current scan line came flowing through the triangle processor, the processor would determine whether that pixel was within the screen boundaries of its current triangle. If it was, it would compare the interpolated z value of the triangle at that pixel with the z value of the pixel flowing through. If the triangle’s z value was closer, it would replace the incoming pixel’s values with the triangle’s interpolated values.

A serial pipeline of such triangle processors (several to a chip, multiple chips run together) would allow several thousand triangles to be active on each scan line. Because most triangles are active for only one to a few dozen scan lines, a single triangle processor could have many different triangles (non-overlapping in screen y) sequentially loaded into it. The system did require all the scene’s triangles to be sorted by their uppermost screen y coordinate, and cached until rendered. Because the video pixels flow through at video pixel rates, and dozens of triangles can overlap at the same pixel, the potential fill rate was very high.
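
A rough software sketch of that pipeline stage may help make the data flow concrete. The structure and names below (PixelStream, TriangleProcessor, run_scan_line) are purely illustrative, not the actual hardware logic, and the per-triangle shading parameters are shown as flat values rather than interpolated, for brevity:

    #include <vector>

    // One pixel as it streams down the serial pipeline (illustrative only).
    struct PixelStream {
        int   x;              // screen x on the current scan line
        float z;              // closest depth seen so far
        float nx, ny, nz;     // shading parameters carried with the pixel
        // (color, material id, etc. would ride along here too)
    };

    // Sketch of a single triangle processor stage: it holds the loaded
    // triangle's pixel coverage on the current scan line plus enough setup
    // to interpolate z across that span.
    struct TriangleProcessor {
        bool  active = false;                  // a triangle is loaded for this line
        int   x_left = 0, x_right = -1;        // pixel coverage on this scan line
        float z_at_left = 0.0f, dz_dx = 0.0f;  // linear z interpolation
        float nx = 0, ny = 0, nz = 0;          // (flat parameters for brevity)

        // Called once per pixel flowing through on the current scan line.
        void process(PixelStream& p) const {
            if (!active || p.x < x_left || p.x > x_right)
                return;                                  // pixel is outside this triangle
            float tri_z = z_at_left + dz_dx * float(p.x - x_left);
            if (tri_z < p.z) {                           // triangle is closer
                p.z  = tri_z;                            // replace depth...
                p.nx = nx; p.ny = ny; p.nz = nz;         // ...and shading parameters
            }
        }
    };

    // The serial pipeline: every pixel of the scan line visits every stage.
    void run_scan_line(std::vector<TriangleProcessor>& pipeline,
                       std::vector<PixelStream>& scan_line) {
        for (PixelStream& p : scan_line)
            for (const TriangleProcessor& tp : pipeline)
                tp.process(p);
    }

In the hardware, of course, the stages operate concurrently at video pixel rates rather than in the nested software loop shown here, which is where the fill-rate advantage over a read-modify-write Z-buffer comes from.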

Solution to Lighting and Shading Problem

In addition to providing very high fill rates, I wanted to provide much better lighting and shading quality than the simple per-vertex lighting with bilinear interpolation available in the machines of the time. Thus the second custom chip, the Normal Vector Shader (NVS). The NVS would perform the complete multiple specular light source lighting computation for every pixel passed through it. In today’s terms, it was a programmable pixel shader, replacing the per-vertex lighting of older designs. The initial shader microcode supported 5 simultaneous directional specular light sources (plus one ambient), plus a 1D environment map. Because pixels were only subject to shading after they had passed all visibility tests, this was a hardware deferred shading technique: only pixels that actually needed shading calculations had them performed. (There were the usual transparency complications.)
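
As a rough illustration of the kind of per-pixel work the NVS performed, the sketch below evaluates a generic multiple-directional-light model (ambient plus diffuse plus a Blinn-style specular term) from an interpolated normal. The function and structure names are hypothetical and the lighting formula is a stand-in, not the actual NVS microcode:

    #include <array>
    #include <cmath>
    #include <algorithm>

    struct Vec3 { float x, y, z; };

    static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
    static Vec3  normalize(Vec3 v) {
        float len = std::sqrt(dot(v, v));
        return {v.x / len, v.y / len, v.z / len};
    }

    // dir is a unit vector pointing toward the (directional) light source.
    struct DirectionalLight { Vec3 dir; float diffuse, specular; };

    // Deferred shading step: lighting is evaluated only for the pixel that
    // survived all visibility tests, using its interpolated normal.
    float shade_pixel(Vec3 interpolated_normal,
                      Vec3 view_dir,
                      const std::array<DirectionalLight, 5>& lights,
                      float ambient,
                      float shininess) {
        Vec3 n = normalize(interpolated_normal);  // renormalize after interpolation
        float intensity = ambient;
        for (const DirectionalLight& l : lights) {
            intensity += l.diffuse * std::max(0.0f, dot(n, l.dir));
            // Blinn-style specular from the half vector; one of several
            // possible specular models, not necessarily the NVS formulation.
            Vec3 h = normalize({l.dir.x + view_dir.x,
                                l.dir.y + view_dir.y,
                                l.dir.z + view_dir.z});
            intensity += l.specular *
                         std::pow(std::max(0.0f, dot(n, h)), shininess);
        }
        return intensity;  // a real shader would do this per color channel
    }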

Solution to Full Screen Anti-aliasing Problem

I also wanted to have high quality full-screen anti-aliasing capabilities, even if at a reduced frame rate. Thus the system included an accumulation buffer, to allow multi-pass anti-aliasing, as well as depth of field and motion blur effects. While such capabilities were first described in my paper, and were properly referenced by the SGI Accumulation Buffer SIGGRAPH paper two years later, the SGI paper is usually cited in the literature as the fuller description of the technique.
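
The multi-pass idea itself is simple to express in software. The sketch below assumes a render_frame routine (hypothetical, standing in for a full rendering pass with a sub-pixel jitter applied) and averages the passes in a higher-precision accumulation buffer; depth of field and motion blur work the same way, varying the lens position or the sample time instead of the jitter:

    #include <cstddef>
    #include <utility>
    #include <vector>

    struct Color { float r = 0, g = 0, b = 0; };

    // Assumed to exist elsewhere: renders the whole scene with the viewpoint
    // offset by a sub-pixel jitter, returning one color per pixel.
    std::vector<Color> render_frame(float jitter_x, float jitter_y);

    // Accumulation-buffer style full-screen anti-aliasing: render the scene
    // several times with different sub-pixel offsets, sum the passes, and
    // average at the end.
    std::vector<Color> antialias(std::size_t width, std::size_t height,
                                 const std::vector<std::pair<float, float>>& jitters) {
        std::vector<Color> accum(width * height);      // the accumulation buffer
        for (auto [jx, jy] : jitters) {
            std::vector<Color> pass = render_frame(jx, jy);
            for (std::size_t i = 0; i < accum.size(); ++i) {
                accum[i].r += pass[i].r;
                accum[i].g += pass[i].g;
                accum[i].b += pass[i].b;
            }
        }
        float inv = 1.0f / float(jitters.size());
        for (Color& c : accum) { c.r *= inv; c.g *= inv; c.b *= inv; }
        return accum;
    }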

Relation to Other Designs

There had been a previous non-custom-chip paper design for a triangle-processor-like architecture by another group outside of Schlumberger (I believe it was only published as a technical report). Otherwise, the next most similar work was Henry Fuchs’ Pixel-Planes at UNC. His system was the literal inverse of mine: each hardware unit held a pixel, and all the triangles were passed by each pixel. A major difference can be seen in relative silicon efficiency. In the triangle processor, an individual triangle processor unit might be active for only 10 pixels out of 1,000 on a scan line, or 1% of the time. In Pixel-Planes, an individual pixel unit might be active for only 5 pixels out of 1,000,000, making it 2,000 times less efficient than the triangle processor. (Admittedly, each triangle processor was several times larger in area than a single pixel processor.) Later generations of the UNC Pixel-Planes (including two briefly commercialized versions) tried to reduce this inefficiency by pre-sorting triangles into rectangular sub-areas of the screen (just as I had to sort triangles by y), and then rendering these sub-screen areas sequentially by re-using the same small array of pixel processors. These same later versions also added some forms of per-pixel programmable shading. The resulting machines had incredible effective fill rates (I remember seeing a depth complexity 100 molecular modelling example), but as the average size of a “real” triangle approached that of a pixel, other hardware rendering techniques came to dominate the commercial market.
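
The utilization comparison works out as follows (the counts are the illustrative figures from the text above, not measurements):

    #include <cstdio>

    int main() {
        // Illustrative figures from the text above.
        double triangle_proc_util = 10.0 / 1000.0;     // ~1% of the pixels on a scan line
        double pixel_planes_util  = 5.0 / 1000000.0;   // ~0.0005% over the full frame
        std::printf("utilization ratio: %.0fx\n",
                    triangle_proc_util / pixel_planes_util);  // prints 2000x
        return 0;
    }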

Implementation

The first prototype triangle processor chip (containing just one triangle processor) came back fully functional, showing that the overall design and hand layout of the triangle processor cell were correct. However, Schlumberger had made the strategic decision to pull the merged Applicon/MDSI CAD/CAM company out of the business of building its own workstations or attached graphics processors. So although the triangle processor architecture was validated and considered otherwise ready for commercial production, there was no longer an internal corporate customer to carry it forward. Schlumberger did make some efforts to see if any other vendors remaining in the market wanted to pick up this innovative design, but none did, and the project terminated in early 1988.

Innovations

Despite the lack of a commercially delivered product, the triangle processor was not a paper design or a university research project; it had been validated for real commercial usage throughout its development. Its relatively high-utilization, high fill rate architecture, the concept of hardware deferred shading, the concept of complex per-pixel programmable hardware shading, and the concept of the accumulation buffer with its associated techniques for full-screen anti-aliasing, depth of field effects, and motion blur are all generally considered novel innovations of this project, first published in the 1988 SIGGRAPH paper.

Publications

The only publication on the system was the 1988 SIGGRAPH paper:

Michael F. Deering, Stephanie Winner, Bic Schediwy, Chris Duffy, Neil Hunt. The Triangle Processor and Normal Vector Shader: A VLSI System for High Performance Graphics. ACM SIGGRAPH Computer Graphics 22(4), 21–30, August 1988.