Ansatsu
Power Member
Much has been said about Cell's presumed inability to texture map well. Given the small (256KB) local stores and DMA memory access, the SPEs were relegated by many to only handle nice streaming geometry type workloads. This seemed like an issue ripe for a little prototyping.
First, colleague Mark Nutter implemented a software cache abstraction layer for the SPE, giving us the ability to both hide the complexity of DMAs and benefit from transparent data reuse. Next, given the lessons learned from this paper, we tiled our textures, optimized our access patterns, and implemented several cache replacement policies. We then rewrote the shader in the Quaternion Julia Set Raytracer to add five cubemap texture lookup passes: three refraction lookups, a reflection lookup, and a background lookup. These five texture lookups were then blended together with a Fresnel calculation and modulated with the base lighting computation to form the final sample color.
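To illustrate why tiling helps a cached texture fetch, here is a minimal Python sketch of linear versus tiled texel addressing. It is only a model, not the SPE code from the article: the 8×8 tile size, the 128-byte line size, and the function names are all assumptions chosen so that one tile of 16-bit texels fills exactly one cache line.

```python
TILE = 8             # assumed tile edge in texels (hypothetical choice)
BYTES_PER_TEXEL = 2  # 16-bit texels, as in the article
LINE_BYTES = 128     # assumed cache line size; 8*8*2 bytes = one tile


def linear_offset(x, y, width):
    """Texel index in a plain row-major layout."""
    return y * width + x


def tiled_offset(x, y, width, tile=TILE):
    """Texel index when the texture is stored tile by tile.

    Tiles are laid out row-major, and texels inside each tile are
    row-major too, so a small 2D neighbourhood stays close in memory.
    """
    tiles_per_row = width // tile
    tile_x, in_x = divmod(x, tile)
    tile_y, in_y = divmod(y, tile)
    tile_index = tile_y * tiles_per_row + tile_x
    return tile_index * tile * tile + in_y * tile + in_x


def lines_touched(offsets):
    """Number of distinct cache lines covered by a set of texel indices."""
    return len({(o * BYTES_PER_TEXEL) // LINE_BYTES for o in offsets})


# A 2x2 texel neighbourhood at the corner of a 1024-wide face.
footprint = [(0, 0), (1, 0), (0, 1), (1, 1)]
linear_lines = lines_touched([linear_offset(x, y, 1024) for x, y in footprint])
tiled_lines = lines_touched([tiled_offset(x, y, 1024) for x, y in footprint])
```

Under these assumptions the 2×2 footprint spans two cache lines in the linear layout but only one in the tiled layout, which is the kind of locality gain the tiling step is after.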
The results were very pleasing.
We found that even with a small (8 KB) 4-way set-associative software cache, miss rates for this renderer were a low 7% and hit access times were only 12 SPE cycles.
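For readers unfamiliar with the term, the following Python sketch models how a 4-way set-associative cache of 8 KB decides between a hit and a miss. It is a simulation under stated assumptions, not the actual SPE software cache: the 128-byte line size and the LRU replacement policy are guesses (the article only says several replacement policies were tried), and the class name is hypothetical.

```python
LINE_BYTES = 128      # assumed line size (not given in the article)
CACHE_BYTES = 8 * 1024
WAYS = 4
NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)  # 16 sets under these assumptions


class SetAssociativeCache:
    """Toy model of a set-associative cache with LRU replacement (assumed)."""

    def __init__(self, num_sets=NUM_SETS, ways=WAYS, line_bytes=LINE_BYTES):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # Each set holds up to `ways` tags, least recently used first.
        self.sets = [[] for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            # Hit: move the tag to the most-recently-used position.
            entries.remove(tag)
            entries.append(tag)
            self.hits += 1
            return True
        if len(entries) == self.ways:
            entries.pop(0)  # evict the least recently used line
        entries.append(tag)
        self.misses += 1
        return False

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0
```

In a real SPE implementation a miss would trigger a DMA transfer from system memory into the local store; here it is only counted, so that `miss_rate()` can be compared across access patterns.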
Using only seven 3.2 GHz SPEs we were able to raytrace 15 frames per second at a frame resolution of 1024×1024. The texture buffer held a cubemap with 1024×1024 faces of 16-bit texels, resulting in a 12.5 MB texture buffer in XDR system memory. The performance penalty for using the five-pass texture shader vs. the lighting-only shader was just 13%.
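As a quick sanity check on these figures, the arithmetic can be worked through in a few lines of Python. The one-sample-per-pixel assumption is mine (the raytracer may well take more samples per pixel), so the cycle budget should be read as a rough upper bound rather than a measured value.

```python
# Cubemap size: six faces of 1024x1024 texels at 16 bits (2 bytes) each.
FACES = 6
FACE_EDGE = 1024
BYTES_PER_TEXEL = 2

texture_bytes = FACES * FACE_EDGE * FACE_EDGE * BYTES_PER_TEXEL
texture_mib = texture_bytes / 2**20  # binary megabytes
texture_mb = texture_bytes / 1e6     # decimal megabytes

# Throughput: 1024x1024 pixels at 15 frames per second on seven SPEs.
FRAME_EDGE = 1024
FPS = 15
SPES = 7
CLOCK_HZ = 3.2e9

# Assumes one sample per pixel, which the article does not state.
samples_per_second = FRAME_EDGE * FRAME_EDGE * FPS
cycles_per_sample = CLOCK_HZ * SPES / samples_per_second

print(f"texture: {texture_bytes} bytes = {texture_mib:.1f} MiB = {texture_mb:.2f} MB")
print(f"samples per second: {samples_per_second}")
print(f"SPE cycles available per sample: {cycles_per_sample:.0f}")
```

The buffer comes out at 12,582,912 bytes, which is 12 MiB or about 12.58 decimal MB, in line with the quoted figure of roughly 12.5 MB. Under the one-sample-per-pixel assumption, each sample has a budget of a little over 1,400 SPE cycles, which puts the 12-cycle hit time in perspective.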
Our miss handler was implemented as a blocking function, and we still have ideas pending to further reduce the 12-cycle software cache hit access time, so we believe the 13% performance gap between the two shaders will continue to close.
Video: http://www.gametomorrow.com/minor/barry/julia.mov (about 16 MB)
Source: http://gametomorrow.com/blog/index.php/2006/03/24/cell-cant-texture/
Very cool video.