I have spent a whole week for solving an aliasing problem on 4xMSAA and Light pre-pass technique. I think it is a time for me to write about my experience.
Light pre-pass consists of three steps: rendering normal/depth buffer, rendering lights and rendering objects with proper materials.
Light pre-pass is more friendly to MSAA than deferred shading, because on the last rendering phase objects can utilize MSAA. On the other hand, objects on deferred shading cannot get any benefits from MSAA while lights may get some. I believe this is why Wolfgang said "It works with MSAA definitely": http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html
However, there are two problems: one is between the light buffer and the last accumulation buffer, and another problem is on between normal/depth buffer and the light buffer.
When we render lights on the light buffer, MSAA doesn't work. It means that even if we use 4xMSAA setting for the light buffer, the result values on the four colors of each pixel are all the same. It is because light geometries are usually a simple sphere or a small quad. Thus when we render objects as the last step, we cannot get light values at MSAA sampling level, because we don't have enough light information for each sampling point.
One possible way to solve this problem is to do the light rendering per MSAA sampling point. Executing the pixel shader per sampling point will allow us to store each light value per sampling point. Then the light value will be selected by centroid value. Since not every graphic cards allows to run at sampling point, storing the averaged light values is an alternative way.
Another problem is on between the normal/depth buffers and light buffer. Since we are rendering objects on the normal and depth buffers, MSAA properly works on the normal/depth buffers. Then when we render the light buffer, we need to fetch four sampling points from the normal buffer and four sampling points from the depth buffer. It is because we need to calculate light value at sampling points and then we need to average the light values. If we just use linear filter for normal buffer and take the average of the normal values on a pixel, the normal value will have no proper meaning. The depth value is also the same. For example if we have 4 depth values on a pixel: 0, 0, 1, and 1. Then the linearly averaged value will be 0.5, but there was no objects at the depth position.
Since the pixel shader needs to fetch 8 texels, it is very expensive. One way to solve this problem is to differentiate edge pixels from non-edge pixels. On non-edge pixel, we perform one time calculation, while we still do four times calculation on edge pixels.
To practically implement this idea, a cheap edge detection step is required. On Wolfgang's blog, a guy, benualdo, left an interesting idea: http://diaryofagraphicsprogrammer.blogspot.com/2010/03/edge-detection-trick.html
From his idea, I postulated an interesting characteristics of normal values. The length of a normal value is always one by the definition. However, if we linearly interpolate two normal values, the length may decrease. The length will be kept at one only when the normal values are the same. By this characteristics, we can determine whether a pixel includes four same normal values or not by checking the length of the averaged normal value. In other words, by one time of texture fetching with linear filter, we are able to check four normal values; we save three times of texture fetching. I will call this "normal length checking".
The bigger problem was on the depth checking part. I spent about 3 days thinking of depth checking with any similar way to the normal length checking trick. The first way I came up with is to make the 1 dimensional depth value to be 2 dimensional value; ( y = depth, x = 1 - depth ), and normalize it. When the depth values are different, linearly interpolating the normalized value will make the length smaller. This interesting idea didn't work, because we usually use 24 bits for depth and we need 24bits + 24bits to do this. Although I found that 24bits + 16bits is enough to detect edges, I could not accommodate the 2 bytes on any buffers. Normal buffer needs to use 3 channels for the normal and only one channel is left. I tried to encode the normal values onto 2 channels, but I found 2 channels are not enough to do the normal length checking trick. Thus I had to find another way.
My last choice was to use normal direction checking. The way is similar to the toon shading edge detection. When a normal value points to outward from the screen, the pixel is an edge pixel; edge = 0.2 > dot( normal, (0, 0, 1) ). A difference is that on our case false-positive is allowed. In other words, the result will be fine even if we misjudge a non-edge as an edge; we will need little bit more calculations but the result will be the same. On toon shading, this kind of false-positive will make the edge line thicker, which make the result bad.
To prevent too much performance leak, I adopted the Centroid trick that is well explained on ShaderX7. The assumption is that if a pixel is inside of a triangle, it is not an edge pixel, so that we can easily reject those non-edge pixels by checking the centroid value. This reduces a big amount of false-positive edges from the normal direction checking. The centroid information is stored on the alpha channel of the normal buffer.
I like to add some of comments about the centroid trick. The basic idea was very interesting. However, after I implemented it, I soon found that it gave me almost wire-frame-like edges. For example if we have a sphere which consists of several triangles, the centroid trick will indicate those seam parts between triangles as edges. But those pixels are not edges in the sense of normal continuity and depth continuity. In addition, if we use normal map during the normal buffer rendering, pixels in the middle of triangles may need to be considered as edges due to the normal discontinuity. Furthermore, PS3 is based on tiles. The wire-frame-like edges are actually covering almost every screen, although they are sparse. The Stencil cull implementation on PS3 was almost disabled in the situation.
PS3 had another problem with the Centroid trick. PS3 sends the identical value as a centroid value where a pixel is partially covered if the polygon covers the middle of the pixel. It was mentioned on the document and a DevNet supporting guy told me the behavior is hardware-wired so that they have no way to change it. According to my rough calculation only 2/3 of actual edges are detected by the Centroid trick. In other words 1/3 are missed. I couldn't give up this trick, because it makes the edge detection very fast, although the quality may decrease.On my implementation, first the normal length checking will return edges. Remain pixels are tested with the centroid trick, and then the normal direction checking takes place. This requires only 1 time of texture fetching and was fast enough to perform in 1.25ms at 720p.
The result between the normal direction checking and the depth checking on four texels, which is expensive, was little bit different, but the direction checking was very good enough with respect to the cost.