This article follows up on my prior article, "Fast Edge Detection on 4xMSAA with Light pre-pass": http://wrice.blogspot.com/2010/03/fast-edge-detection-on-msaa-with-light.html
Today I found that I can run fragment shaders per sampling point on PS3. The documentation did not make this obvious, so I wasn't sure until I saw the result. The way PS3 lets us run at a single sampling point is to set a "sample mask" on the MSAA surface. With the mask, I can decide which sampling point stores the result from the fragment shader. It is a bit mask, so the same result can go to more than one sampling point.
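To make the bit mask idea concrete, here is a minimal CPU-side sketch in Python. It is only a model of the behavior described above, not the PS3 API: the function name `write_with_sample_mask` and the list-of-four-floats pixel layout are assumptions made for illustration.

```python
# Hypothetical model of a 4xMSAA sample mask. This is NOT a PS3 call;
# it only imitates "write the shader result to the masked sampling points".
SAMPLES_PER_PIXEL = 4

def write_with_sample_mask(pixel_samples, value, sample_mask):
    """Return a copy of pixel_samples where every sample whose bit is set
    in sample_mask is replaced by value; other samples are left untouched."""
    return [
        value if (sample_mask >> i) & 1 else old
        for i, old in enumerate(pixel_samples)
    ]

pixel = [0.0] * SAMPLES_PER_PIXEL
pixel = write_with_sample_mask(pixel, 0.8, 0b0001)  # bit 0 only: sample 0
pixel = write_with_sample_mask(pixel, 0.3, 0b0110)  # bits 1 and 2: two samples at once
```

Running one pass per single-bit mask (0b0001, 0b0010, 0b0100, 0b1000) is what gives one shader result per sampling point.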
In my prior article, I said "we need to calculate light value at sampling points and then we need to average the light values." Now each shader run fetches only one normal sample and one depth sample for the sampling point it writes to; there is no need to average anything in the shader. After this change, I found that the light buffer behaves more like a real MSAA buffer.
There are several things to note about this change. Since we store the light values per sampling point, we don't need to sum or average them in the shader. Later, we can get the averaged value by using a linear filter on the light buffer. I expect this to save us some time.
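As a rough illustration of "store per sampling point, average later", the sketch below shades each sample of one edge pixel on its own and then averages the four values in a separate step, the way a filtered read of the light buffer would. The Lambert term, the names `lambert`, `shade_samples` and `resolve`, and the omission of depth are all simplifying assumptions, not the actual shader.

```python
def lambert(normal, light_dir):
    """Clamped N.L diffuse term for one sample (both vectors assumed normalized)."""
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))

def shade_samples(sample_normals, light_dir):
    """One light value per sampling point; nothing is averaged here."""
    return [lambert(n, light_dir) for n in sample_normals]

def resolve(light_samples):
    """Average the stored samples, standing in for the linear filter read."""
    return sum(light_samples) / len(light_samples)

# An edge pixel: two samples on a surface facing the light, two facing away.
edge_normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (0.0, 0.0, -1.0)]
light_dir = (0.0, 0.0, 1.0)

light_buffer_pixel = shade_samples(edge_normals, light_dir)  # [1.0, 1.0, 0.0, 0.0]
averaged = resolve(light_buffer_pixel)                        # 0.5
```

The point is only that the averaging moves out of the lighting shader and into the later read of the light buffer.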
A downside is that we now need to render the light geometry, or quad, three more times on the edges. Including non-edge lighting, we need to render the light geometry five times in total: once for non-edge pixels and once for each of the four sampling points.
Although we need to run the edge fragment shader four times, each run is lighter. It now does only two texture fetches: one from the normal buffer and one from the depth buffer. Before, it did eight: four from the normal buffer and four from the depth buffer. The total cost therefore looks about the same, but I still need to do more performance testing.
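The counting above can be written down as a small back-of-the-envelope calculation. This only counts passes and texture fetches per edge pixel; it says nothing about real GPU timing, pass setup overhead, or arithmetic cost, and the function names are made up for this sketch.

```python
SAMPLES = 4   # 4xMSAA
BUFFERS = 2   # normal buffer and depth buffer

def edge_cost_averaging_in_shader():
    """Earlier approach: one edge pass that reads every sample of both buffers."""
    passes = 1
    fetches_per_run = SAMPLES * BUFFERS
    return {"edge_passes": passes, "fetches_per_edge_pixel": passes * fetches_per_run}

def edge_cost_per_sample_passes():
    """New approach: one edge pass per sampling point, one fetch per buffer."""
    passes = SAMPLES
    fetches_per_run = BUFFERS
    return {"edge_passes": passes, "fetches_per_edge_pixel": passes * fetches_per_run}

old = edge_cost_averaging_in_shader()
new = edge_cost_per_sample_passes()
total_light_passes = 1 + new["edge_passes"]  # one non-edge pass plus four edge passes
```

Both approaches end up at eight fetches per edge pixel, which is why only a real measurement can tell which one is faster.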
The next change was to use centroid sampling in the fragment shader that actually renders each object with its material. Without centroid, the final result may not show any difference. :-)
I was very happy. The result looked just perfect. Before this change, I kept seeing dark pixels around edges even after I adopted the fast edge detection. Now the result is very nice and even beautiful.
*PS: If you know of any similar papers or articles, please leave them here so that I can improve this method.