Dec 2, 2010

Kinect works on PC

I found an article that says I can use Microsoft Kinect on my PC:,2817,2373107,00.asp

It reminds me of a similar story on the PS3 side: installing Linux was allowed at first, and later that support was withdrawn.

Although MS says it was open 'by design', who knows how long it will last; even MS wouldn't know.

I may need to hurry to experience this exciting device under my own control.

Sep 4, 2010

Netflix is amazing

Netflix now provides its streaming service on the iPhone. In other words, you can watch movies anywhere.

Notice that Amazon sells the Kindle with free 3G, which gives customers the experience of reading any book anywhere; that is awesome.

Until now every iPhone user could watch YouTube, but it wasn't enough in terms of content. Now it has expanded to movies. What more can we expect?

The next thing that could surprise me would be a game streaming service on the iPhone.

Jul 18, 2010

Artistic game

It is so cool. I wish we could make real games like this:
Those pictures are from here, I believe:

Customer Experience

I remember this presentation was introduced on some web sites, but I missed the chance to watch it at the time. A long time later I came across the same URL again and saved it:

I strongly recommend that everybody watch it.
It is a very nice talk about User Interface and Customer Experience.
Imagine you’re on the first slide of your powerpoint presentation and want to move to the next slide. Your remote control has two buttons. They are unmarked, but one button points up and one button points down.

Which button do you press?
The answers split half and half.

I like the statement "Emotion wins over logical thinking." I think it really does. I also learned that complexity is fine, but it needs to be well organized.

Jul 16, 2010

Turn on/off console dev kits: XBox360 and PS3.

Today I found that the Xbox360 also has a way to be turned on and off remotely.

1. Turning off and on Xbox360

"%XEDK%\bin\win32\xbemulate" /Power off
"%XEDK%\bin\win32\xbemulate" /Process %errorlevel% /Quit

choice /d y /t 5 /n /m "Wait 5 sec…"

"%XEDK%\bin\win32\xbemulate" /Power on
"%XEDK%\bin\win32\xbemulate" /Process %errorlevel% /Quit

2. Turning off and on PS3.

D:\projects\Cell\host-win32\bin\dtpoff -d ip_address

D:\projects\Cell\host-win32\bin\dtpon -d ip_address

Jul 13, 2010

Unbelievable optical illusion.

I still cannot believe that the color of A is the same as that of B. But they are the same... Does it look the same now???

*PS: These images are from the homepage of Edward H. Adelson.

Jul 10, 2010

CHOICE for batch file

Last week, I found a command-line utility called CHOICE. It is meant for batch files (*.BAT files).

The usage is simple:
CHOICE /M "Do you agree?"
This will print the message "Do you agree? [Y,N]", and you can press the "y" key or the "n" key to answer. Other keys are ignored, and you don't need to press Enter after "y" or "n".

We can also change the key choices like this:
CHOICE /C XPD /M "Do you like [X]box360 games or [P]layStation3 games? ([D]on't care)"
Now you will have "X", "P", and "D" choices instead of simply "Y/N".

The user's choice is then handled with the "IF ERRORLEVEL" command, e.g.:
CHOICE /C XPD /M "Do you like [X]box360 games or [P]layStation3 games? ([D]on't care)"
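The dispatch that follows CHOICE can be sketched in Python to make two documented rules concrete. First, CHOICE sets ERRORLEVEL to the 1-based position of the pressed key in the /C list, and matching is case-insensitive unless /CS is given. Second, `IF ERRORLEVEL n` is true whenever ERRORLEVEL is greater than or equal to n, which is why a batch file tests the highest value first. The functions `choice`, `if_errorlevel` and `dispatch` are hypothetical stand-ins, not part of any tool.

```python
def choice(keys: str, pressed: str) -> int:
    """Mimic CHOICE /C <keys>: return the 1-based index of the pressed key.

    Matching is case-insensitive, like CHOICE without /CS. The real command
    simply ignores keys that are not in the list; this sketch raises instead.
    """
    index = keys.upper().find(pressed.upper())
    if len(pressed) != 1 or index < 0:
        raise ValueError(f"{pressed!r} is not one of {keys!r}")
    return index + 1


def if_errorlevel(errorlevel: int, n: int) -> bool:
    """`IF ERRORLEVEL n` is true when errorlevel >= n, not only when equal."""
    return errorlevel >= n


def dispatch(errorlevel: int) -> str:
    # Test from the highest value down, as a batch file must,
    # because IF ERRORLEVEL 1 would also be true for 2 and 3.
    if if_errorlevel(errorlevel, 3):
        return "dont_care"
    if if_errorlevel(errorlevel, 2):
        return "playstation3"
    if if_errorlevel(errorlevel, 1):
        return "xbox360"
    return "none"


print(dispatch(choice("XPD", "p")))  # playstation3
```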
There is one more thing to note.

It is always very hard to parse user input arguments. Let's say we have a batch file, "vote.bat", that takes an argument, "xbox" or "ps3".
For example, when we want to vote for PS3, the command will be like this:
vote.bat ps3
Then in the batch file, the argument is parsed in this way:
IF "%1" == "ps3" GOTO :VOTE_FOR_PS3
IF "%1" == "xbox" GOTO :VOTE_FOR_XBOX
Now, we need to remember that the batch file takes either "xbox" or "ps3". It shouldn't be "xbox360" or "pS3", because the "==" comparison is case-sensitive. So we now need to add a help message like this:
IF "%1" == "ps3" GOTO :VOTE_FOR_PS3
IF "%1" == "xbox" GOTO :VOTE_FOR_XBOX
ECHO USAGE: vote.bat [xbox|ps3]
With the CHOICE command, we can make this batch file much simpler:
CHOICE /C XP /M "Vote for [X]box or [P]s3?"
There is no need for a dedicated help subroutine, and we don't need to bother with case sensitivity.

In addition to that, we can also use the batch file in this way:
ECHO x | vote.bat
This will have the same effect as the command "vote.bat xbox", without any interactive pause.

I think that whenever a batch file needs to take an additional input as its argument, that argument can be replaced with CHOICE.

Unfortunately, this useful command is not available on every version of Windows. It seems that CHOICE.EXE is available from Windows XP with Service Pack 3.

Jul 6, 2010

ShaderX 8 is published

This may not be news to many graphics programmers. The author changed publishers and had to change the name of the series, so the new name is not ShaderX8 but GPU Pro.

I also found the author is preparing GPU Pro 2, so I guess now we need to remember this new name.
I spent about an hour skimming the book here and there. One thing I noticed is that many of the articles are now based on Shader Model 4 and 5.

Jul 1, 2010

Automate physical action with batch file.

*PS: This is just an idea; I am not actually doing it this way....

One of the challenges of software test automation is that sometimes we need to perform physical actions. In that case, we can "open the CD-ROM tray". It may sound strange...

For example, we need to turn on the XBox360 devkit before starting the tests. Unfortunately, the XBox360 devkit does not have any way to turn the machine on remotely. We have to press the power button manually in order to turn on the machine. The same applies to turning it off.

We can place the computer and the XBox360 devkit close together, so that when the CD-ROM tray opens, it hits the power button of the XBox360 devkit.

There are several free utilities to open/close the CD-ROM tray.

I found an interesting YouTube clip that does this.

Jun 30, 2010

Start Unreal Engine 3 for free

As many game programmers know, the famous Unreal Engine is now free to the public. Although you need to pay when you want to sell your games, it is a great chance to get some experience with it. It will also give you a better chance of getting a job offer.
For those who don't know where to start, I recommend the official wiki web site:
Just focus on Unreal Engine 3, not 2. The free version is called the Unreal Development Kit (UDK).

You can download the engine for free:

There are books on Amazon too: "Mastering Unreal Engine". There are several volumes and several editions; try to find the latest. As far as I know, this book series doesn't cover UnrealScript. You will need to find other books that cover it, or you can find the reference page here:

There are also video tutorials, which are the easiest and most efficient way of learning a tool:

Jun 29, 2010

E3 Best game awards

There are game lists on this web site.

I can save time by playing those games first....

Jun 27, 2010

AutoIt - GUI Software automation script language

It is always harder to test GUI applications than command-line applications. One of the reasons is that a GUI usually takes input from the mouse, which is hard to simulate. There are several other reasons, but here I want to focus on AutoIt, which is a scripting language as far as I understand. Before I found AutoIt, I was trying to use the Windows API "SendMessage" to mimic mouse clicks or keyboard shortcuts: for example, SendMessage( hwnd, WM_KEYDOWN, xxx, xxx ). I found a utility to do this, SendMsg:

I soon found that it is very hard to find the right window handle and the right window message, even with Spy++.

When I first found AutoIt, I didn't expect anything more than simple, AutoMouse-like features. My expectations suddenly rose after I read a sample script for Notepad. It was a reliable scripting language rather than a tool that just sends some Windows messages.

Here is a simple example from its website:
WinWaitActive("Untitled - Notepad")
Send("This is some text.")
; from
We can also send keyboard shortcuts:
Send("!f") ; Alt+f
Send("^s") ; Ctrl+s
Send("{ENTER}") ; Enter
I haven't seen the mouse input functions yet, but there must be some way to do it; I will figure it out later.

I think this utility can save a lot of time when testing GUI applications.

Jun 10, 2010

Game Physics 2nd

David Eberly, one of the famous game-related authors, published a new book last month: Game Physics, 2nd edition.
According to the web page, the book is based on the new version of Wild Magic, 5.1:

For a while I was wondering what he was doing. I haven't read the 1st edition, but I would like to grab the new edition sooner or later.

*PS: It seems that he is now focusing more on physics than on graphics or engine parts.

May 2, 2010

Lighting on Final Fantasy 13.

I have played Final Fantasy 13 for only about three hours. I think it is using forward shading; if I am wrong, I will be very surprised.

I couldn't figure out how they lit the objects. It seems to use more than one light, but it is definitely neither deferred shading nor light pre-pass. The game heavily uses cut-scenes, which are pre-rendered movie clips. They are easy to tell apart because the movie clips use a lot more lights than the gameplay scenes.

I have seen pre-baked shadows as well as, I guess, one high-resolution shadow map with PCF.

Forward shading is still an effective method. It can use MSAA at low cost and allows transparency on any object.

Light buffer rendering with SPU

PhyreEngine implemented light buffer rendering on the SPUs. When I first read the ppt, I didn't pay attention to the SPU implementation part, but after I read an article in GPG8, I found it very interesting.

May 1, 2010

How about using the PS3 as a graphics data baking machine...

Occasionally the question comes to mind whether we can use the PS3 as a graphics data pre-baking tool. That way, we would be able to take advantage of its several SPUs.

One thing I am not clear about is how to store the result data. It wouldn't be a problem with a PS3 DevKit, but I am not aware of any means to store data from a retail PS3. Since Sony blocked Linux support, it may be harder than before.

Apr 27, 2010

Shadow blur with PCF and Gaussian filter

A long time ago, GPU Gems introduced how to soften shadows with Percentage-Closer Filtering:

The article fetches eight times and gets a result similar to sixteen fetches.

I have tried both ways but I wasn't satisfied with them.

As the article explains, the problem with the shadow map is that we need to get the binary depth-test results first. We cannot just blur the shadow depth values.

To get a correct Gaussian blur result, we first need to fetch sixteen texels from the shadow map. Then we compare those values with the actual depth value. Finally, we apply the Gaussian blur to those binary test results.

The article in GPU Gems just takes the average of them, but the quality can be improved a lot with a Gaussian blur and/or a bilinear filter.

We can apply a 3x3 Gaussian mask to each of the four 3x3 windows inside the 4x4 texels. This gives us four filtered values, which are no longer booleans. Then we can apply a bilinear filter to them.
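Here is a minimal CPU-side sketch of this filter in Python. It assumes the sixteen depth-test results are already available as a 4x4 grid of 0/1 values (1 = lit), and that (fx, fy) is the sub-texel position in [0, 1] used for the final bilinear step. The [1, 2, 1] kernel normalized by 16 is an assumed choice of Gaussian weights.

```python
GAUSS_3X3 = [[1, 2, 1],
             [2, 4, 2],
             [1, 2, 1]]  # weights sum to 16


def gaussian_at(tests, x0, y0):
    """Gaussian-weighted average of the 3x3 window whose top-left is (x0, y0)."""
    total = 0
    for dy in range(3):
        for dx in range(3):
            total += GAUSS_3X3[dy][dx] * tests[y0 + dy][x0 + dx]
    return total / 16.0


def filtered_shadow(tests, fx, fy):
    """tests: 4x4 grid of depth-test results (1 = lit, 0 = shadowed).

    The four 3x3 windows inside the 4x4 block give four filtered values,
    which are then bilinearly interpolated with the weights (fx, fy).
    """
    s00 = gaussian_at(tests, 0, 0)
    s10 = gaussian_at(tests, 1, 0)
    s01 = gaussian_at(tests, 0, 1)
    s11 = gaussian_at(tests, 1, 1)
    top = s00 + (s10 - s00) * fx
    bottom = s01 + (s11 - s01) * fx
    return top + (bottom - top) * fy
```

With a vertical shadow edge such as `[[0, 0, 1, 1]] * 4`, the four filtered values are 0.25, 0.75, 0.25 and 0.75. The bilinear step then slides smoothly between them instead of snapping between 0 and 1.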

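To improve the result even further, we can calculate floating-point values from the depth test, rather than just boolean values.
float depthOffset = 0.001;

// Fetch one of the sixteen texels.
float depth00 = GetDepth( uv + float2( -1.5, -1.5 ) );

// 0 when the stored depth is depthOffset or more in front of the receiver,
// 1 when it is depthOffset or more behind it, and a linear ramp in between.
float brightness00 = saturate( ( depth00 - ( actualDepth - depthOffset ) ) / ( 2 * depthOffset ) );

// do the same thing on the other offsets.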
This approach tries to estimate how much the pixel is exposed to the light; I tried to apply the SSAO trick to the shadow map, but in a simpler way.
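The same soft comparison can be written in plain Python so the ramp is easy to check. `get_depth(u, v)` is a hypothetical shadow-map lookup, and the sixteen offsets are assumed to be in texel units, running from -1.5 to 1.5 as the -1.5 in the snippet suggests. Note that, with this formula, a texel whose stored depth exactly equals the receiver depth contributes 0.5.

```python
def saturate(v):
    return min(1.0, max(0.0, v))


def soft_test(shadow_depth, actual_depth, depth_offset=0.001):
    """0 when the stored depth is depth_offset or more in front of the receiver,
    1 when it is depth_offset or more behind it, linear in between."""
    return saturate((shadow_depth - (actual_depth - depth_offset)) / (2 * depth_offset))


# 4x4 texel offsets: -1.5, -0.5, 0.5, 1.5 on each axis
OFFSETS = [(-1.5 + i, -1.5 + j) for j in range(4) for i in range(4)]


def soft_shadow(get_depth, u, v, actual_depth, depth_offset=0.001):
    """Average of the sixteen soft depth tests around (u, v)."""
    total = sum(soft_test(get_depth(u + du, v + dv), actual_depth, depth_offset)
                for du, dv in OFFSETS)
    return total / len(OFFSETS)
```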

The result looked good, but I am still looking for a better way, because sixteen texel fetches seem too expensive.

One way to improve performance could be to fetch half or a quarter of them first, and fetch the others only when those depth-test results differ.
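One way this idea could look is sketched below in Python. Two things are assumptions: `test(i)` is a hypothetical function that performs the depth test for the i-th of the sixteen taps (row-major in the 4x4 block), and the first pass uses only the four corner taps. If all four agree, the pixel is treated as fully lit or fully shadowed and the other twelve fetches are skipped. Thin features that fall between the corner taps can be missed this way.

```python
CORNERS = (0, 3, 12, 15)  # corner taps of a 4x4 block, indexed row-major
TAPS = 16


def adaptive_pcf(test):
    """Return (shadow_value, fetch_count) for one pixel."""
    first = [test(i) for i in CORNERS]
    if all(r == first[0] for r in first):
        # All corners agree: assume the whole block agrees.
        return float(first[0]), len(CORNERS)
    # Corners disagree: we are near an edge, so fetch the remaining taps.
    results = dict(zip(CORNERS, first))
    for i in range(TAPS):
        if i not in results:
            results[i] = test(i)
    return sum(results.values()) / TAPS, TAPS
```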

Apr 25, 2010

Why does light leak happen on the soft shadow map?

Today I arrived at a certain conclusion about soft shadow map techniques, and I will try to describe it here.

Variance shadow maps (VSM), Convolution shadow maps (CSM) and Exponential shadow maps (ESM) all have the light leaking problem. In the VSM papers this problem is referred to as "light bleeding". Here I assume "light bleeding" and "light leaking" are the same problem.
The light leaking artifact is actually the source of the shadow's softness. What I am proposing here to solve the light leaking problem is to differentiate the inner penumbra region from the outer penumbra region; I will explain more later.

A screenshot of the Exponential shadow map is below:

The screenshot was labeled "no bleeding", but in fact it is still bleeding, and it has physically wrong softness at the contact points. The shadow around the contact points is supposed to be sharp, because it is affected mostly by high-frequency light. The contact point softening problem is actually a slightly different version of the light leaking problem. For example, if we put two pieces of paper very close together, the second paper, which is supposed to be in shadow, will get a softened shadow due to the short distance.

First, the reason why we have the light leaking problem with shadow map filtering methods is that we are trying to estimate information about places hidden behind the surfaces stored in the shadow map, while the shadow map holds no information about them.

There can be two types of penumbra region with respect to the shadow map. I call them inner penumbra and outer penumbra. Inner penumbra is the penumbra region that is not visible from the shadow map. Outer penumbra is the penumbra region that is visible from the shadow map.

Whenever we attempt to soften the inner penumbra region, we are using estimated values, because the shadow map doesn't see the inner penumbra, so it holds no information about it. The way we estimate the information behind the shadow map is what causes the light leaking problem.

A physically correct shadow strength calculation should be like this:

Note that we must differentiate "B and D" from "A and C", because the B part gets full light while the A part doesn't get any. If we use the same calculation on the inner penumbra region ( A and C ) and the outer penumbra region ( B and D ), the result will be wrong.

The problem is in the "A and C" part with VSM and ESM.

The calculation of VSM is like this:
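// depthOnShadowMap is the filtered mean depth; variance = E[d^2] - E[d]^2 from the filtered moments.
float bOuterPenumbra = ( depthOnShadowMap >= actualDepth );

// Chebyshev upper bound on the fraction of light reaching the receiver.
float brightness = variance / ( variance + pow( actualDepth - depthOnShadowMap, 2 ) );

float finalBrightness = max( bOuterPenumbra, brightness );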
This makes the "A" part brighter than the "C" part, which is the opposite of the correct shadow model: it gives a stronger shadow on the "C" part and a softer shadow on the "A" part.
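For reference, the standard VSM lookup can be sketched in Python. The shadow map stores the first two moments of depth, M1 = E[d] and M2 = E[d²], which can be filtered linearly. The variance is M2 - M1², and the brightness term in the code above is Chebyshev's one-tailed upper bound. The `min_variance` clamp is an assumed, commonly used guard against numerical problems. The usage below shows the leaking behavior: behind an occluder edge, a receiver close to the occluder gets more light than one farther away.

```python
def moments(depths):
    """Average of d and d^2 over the filter footprint, as linear filtering would produce."""
    n = len(depths)
    return sum(depths) / n, sum(d * d for d in depths) / n


def vsm_brightness(m1, m2, actual_depth, min_variance=1e-6):
    lit = 1.0 if m1 >= actual_depth else 0.0  # bOuterPenumbra in the code above
    variance = max(m2 - m1 * m1, min_variance)
    d = actual_depth - m1
    p_max = variance / (variance + d * d)  # Chebyshev upper bound
    return max(lit, p_max)


edge = moments([0.2, 0.2, 0.8, 0.8])  # footprint straddling an occluder edge
near = vsm_brightness(*edge, 0.6)      # receiver just behind the edge
far = vsm_brightness(*edge, 0.9)       # receiver farther behind
print(near, far)                       # near is brighter: light leaks
```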

Note that VSM does not take the outer penumbra region into account in the calculation. In other words, VSM works on the inner penumbra region, not on the outer penumbra region.

ESM does a calculation similar to VSM's. It reduces the light leaking problem, but now it causes the contact point softening problem. This is because ESM considers only the depth values, regardless of how much they are interpolated. For this reason, the light leaking problem moves from the "C" part to the "A" part, so that it appears as softened contact points.
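The contact point softening can be reproduced with the standard ESM formulation: visibility = E[exp(c · d_occluder)] · exp(-c · d_receiver), clamped to 1. The sketch below uses that formulation, which may differ from any particular engine's implementation. The sharpness constant c = 80 is an assumed value, and a plain average over `occluder_depths` stands in for the pre-filtered exponential map.

```python
import math


def esm_visibility(occluder_depths, receiver_depth, c=80.0):
    """Standard ESM test on an averaged (pre-filtered) exponential map."""
    filtered = sum(math.exp(c * d) for d in occluder_depths) / len(occluder_depths)
    return min(1.0, filtered * math.exp(-c * receiver_depth))


occluder = [0.5] * 4
contact = esm_visibility(occluder, 0.505)  # receiver almost touching the occluder
distant = esm_visibility(occluder, 0.6)    # receiver well behind it
print(contact, distant)
```

The receiver 0.005 behind the occluder still gets about 67% of the light (exp(-0.4)), while the distant one gets almost none. So the shadow fades exactly where it should be sharpest.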

To address this light leaking problem, we need to apply the softness to the outer penumbra rather than the inner penumbra. Given that the shadow map holds no information about the inner penumbra region, it makes more sense to apply shadow softening to the outer penumbra region.

The calculation will be like this:
const float expConst = -5.0;

const bool bUnderShadow = ( actualDepth > depthOnShadowMap );

// expFromExpMap is the value fetched from the exponential shadow map; pow( e, x ) is written as exp( x ).
const float brightness = ( bUnderShadow ? 0.0 : ( 1 - saturate( exp( expConst * actualDepth ) / exp( expConst * expFromExpMap ) ) ) );
There are some problems with this approach.

One of them is that we cannot use the "second depth rendering" trick in the shadow map rendering phase. Second depth rendering renders back faces into the shadow map rather than front faces. It is very helpful for reducing the shadow swimming problem and the "surface acne" problem. Since the softness calculation relies on the front face depth, we can no longer use second depth rendering.

Another problem is that we must keep the original shadow map in order to differentiate the inner penumbra from the outer penumbra. When we want to pre-filter the shadow map as ESM does, we need to fetch texels from both maps.

I have some solutions in mind, and I will post more details after I get enough test results.

Apr 23, 2010

Soft shadow with shadow map...

There have been several innovative new shadow map techniques that promise soft shadows, such as Variance shadow maps and Exponential shadow maps.

They use 32-bit floating-point surfaces and assume that hardware linear filtering works on them. Unfortunately, the PlayStation3 does not support linear or bilinear filtering on 32-bit floating-point surfaces. How am I supposed to get soft shadows?

My final choice was a slightly modified version of Percentage Closer Filtering, which means I fell back to the baseline. I modified it to mimic the bilinear filter.
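The post doesn't show the modified filter, so here is a sketch of the common technique with the same goal. It performs four point-sampled depth tests and blends the binary results with bilinear weights, so no hardware filtering of the float surface is needed. Texel centres are assumed to lie at integer + 0.5, and borders are clamped.

```python
import math


def bilinear_pcf(shadow_map, u, v, receiver_depth):
    """Four depth tests blended with bilinear weights.

    u, v are continuous texel coordinates; shadow_map is a 2D list of depths.
    """
    h, w = len(shadow_map), len(shadow_map[0])
    x, y = u - 0.5, v - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def lit(px, py):
        px = min(max(px, 0), w - 1)
        py = min(max(py, 0), h - 1)
        return 1.0 if shadow_map[py][px] >= receiver_depth else 0.0

    top = lit(x0, y0) * (1 - fx) + lit(x0 + 1, y0) * fx
    bottom = lit(x0, y0 + 1) * (1 - fx) + lit(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy
```

Blending the test results, rather than the depths, is what keeps this valid for a depth map: the depths themselves are never interpolated.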

I am not quite happy, because I cannot get any benefit from the new hardware acceleration. But it seems to work fairly well.

*PS: The screenshot is from here.

Apr 20, 2010

Depth buffer blur with Gaussian filter on PS3

These days I often look at the real shadows of trees and other objects. Strangely, they don't look natural or real to me. It makes me think about what kind of soft shadow I want to see.

The softness of shadows comes from the frequency of light. Around edges the shadow looks sharp, because it results from high frequency. Some soft shadows come from low light frequency.

When we render a shadow map, it is analogous to HIGH frequency light. It suits shadows close to the object, but not shadows far from the object.

We need a LOW frequency shadow map for distant shadows. A low frequency shadow map can be obtained with a Gaussian blur. The blurred shadow map must have the same resolution as the original shadow map.

One problem here is, again, how to blur the shadow buffer, which contains encoded values.

The simplest way is to sample nine times and calculate the Gaussian blur, but that would be too slow.

I have already described how to get a blurred depth buffer with the linear filter:

Today I found that I can also do this with the Gaussian filter on PS3. The trick is very similar to the one with the linear filter, but it needs more buffer bits in each channel. And this is not for down-sampling, only for blurring.

The Gaussian filter on PS3 does "[ 1, 2, 1 ] [ 2, 4, 2 ] [ 1, 2, 1 ] / 16".
Since it divides by 16 ( = 2^4 ), we need 4 bits of buffer in each color channel.

It means that we can use 4 bits in the Red channel, 4 bits in the Blue channel, 4 bits in the Green channel, and 8 bits in the Alpha channel. Therefore, we can use 20 bits in total, which is large enough.
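The packing can be checked with a small Python simulation. Several details are assumptions, since the post doesn't specify them. R holds the most significant digit and A the least. Each 4-bit digit sits in the top nibble of its 8-bit channel, so the low nibble absorbs the division by 16. The hardware is modeled as truncating each filtered channel back to an integer. Under these assumptions R, G and B come through the filter exactly, and the only loss is the fraction of the lowest (Alpha) digit. That is presumably why Alpha can keep all 8 bits.

```python
GAUSS = [1, 2, 1, 2, 4, 2, 1, 2, 1]  # 3x3 kernel, row-major, sum = 16


def encode(depth20):
    """Split a 20-bit integer depth into four 8-bit channels (R, G, B, A)."""
    r = (depth20 >> 16) & 0xF
    g = (depth20 >> 12) & 0xF
    b = (depth20 >> 8) & 0xF
    a = depth20 & 0xFF
    # 4-bit digits go in the top nibble; the low nibble is the 4-bit buffer.
    return (r << 4, g << 4, b << 4, a)


def gaussian_channel(values):
    """One 8-bit channel of the 3x3 Gaussian filter (truncation assumed)."""
    return sum(w * v for w, v in zip(GAUSS, values)) // 16


def decode(rgba):
    r, g, b, a = rgba
    # Each filtered R/G/B channel holds the weighted sum of its digit (at most 240).
    return r * 4096 + g * 256 + b * 16 + a


def blurred_depth(depths):
    """depths: nine 20-bit integers (3x3 neighbourhood, row-major)."""
    channels = zip(*(encode(d) for d in depths))
    return decode(tuple(gaussian_channel(list(ch)) for ch in channels))
```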

The way to use those shadow maps is to sample the non-blurred shadow map first and then sample the most blurred shadow map once more. The difference between those two sampled values represents how much the shadow needs to be blurred.

Apr 19, 2010

From Gran Turismo 5 movie clips

Today I took a closer look at a Gran Turismo 5 movie clip.

I found that they are using two types of shadows for the cars. One is from a directional light and the other is ambient occlusion.

The shadow from the directional light is done with a shadow map. Each car seems to have its own shadow map.
As you can see from the screen shot above, those two types of shadows don't seem to get along well.

An impressive part is the reflections on the car's body. The reflection looks very clear and covers the body and glass parts very nicely. I guess they are streaming pre-baked environment maps in and out rather than rendering them at run-time.

I will be very surprised if the environment map is rendered at run-time. One way to find out is to see whether other cars are reflected on the car's body. I couldn't find that kind of situation in the movie clip, but I can test it later.

I tried to look for aliasing issues, but the movie clip did not carry enough detail on the edges. HDR was covering the edges smoothly, and Depth of Field made the look smoother. A camera shaking slightly up and down made the scene more realistic.

God Of War 3 has seamless pre-rendered cut scenes

Today I became very sure that the cut scenes in God of War 3 are pre-rendered, not rendered in real time.

The way I figured it out is that the shadows look much nicer and softer in the pre-rendered cut scenes, while they aren't as smooth during gameplay.
I should say it would have been almost indistinguishable if I hadn't paid close attention to the shadows.

Matching the pre-rendered cut scenes to the actual game scenes is often a hard task, because 3DS Max or Maya generates quite a different look and feel for the background or characters from what the actual game engine generates. I am guessing that they might have pre-rendering tools built on the game engine, rather than rendering directly from Max/Maya.

Using pre-rendered cut scenes could probably also save some facial animation data.

Apr 18, 2010

Real-time Ray-tracing with PS3s.

I found some YouTube clips of real-time ray tracing with several PS3s.

It is very interesting. They use only the SPUs, due to the restrictions of the hypervisor. Each PS3 has six SPUs available.
However, it still does not look fast enough for game rendering.
The first demonstration uses three PS3s and the scene is static.
The next one uses seven PS3s and the scene is dynamic.

I think that if real-time ray tracing becomes feasible with two or three PS3s, we may potentially see commercial or non-commercial games using it. :D

God of war 3 uses light cube

Today I found two interesting visual bugs on God Of War 3.
The screenshot above shows that it uses light cubes. Note that the light boundary bends on the leg. This means the light boundary is aligned to world space, not to screen space. In other words, it is not a screen quad but a light cube in world space.

It also means that they are using deferred shading or light pre-pass. I don't know which one it is. I don't see much variety of materials, so it could be deferred, but I don't think there is any reason to use deferred shading over light pre-pass these days. If I see any sophisticated shaders such as parallax mapping, I will be able to conclude that it is using light pre-pass, but I haven't seen any yet.

It is more common to use a light cube than a light sphere, which is what I am currently using on our project. I think saving fragment shader cost will be more efficient than saving vertex shader cost, since a sphere covers fewer pixels around the light than a cube does. I may be wrong, because I haven't seen any games using light spheres yet.

Another visual bug is about shadows.

The first screenshot shows a badly aliased shadow on the wall. The second screenshot shows an anti-aliased shadow on the character. In other words, the shadow on the wall looks jagged, while the same shadow looks very smooth on the character.

I guess it is because they use different shadow map resolutions for different objects. It makes sense to handle shadows on characters differently from shadows on static objects.

One more interesting thing is that the aliased shadow on the wall still doesn't look too bad and is somehow blurred. I don't know how they did it.

Apr 13, 2010

Projection matrix and inversed depth value.

In the DirectX coordinate system, which is left-handed, the center of the screen is ( x, y, z ) = ( 0, 0, [0-1] ); "[0-1]" means from zero to one. The top right corner of the screen is ( x, y, z ) = ( 1, 1, 0 ) and the bottom left corner is ( x, y, z ) = ( -1, -1, 0 ).

When we show 3D points or lines on the screen, we use a perspective projection rather than an orthographic one. In other words, a far object appears smaller while a closer object appears bigger.

The way we do this is to divide the x and y values by the z value. When z is bigger, x/z and y/z get smaller. In other words, points or lines get closer to the center of the screen when the z value is bigger, because the center is ( 0, 0, [0-1] ).

The matrix will be like this:
$$
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
=
\begin{bmatrix} x \\ y \\ z \\ z \end{bmatrix}
$$

We always divide the x, y, and z values by the last value, w, in order to make w equal to 1:
$$
\begin{bmatrix} x / z \\ y / z \\ z / z \\ 1 \end{bmatrix}
$$

Now we see there is a problem with the Z value: Z will always be 1. In computer graphics, a Z value of 1 means the deepest point inside the screen. In other words, the farthest objects have a z value of one, while the closest objects have a z value of zero in DirectX.

To solve this problem, we use this matrix instead:
$$
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 1 & 0 \end{bmatrix}
$$

The result is:

$$
\begin{bmatrix} x / z \\ y / z \\ (z - 1) / z \\ 1 \end{bmatrix}
$$

Note that the z value is now ( z - 1 ) / z, which is equal to ( 1 - 1 / z ). Now far objects will have values less than one but close to one. Since the Z value is inverted, the values are not evenly distributed. For far objects, the projected Z values do not vary much. For example, the projected Z for a Z value of 100 is 1 - 1/100 = 0.99, and the projected Z for a Z value of 200 is 0.995. The Z values differ by a factor of two, but the projected Z values differ by only 0.005. In other words, for far objects the projected Z value is less sensitive, while for closer objects it is more sensitive.

This is a preferable result for a perspective view, because we notice depth differences between close objects more than between far objects.

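One problem I realized is that, with this projection matrix, a negative Z value can produce a projected value larger than one. For example, for a Z value of -2, which means the object is behind the camera, the projected Z value is 1 + 1 / 2 = 1.5. That is even beyond 1, the value this matrix gives at infinite distance, so the point looks infinitely far in front of the camera rather than behind it. Z values between 0 and 1 are also a problem, since they give a negative projected Z (Z = 0.5 gives -1). To solve this problem, we need to make sure the Z value is always at least one, which acts as the near plane of this matrix.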
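The numbers above are easy to reproduce with a few lines of Python. The sketch applies exactly the second matrix, with no separate near/far parameters, and then divides by w. With this matrix, the projected Z is 0 at Z = 1 and approaches 1 as Z goes to infinity.

```python
def project(x, y, z):
    """Apply the matrix above: (x, y, z, 1) -> (x, y, z - 1, z), then divide by w = z."""
    clip = (x, y, z - 1.0, z)
    w = clip[3]
    return (clip[0] / w, clip[1] / w, clip[2] / w)


for depth in (1.0, 2.0, 100.0, 200.0, -2.0, 0.5):
    print(depth, project(0.0, 0.0, depth)[2])
```

The loop prints 0.0, 0.5, 0.99, 0.995, 1.5 and -1.0 as the projected Z. The last two are the cases behind the camera and in front of the implicit near plane.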


Lawyers may do good if you hire them.
They are definitely mean if someone else hires them.
Thus they are either definitely mean or may do good.
In other words, on average they are closer to mean:
( "may do good" + "definitely mean" ) / 2 = "mean"
We may need them sometimes, but I don't think their job is productive. It is like soldiers, who are powerful but don't produce anything. I hope I never have to face them.

Tessellation on XBox360

Today I found that the XBox360 has a tessellation feature.

I thought tessellation and geometry shaders were only supported from DirectX 10 on, so I was very excited to find this. Simply put, tessellation means that when I pass one triangle, I can get more than one triangle as the result. It is supposed to be fast and efficient thanks to hardware support.

Actually, I haven't been paying attention to geometry shaders or tessellation, because I thought we couldn't use them yet. I always skipped those parts when I read books. Now I need to go back and check what kind of techniques I can use with it.

It seems like the PS3 doesn't have any similar functionality. Sony may think of the SPUs as the counterpart.

Unfortunately, the XBox360 SDK documentation doesn't explain tessellation in detail, for example its performance or caveats. In addition, I haven't heard of any XBox360 games using the tessellation feature before. It makes me wonder how useful it will be.

Observation of shadow map in God Of War3

Today I played a little bit of God Of War 3.

I know there was a presentation at GDC 2010 about the shadows in God Of War3, but I couldn't get it. Therefore I had to guess by looking at the result.

From my observation, it is definitely multi-sampling the shadow map. It is probably doing percentage-closer filtering, because the penumbra region is almost one shadow-map pixel wide. This is more clearly noticeable when the shadow is drawn on an edge.
I saw that the shadow often jumps. It gives me the impression that the shadow map is moving dynamically. It may be moved to find the best angle between the shadow map and the shadow receiver. It is also possible that the position of the shadow map is predefined by artists. This reminded me of the shadow implementation of the Halo series.

The character has only one shadow even when there are many lights around him.
A black, shadow-like texture appears when the character is in a shadow, which is a similar result to Assassin's Creed 2.

The game also uses static shadows from static objects together with dynamic shadows from characters. They get along well: when a character is inside a static shadow, the dynamic shadow merges and disappears naturally.

I want to know how it actually works....

Apr 5, 2010

Penumbra shadow map screen shot

I have implemented part of the penumbra soft shadow map idea. I need more time to implement the actual shadow part, but today I got the penumbra map.

I found that we can use shader programs even with wire-frame fill mode; I am not sure whether or not it can be faster than solid fill mode. Therefore I could implement shaders that generate a derivative image directly by rendering the wire-frame model.

I would like to show some screenshots.
The first image is the scene from light perspective:
Then, the depth map is generated. The depth values are stored at Red and Green channels.
Finally, the derivative image is generated by rendering wire-frame fill mode:
Note that the elephant's legs look blurred. They haven't been blurred; it is because their derivative values are small. Thus I can expect the contact points between the disk and the elephant to have sharp shadows.

It fetches five times at quarter resolution, and I didn't see any false-positive edges. Since the derivative values are 8-bit data, they can be fetched with the linear filter.
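The post doesn't say which derivative measure the shader computes. The Python sketch below uses one plausible five-fetch version, purely as an assumption: the maximum absolute depth difference between the centre texel and its four direct neighbours, quantized to 8 bits like the derivative image in the post. The `scale` parameter is hypothetical.

```python
def derivative_5tap(depth, x, y, scale=255.0):
    """Edge strength from five fetches: the centre and its four direct neighbours.

    depth is a 2D list of depth values in [0, 1]; borders are clamped.
    The result is quantized to an 8-bit value.
    """
    h, w = len(depth), len(depth[0])

    def fetch(px, py):
        return depth[min(max(py, 0), h - 1)][min(max(px, 0), w - 1)]

    c = fetch(x, y)
    diff = max(abs(fetch(x + dx, y + dy) - c)
               for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    return min(255, round(diff * scale))
```

A surface that recedes gently, like the elephant's legs, gives small neighbour differences and therefore small values. A silhouette against a distant background gives large ones.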

Apr 4, 2010

Soft shadow map with wire-frame lines.

After a lot of thought, I came up with another idea about soft shadow maps.

The basic idea is that we render objects in wire-frame mode into the shadow map's fourth channel. This wire-frame information is used as edges. Then, when we retrieve the depth information from the depth map, we multi-sample at pixels where the wire-frame was rendered; otherwise, we sample only once.
The idea is based on multi-sampling shadow mapping, for example Percentage-Closer Filtering (PCF). It would be very costly to multi-sample at every pixel; we want to multi-sample only on edges. However, edge detection on the shadow map with image processing is very hard. My idea is to use the 3D model information to get those edges by rendering wire-frame lines.
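The lookup side of this idea can be sketched in Python, assuming the wire-frame pass has already produced a boolean `edge_mask` at shadow-map resolution. The function name and the `radius` parameter are hypothetical. Where the mask is clear, the sketch does a single depth test; where it is set, it does PCF over a (2r+1)² window.

```python
def shadow_lookup(shadow_map, edge_mask, x, y, receiver_depth, radius=1):
    """Return (visibility, fetch_count) for shadow-map texel (x, y)."""
    h, w = len(shadow_map), len(shadow_map[0])

    def lit(px, py):
        px = min(max(px, 0), w - 1)
        py = min(max(py, 0), h - 1)
        return 1.0 if shadow_map[py][px] >= receiver_depth else 0.0

    if not edge_mask[y][x]:
        return lit(x, y), 1  # no wire-frame line here: one fetch is enough
    samples = [lit(x + dx, y + dy)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    return sum(samples) / len(samples), len(samples)
```

Raising `radius` corresponds to widening the penumbra region, at the cost of more fetches on edge pixels only.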

If we can control the width of the wire-frame lines, we can increase the size of the penumbra region, which will require more multi-sampling. It would be better if we could have different widths at different depths, but sadly that would not be possible. If the fixed width gives us bad results, we may need to blur the lines, possibly at a lower resolution.

It is also possible to use the stencil buffer instead of rendering actual lines into a color buffer. If we are already using a depth buffer for the shadow map on console machines, we can reuse the stencil channel for free.
I haven't seen any rendering tricks that use wire-frame information yet, which makes me wonder whether this idea is feasible. And although I assume that wire-frame rendering will be cheaper than normal rendering, it may not be; I need to test this.

I kept looking for ways to eliminate the multi-sampling cost entirely by doing some pre-processing on the wire-frame image, for example blurring or pre-calculating. However, I found that it may not be worth it, because pixels on the shadow map do not correspond one-to-one to pixels on the screen. In other words, some pixels on the shadow map will be used several times while others may not be used at all. In addition, image processing on the shadow depth map is relatively expensive, because each pixel needs to fetch at least two, and normally five, texels.

Edge detection with wire-frame rendering will give us a lot of false positives. It would be nicer if we could eliminate those wrong edge lines. One way to do this is to use normal information, but I'm not sure whether I can use shader programs in wire-frame mode.

It is also possible to use a derivative image of the shadow map, which can eliminate those false-positive edges. Taking the derivative image requires at least three texture fetches per pixel, which is expensive. But if the number of samples taken on edges is more than three, taking the derivative can be an option as pre-processing. After updating the stencil buffer by rendering the wire-frame, we can reduce the cost of the 2D image processing with stencil early cull.

Another thing I want to mention is that the shadow map is missing a lot of the information needed for softening the penumbra region. Even after costly pre-calculation on the shadow map, the reconstructed inner-penumbra information is not correct; by inner penumbra I mean the penumbra region inside the shadow, while the outer penumbra is the penumbra region outside the shadow. Thus I think it is better to multi-sample later with the actual depth.

I will try to implement this idea to see feasibility.

Apr 1, 2010

Blurring depth buffer with linear filter.

Sometimes we need to blur the depth buffer, but we cannot use a linear filter on depth values.

The reason is that the depth value is encoded into three channels. We use 24 bits for the depth value, and each color channel has only 8 bits, so we have to split the depth value across three separate color channels. For this reason, if we linearly interpolate each channel of the encoded value separately, the value will be broken.

My idea for using a linear filter on the depth buffer requires two preconditions.
  1. We store the depth values in a color buffer, not in the depth buffer, so that we can control the encoding.
  2. We want the average value of four texels. In other words, this does not work if we want to linearly interpolate at an arbitrary ratio; the sample has to be exactly at the middle of four texels.
Let's see why depth values cannot be linearly interpolated.
For example, we have two values: 257 and 254. Of course, the actual depth value ranges from zero to one, but floating-point numbers are hard to read in an example.

The encoding formula is like this:
  • Blue channel = value % 256
  • Green channel = ( value / 256 ) % 256
  • Red channel = ( value / ( 256 * 256 ) ) % 256
*PS: "%" is the modulo operator, which gives the remainder after division, and "/" here is integer division.

For the value, 257, the encoded value is [ R, G, B ] = [ 0, 1, 1 ]. For the value, 254, the encoded value is [ R, G, B ] = [ 0, 0, 254 ].

When we sample exactly at the middle of the two texels, the hardware linear filter gives us the interpolated value [ R, G, B ] = [ 0, 0, 127 ], because it averages each channel separately and truncates the fractional part: Green becomes 0.5 and Blue becomes 127.5, which are truncated to 0 and 127.

The decoded value is 127; decoding is the reverse of encoding: 0 * (256*256) + 0 * 256 + 127. The result we expect is 255 (the average of 257 and 254 is 255.5, truncated), but we got 127.

The problem is the under-flow. The averaged value on the Green channel is supposed to be 0.5. If we could keep the value below the decimal point, we could correctly recover the original value:
0.5 * 256 + 127 = 255.

Thus we need some spare guard bits to absorb the truncation. The encoding formula must be changed to:
  • Blue channel = value % 256
  • Green channel = ( ( value / 256 ) % 128 ) * 2
  • Red channel = ( ( value / ( 256 * 128 ) ) % 128 ) * 2
With this encoding, we store [ R, G, B ] = [ 0, 2, 1 ] for the value 257 and [ R, G, B ] = [ 0, 0, 254 ] for the value 254. The linearly interpolated value for each channel is [ R, G, B ] = [ 0, 1, 127 ], and now we can recover the correct average value, 255: ( 0 / 2 ) * ( 256 * 128 ) + ( 1 / 2 ) * 256 + 127, where the divisions by 2 are real (not integer) divisions.
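The two encodings above can be checked on the CPU. This is a minimal C++ sketch under my own assumptions: the filter is modeled as a per-channel average with truncation (real hardware may round differently), and the names RGB, encodeNaive, encodeGuarded, and averageTwo are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// One encoded texel; each field holds an 8-bit channel value (0..255).
struct RGB { std::uint32_t r, g, b; };

// Naive 24-bit encoding: 8 bits per channel, no guard bits.
RGB encodeNaive(std::uint32_t v) {
    assert(v < (1u << 24));
    return { (v / (256 * 256)) % 256, (v / 256) % 256, v % 256 };
}

std::uint32_t decodeNaive(RGB c) {
    return c.r * 256 * 256 + c.g * 256 + c.b;
}

// Guarded encoding: Red and Green are multiplied by 2, so their lowest bit
// is a spare bit that catches the 0.5 left over by averaging.
RGB encodeGuarded(std::uint32_t v) {
    assert(v < (1u << 22));  // 7 + 7 + 8 usable bits
    return { ((v / (256 * 128)) % 128) * 2, ((v / 256) % 128) * 2, v % 256 };
}

// (R / 2) * (256 * 128) + (G / 2) * 256 + B, with the halving folded into
// the weights so that odd R and G values keep their half.
std::uint32_t decodeGuarded(RGB c) {
    return c.r * (256 * 128 / 2) + c.g * (256 / 2) + c.b;
}

// Models a linear filter sampled exactly halfway between two texels:
// each channel is averaged on its own and the fraction is truncated.
RGB averageTwo(RGB a, RGB b) {
    return { (a.r + b.r) / 2, (a.g + b.g) / 2, (a.b + b.b) / 2 };
}
```

With 257 and 254, the naive path decodes to 127 and the guarded path decodes to 255, matching the numbers worked out by hand.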

Since we need a bilinear filter for a 2D image, which averages four texels, we need to reserve 2 guard bits in each channel except the lowest one. Fortunately, overflow never happens, because an average never exceeds the largest input. Now we can use 6 bits for the Red channel, 6 bits for the Green channel, and 8 bits for the Blue channel: 20 bits in total. I assume that each channel has only 8 bits available.
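Here is the same check for the 2D case, as a minimal C++ sketch assuming the 6/6/8-bit layout just described and a bilinear fetch taken exactly at the corner shared by four texels (modeled as a truncating per-channel average). Names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct RGB { std::uint32_t r, g, b; };

// 20-bit layout: Red 6 bits, Green 6 bits, Blue 8 bits. Red and Green are
// multiplied by 4, so two spare low bits absorb the division by 4.
RGB encode20(std::uint32_t v) {
    assert(v < (1u << 20));
    return { ((v / (256 * 64)) % 64) * 4, ((v / 256) % 64) * 4, v % 256 };
}

// (R / 4) * (256 * 64) + (G / 4) * 256 + B, with the division folded into
// the weights so no fractional part is lost.
std::uint32_t decode20(RGB c) {
    return c.r * (256 * 64 / 4) + c.g * (256 / 4) + c.b;
}

// Models one bilinear fetch at the corner shared by four texels: every
// channel is averaged independently and truncated to an integer. Because the
// inputs are multiples of 4 (at most 252), the Red and Green results are exact
// and still fit in 8 bits.
RGB bilinearCenter(const std::array<RGB, 4>& t) {
    RGB out{0, 0, 0};
    for (const RGB& c : t) { out.r += c.r; out.g += c.g; out.b += c.b; }
    out.r /= 4; out.g /= 4; out.b /= 4;
    return out;
}
```

The decoded result equals the truncated integer average of the four original depth values; only the fraction of the lowest channel is lost.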

In case 20 bits are not enough, we can also use the fourth channel, alpha. Then we can use 6 bits for Red, 6 bits for Green, 6 bits for Blue, and 8 bits for Alpha: 26 bits in total, which is big enough for most cases.

This should work for blurring depth values with the trick I explained before:

Please note that this trick does not work when we need arbitrary linear interpolation, where the texel coordinate is not exactly at the middle. That would cause the under-flow problem at much higher precision, which a couple of guard bits cannot absorb.

Mar 28, 2010

Anti-alias after edge detection on 4xMSAA with Light pre-pass

This article follows my prior article, "Fast Edge Detection on 4xMSAA with Light pre-pass":

Today I found that I can run fragment shaders per sampling point on PS3. It wasn't obvious from the documentation, so I wasn't sure until I saw the result. The way PS3 lets us run at the sampling point is by setting a "sample mask" on the MSAA surface. By doing that, I can decide which sampling points receive the result of the fragment shader. It is a bit mask, so the result can go to more than one sampling point.

In my prior article, I said "we need to calculate light value at sampling points and then we need to average the light values." Now each per-sample invocation fetches only one normal sample and one depth sample; there is no need to average anything. After this change, the light buffer behaves more like a real MSAA buffer.

There are several things to notice about this major change. Since we store the light values per sampling point, we don't need to sum them up or average them in the shader. Later, we can get the averaged value by using a linear filter on the light buffer. I expect this can buy us some time.

A downside is that we now need to render the light geometry, or quad, three more times on the edges. Including non-edge lighting, we need to render the light geometry five times in total: once for non-edge pixels and four times for the sampling points.
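The pass structure above can be modeled on the CPU. This is a minimal C++ sketch of one 4xMSAA pixel, not PS3 code: the sample mask is assumed to be a 4-bit mask where bit i selects sample i, and Samples, writeWithMask, lightPixel, and the lightAt input are all hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One 4xMSAA pixel of the light buffer.
using Samples = std::array<float, 4>;

// Writes value to every sample whose bit is set in mask (bit i = sample i),
// mimicking the bit-mask style of the sample mask.
void writeWithMask(Samples& px, std::uint32_t mask, float value) {
    assert(mask <= 0xF);
    for (int i = 0; i < 4; ++i)
        if (mask & (1u << i)) px[i] = value;
}

// What a linear filter later returns for the pixel: the average of its samples.
float resolve(const Samples& px) {
    return (px[0] + px[1] + px[2] + px[3]) / 4.0f;
}

// Non-edge pixels: one pass with all four bits set (their samples share the
// same normal and depth). Edge pixels: four passes, one bit each, each lit from
// its own normal/depth sample. lightAt is a hypothetical per-sample result.
Samples lightPixel(bool isEdge, const Samples& lightAt) {
    Samples px{0, 0, 0, 0};
    if (!isEdge) {
        writeWithMask(px, 0xF, lightAt[0]);
    } else {
        for (int i = 0; i < 4; ++i) writeWithMask(px, 1u << i, lightAt[i]);
    }
    return px;
}
```

Nothing is summed in the shader itself; the averaging happens only when the light buffer is read back through the filter.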

Although the edge fragment shader now runs four times, each run became lighter. It now does only two texture fetches: one from the normal buffer and one from the depth buffer. Before, it did eight: four from the normal buffer and four from the depth buffer. Therefore the total cost seems almost the same; in fact, I need to do more performance testing.

The next change was to use centroid sampling in the fragment shader that actually renders each object with its materials. Without centroid, the final result may not show any difference. :-)

I was very happy. The result looked just perfect. I had kept seeing dark pixels around edges even after adopting the fast edge detection. Now the result is very, very nice and even beautiful.

*PS: please point me to any similar papers or articles so that I can improve this method.

Mar 27, 2010

Gaussian filter at run-time

Today I found that the PlayStation 3 has a "3x3 Gaussian filter" as a texture filter. I was surprised, because the Gaussian filter is known to be expensive. On second thought, I found that it is actually not that expensive. The 3x3 Gaussian filter looks like this:
$$\frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$$

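We can get the filtered result with only four bilinear texture fetches, which most graphics cards support. Each fetch is taken exactly at the corner shared by four texels, so it returns the plain average of that 2x2 block; averaging the four overlapping 2x2 blocks around the center pixel gives the 3x3 kernel: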

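$$\frac{1}{4}\left(
\begin{bmatrix} 0.25 & 0.25 & 0 \\ 0.25 & 0.25 & 0 \\ 0 & 0 & 0 \end{bmatrix} +
\begin{bmatrix} 0 & 0.25 & 0.25 \\ 0 & 0.25 & 0.25 \\ 0 & 0 & 0 \end{bmatrix} +
\begin{bmatrix} 0 & 0 & 0 \\ 0.25 & 0.25 & 0 \\ 0.25 & 0.25 & 0 \end{bmatrix} +
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0.25 & 0.25 \\ 0 & 0.25 & 0.25 \end{bmatrix}
\right) = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$$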
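The decomposition can be verified on the CPU. This is a minimal C++ sketch under my own assumptions: a single-channel image with clamp-to-edge addressing, where bilinearAtCorner models a fetch placed exactly on a texel corner. Image and the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Single-channel image with clamp-to-edge addressing.
struct Image {
    int w, h;
    std::vector<float> px;
    float at(int x, int y) const {
        x = x < 0 ? 0 : (x >= w ? w - 1 : x);
        y = y < 0 ? 0 : (y >= h ? h - 1 : y);
        return px[y * w + x];
    }
};

// A bilinear fetch exactly at the corner shared by texels (x0, y0),
// (x0+1, y0), (x0, y0+1), (x0+1, y0+1) returns their plain average.
float bilinearAtCorner(const Image& img, int x0, int y0) {
    return 0.25f * (img.at(x0, y0) + img.at(x0 + 1, y0) +
                    img.at(x0, y0 + 1) + img.at(x0 + 1, y0 + 1));
}

// 3x3 Gaussian from four corner fetches around (x, y): the top-left,
// top-right, bottom-left and bottom-right 2x2 blocks, averaged.
float gaussian3x3Fast(const Image& img, int x, int y) {
    assert(!img.px.empty());
    return 0.25f * (bilinearAtCorner(img, x - 1, y - 1) + bilinearAtCorner(img, x, y - 1) +
                    bilinearAtCorner(img, x - 1, y) + bilinearAtCorner(img, x, y));
}

// Reference: direct 9-tap convolution with [1 2 1; 2 4 2; 1 2 1] / 16.
float gaussian3x3Direct(const Image& img, int x, int y) {
    static const float k[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += k[dy + 1][dx + 1] * img.at(x + dx, y + dy);
    return sum / 16.0f;
}
```

On a GPU the four corner positions are simply the pixel center offset by half a texel diagonally in each direction.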

I never thought I could use a Gaussian filter at run time. I wonder why I haven't heard about it from anybody else or in any books. Did I just forget it as I got older? :-)

It can be useful for blur or glow effects. It can also help with aliasing problems.

By applying the same trick twice, we can get a 5x5 Gaussian result. A 5x5 Gaussian filter requires 25 pixel colors for each result pixel. With the trick, it takes only 8 texture fetches rather than 25. If the PS3 hardware Gaussian filter is fast enough, it will cost only two texture fetches. Tomorrow I will try this idea and see how much it costs. I'm guessing it will cost about 5ms... And I will also test the PS3 hardware Gaussian filter.
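Why applying the 3x3 filter twice gives a 5x5 result: filtering twice is the same as filtering once with the 3x3 kernel convolved with itself. This small C++ sketch computes that product; the 5x5 result is the binomial kernel built from the row [1 4 6 4 1], with weights summing to 256.

```cpp
#include <array>
#include <cassert>

using K3 = std::array<std::array<int, 3>, 3>;
using K5 = std::array<std::array<int, 5>, 5>;

// Full 2D convolution of a 3x3 kernel with itself, giving a 5x5 kernel.
// Filtering an image twice with k equals filtering it once with this result.
K5 selfConvolve(const K3& k) {
    K5 out{};  // zero-initialized
    for (int y1 = 0; y1 < 3; ++y1)
        for (int x1 = 0; x1 < 3; ++x1)
            for (int y2 = 0; y2 < 3; ++y2)
                for (int x2 = 0; x2 < 3; ++x2)
                    out[y1 + y2][x1 + x2] += k[y1][x1] * k[y2][x2];
    assert(out[0][0] == k[0][0] * k[0][0]);  // corner gets a single product
    return out;
}
```

Strictly speaking this is a binomial approximation of a Gaussian, which is also what the 3x3 kernel is.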

Mar 26, 2010

Fast edge detection on MSAA with Light Pre-Pass

I spent a whole week solving an aliasing problem with 4xMSAA and the Light Pre-Pass technique. I think it is time for me to write about my experience.

Light pre-pass consists of three steps: rendering normal/depth buffer, rendering lights and rendering objects with proper materials.

Light pre-pass is friendlier to MSAA than deferred shading, because in the last rendering phase objects can use MSAA. With deferred shading, on the other hand, objects cannot get any benefit from MSAA, while lights may get some. I believe this is why Wolfgang said "It works with MSAA definitely":

However, there are two problems: one is between the light buffer and the final accumulation buffer, and the other is between the normal/depth buffer and the light buffer.

When we render lights into the light buffer, MSAA doesn't help. Even if we use a 4xMSAA setting for the light buffer, the four samples of each pixel all get the same value, because the pixel shader runs once per pixel and light geometries are usually a simple sphere or a small quad, so they add no useful sub-pixel edges. Thus when we render objects as the last step, we cannot get light values at MSAA sample level, because we don't have enough light information for each sampling point.

One possible way to solve this problem is to render the lights per MSAA sampling point. Executing the pixel shader per sampling point lets us store a separate light value for each sampling point. Then the right light value will be selected by the centroid position when objects are rendered. Since not every graphics card allows running the shader per sampling point, storing the averaged light values is an alternative.

Another problem is between the normal/depth buffers and the light buffer. Since we render objects into the normal and depth buffers, MSAA works properly there. Then when we render the light buffer, we need to fetch four sampling points from the normal buffer and four sampling points from the depth buffer, because we need to calculate the light value at each sampling point and then average the light values. If we just use a linear filter on the normal buffer and average the normal values of a pixel, the result has no proper meaning. The same is true for depth. For example, if a pixel has 4 depth values, 0, 0, 1, and 1, the linearly averaged value is 0.5, but there is no object at that depth.

Since the pixel shader needs to fetch 8 texels, it is very expensive. One way to solve this problem is to separate edge pixels from non-edge pixels. On non-edge pixels we perform the calculation once, while on edge pixels we still do it four times.

To implement this idea in practice, a cheap edge detection step is required. On Wolfgang's blog, a commenter, benualdo, left an interesting idea:

From his idea, I noticed an interesting property of normal values. The length of a normal is always one by definition. However, if we linearly interpolate two different normals, the length decreases; it stays at one only when the normals are the same. With this property, we can determine whether a pixel contains four identical normals by checking the length of the averaged normal. In other words, with one texture fetch using a linear filter we can check four normal values, saving three texture fetches. I will call this "normal length checking".
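Normal length checking can be sketched in a few lines of C++. This is a CPU model, not shader code: average4 stands in for the single filtered fetch, and the 0.99 threshold is my assumption (a real normal buffer stores quantized normals, so an exact comparison against 1 would not work).

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Average of four unit normals, which is what one linearly filtered fetch
// returns when it covers all four values equally.
Vec3 average4(const std::array<Vec3, 4>& n) {
    Vec3 a{0, 0, 0};
    for (const Vec3& v : n) { a.x += v.x; a.y += v.y; a.z += v.z; }
    return { a.x / 4, a.y / 4, a.z / 4 };
}

// If the averaged normal is noticeably shorter than one, the four normals were
// not all equal, so the pixel is an edge. The threshold is an assumption.
bool normalLengthSaysEdge(const std::array<Vec3, 4>& n, float threshold = 0.99f) {
    assert(threshold > 0.0f && threshold <= 1.0f);
    const Vec3 a = average4(n);
    return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z) < threshold;
}
```

In a shader the square root can be skipped by comparing the squared length against the squared threshold.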

The bigger problem was the depth checking part. I spent about 3 days trying to find a depth check similar to the normal length checking trick. The first idea I came up with was to turn the 1-dimensional depth value into a 2-dimensional value, ( x = 1 - depth, y = depth ), and normalize it. When the depth values differ, linearly interpolating the normalized values makes the length smaller. This interesting idea didn't work, because we usually use 24 bits for depth, and this would need 24 bits + 24 bits. Although I found that 24 bits + 16 bits is enough to detect edges, I could not fit the extra 2 bytes into any buffer. The normal buffer needs 3 channels for the normal, and only one channel is left. I tried to encode the normals into 2 channels, but I found that 2 channels are not enough for the normal length checking trick. Thus I had to find another way.
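The 2D depth idea can be sketched the same way. This minimal C++ model (names are hypothetical) lifts each depth onto a quarter circle and measures the length of the average. One likely reason it needs so much precision: for two unit vectors an angle θ apart, the averaged length is cos(θ/2) ≈ 1 - θ²/8, so the drop is only second order in the depth difference.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Lift depth d in [0, 1] to (x = 1 - d, y = d) and normalize, so every depth
// becomes a unit vector on a quarter circle.
Vec2 liftDepth(float d) {
    assert(d >= 0.0f && d <= 1.0f);
    const float x = 1.0f - d, y = d;
    const float len = std::sqrt(x * x + y * y);
    return { x / len, y / len };
}

// Length of the average of four lifted depths: 1 when all depths are equal,
// smaller when they differ.
float liftedAverageLength(float d0, float d1, float d2, float d3) {
    const Vec2 v[4] = { liftDepth(d0), liftDepth(d1), liftDepth(d2), liftDepth(d3) };
    float x = 0, y = 0;
    for (const Vec2& p : v) { x += p.x; y += p.y; }
    x /= 4; y /= 4;
    return std::sqrt(x * x + y * y);
}
```

Large depth jumps shorten the vector clearly, but nearly equal depths barely move it, which matches the need for many extra bits.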

My last choice was normal direction checking. It is similar to toon-shading edge detection. When a normal points sideways, nearly parallel to the screen, the pixel is treated as an edge pixel: edge = 0.2 > dot( normal, (0, 0, 1) ). One difference is that in our case false positives are allowed. In other words, the result is fine even if we misjudge a non-edge pixel as an edge; we need a little more calculation, but the result is the same. In toon shading, this kind of false positive makes the edge lines thicker, which makes the result bad.

To avoid losing too much performance, I adopted the centroid trick that is well explained in ShaderX7. The assumption is that if a pixel is entirely inside a triangle, it is not an edge pixel, so we can easily reject those non-edge pixels by checking the centroid value. This removes a large number of the false-positive edges from the normal direction checking. The centroid information is stored in the alpha channel of the normal buffer.

I would like to add some comments about the centroid trick. The basic idea is very interesting. However, after I implemented it, I soon found that it gave me almost wire-frame-like edges. For example, for a sphere made of many triangles, the centroid trick marks the seams between triangles as edges, although those pixels are not edges in terms of normal continuity or depth continuity. In addition, if we use a normal map while rendering the normal buffer, pixels in the middle of triangles may need to be treated as edges because of normal discontinuities. Furthermore, stencil cull on PS3 works on tiles. The wire-frame-like edges touch almost every tile on the screen, although they are sparse, so stencil cull on PS3 was almost disabled in this situation.

PS3 had another problem with the centroid trick. When a pixel is only partially covered but the polygon covers the center of the pixel, PS3 returns the same value as the pixel center for the centroid. This is mentioned in the documentation, and a DevNet support engineer told me the behavior is hard-wired, so they have no way to change it. According to my rough calculation, only 2/3 of the actual edges are detected by the centroid trick; in other words, 1/3 are missed. I couldn't give up this trick, because it makes edge detection very fast, although the quality may decrease. In my implementation, the normal length checking first returns edges. The remaining pixels are tested with the centroid trick, and then the normal direction checking takes place. This requires only one texture fetch and was fast enough to run in 1.25ms at 720p.
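The order of the three checks can be summarized as a small C++ function. This is a CPU sketch, not the shader itself: NormalFetch is a hypothetical stand-in for what one filtered fetch of the normal buffer provides (the averaged normal plus the centroid flag kept in alpha), the 0.99 length threshold is my assumption, and 0.2 is the direction threshold from the text.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical result of one filtered fetch of the normal buffer.
struct NormalFetch {
    Vec3 avgNormal;       // average of the four sample normals
    bool centroidInside;  // centroid trick says the pixel is inside a triangle
};

// 1. normal length checking: a shortened average normal means an edge;
// 2. centroid trick: a pixel inside a triangle is rejected as a non-edge;
// 3. normal direction checking: dot(normal, (0, 0, 1)) < 0.2 means an edge.
bool isEdgePixel(const NormalFetch& f) {
    const Vec3& n = f.avgNormal;
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    assert(len > 0.0f);
    if (len < 0.99f) return true;        // 1. the normals disagree
    if (f.centroidInside) return false;  // 2. interior pixel
    return n.z < 0.2f;                   // 3. dot(n, (0, 0, 1)) is just n.z
}
```

A false positive only costs extra per-sample work later, so steps 2 and 3 can afford to be approximate.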

The results of the normal direction checking and the depth checking on four texels (which is expensive) were a little different, but the direction checking was good enough for its cost.

Mar 22, 2010

Version control system: Mercurial

On Joel on Software, I found an interesting article:

It is a kind of introduction to the Mercurial source version control system.

At first I thought we already had enough source version control systems, such as Subversion, CVS, and AlienBrain.

The very first time I heard about Mercurial was from Google Code. They actively supported it and even had a video clip about it. I watched it, but I didn't get how it differs from the others.

Joel introduced it in an easy way, and I also think it is an important step forward in software engineering. He also kindly made a tutorial for it: http://hginit.com
I will try it on Google Code in my next personal project.

In a nutshell, Mercurial differs from the others in that it thinks in terms of "changes between versions", while previous systems think in terms of each version.

*PS: I don't see an Eclipse plugin for Mercurial yet.

Mar 16, 2010

GPG8 is published.

Today I was checking Amazon and found that Game Programming Gems 8 is published. This time I expect more than usual, because last year there was big progress in graphics programming, such as SSAO and pre-lighting, and several impressive games were released.

From the table of contents, it covers SPUs, code coverage, and face rendering... I must read it.

Mar 15, 2010

GCC converts variables into SSA form

I was gathering some information about GCC related to my prior article, and I found that GCC converts code into SSA form. SSA stands for "Static Single Assignment."

It was quite surprising to me, because the DEF-USE relationship is used in software testing, and I hadn't thought that testing and compilers were related; indeed, they must be able to share a lot of parsing and analysis techniques.

In addition, I personally prefer to use the "const" keyword for every single local variable. Some programmers don't like to see const keywords, because they make source code longer. They may also think const locals increase the number of local variables and therefore the size of the binary code. If the compiler internally converts code to SSA form, every assignment already gets its own name, so there is no way to reduce the binary code size by not using "const".
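To illustrate the argument, here are two versions of the same small computation in C++. The comments show, informally, the SSA form a compiler might build for each; they are my own illustration, not GCC's actual internal dump. Since both reduce to the same chain of single assignments, I would expect an optimizing build to emit the same code for both, though I haven't measured it.

```cpp
#include <cassert>

// Reuses one variable for every intermediate result.
int reuseOneVariable(int a, int b) {
    int t = a + b;  // SSA: t1 = a + b
    t = t * 2;      // SSA: t2 = t1 * 2
    t = t - a;      // SSA: t3 = t2 - a
    return t;       // SSA: return t3
}

// Gives every intermediate its own const local. The SSA form is the same
// chain of single assignments as above, only with different source names.
int constLocals(int a, int b) {
    const int sum = a + b;           // SSA: t1 = a + b
    const int doubled = sum * 2;     // SSA: t2 = t1 * 2
    const int result = doubled - a;  // SSA: t3 = t2 - a
    return result;                   // SSA: return t3
}
```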

Hence I found a better excuse to keep using const keyword for local variables. :-)

Mar 14, 2010

Use a perfect hash table as an alternative way to implement switch.

The question I had yesterday was how much better "switch" performs compared to "else if". For example, consider a piece of code like this:
if( a == 1 ) doSomething1();
else if( a == 2 ) doSomething2();
else if( a == 3 ) doSomething3();
else if( a == 40000 ) doSomething40000();
For each line, the CPU, more precisely the ALU, evaluates each condition: "a == 1", "a == 2", and so on. In other words, if there are 40000 cases, the CPU may need to compare the same value "a" up to 40000 times.

More intuitive representation for this testing can be like this:
switch( a ) {
case 1: doSomething1(); break;
case 2: doSomething2(); break;
case 3: doSomething3(); break;
case 40000: doSomething40000(); break;
}
This "switch" statement gives us the illusion that the CPU evaluates the value of "a" only once.

According to an article on CodeGuru, however, a "switch" statement may be compiled into the same code as "else if". See the article:

A faster and ideal implementation would be like this:
typedef void (*CB)();
CB doSomethings[] = { doSomething0, doSomething1, /* ... */ doSomething40000 };
(*(doSomethings[ a ]))();
This idea is called a "jump table". In this implementation, the CPU does not compare "a" up to 40000 times but indexes the table only once, so it is faster.

One problem of a jump table is that it can be too big. When there is a big gap between two values, like "0" and "40000", we still need entries for 1, 2, 3, ... 39999, which will never be used.

The article also mentions the jump table. However, in cases where the table would become ridiculously big, the compiler falls back to "else if".

According to another article, GCC tests the "density" of the case values. See the article:

The article says that when some values are close enough, GCC uses a jump table for those values only, while it still uses "else if" for the other, sparse values. For example, when we have values like "1", "2", "3", and "40000", GCC will use a jump table for the close values "1", "2", and "3", and "else if" for the distant value "40000".

What I am still not happy with is that it still uses "else if", although it does use a jump table partially.

My idea to improve this is to use a perfect hash table.
A hash table is a table that contains both key values and mapped values. For the example above, "1" is mapped to "doSomething1" and "40000" is mapped to "doSomething40000".
#include <unordered_map>
std::unordered_map< int, CB > hashTable;
hashTable[ 1 ] = doSomething1;
hashTable[ 40000 ] = doSomething40000;
(*(hashTable[ a ]))();
One better property of a hash table over a jump table is that the memory it requires depends not on the key values but on the number of values, which is preferable. Although a hash table needs a little more space than the number of values, it is much smaller than the jump table in this case.

One downside of the hash table idea is that the hash function may not be as fast as the jump table's address calculation.

For this downside, it is unavoidable that hash function calculation is more expensive than direct address calculation, by its nature.

However, the cost of a hash function varies depending on which hash function we use, so we should be able to control the cost by choosing the hash function.

There are several well-known hash functions. It seems that most hash function implementations, for example gperf and CMPH, focus on string keys, while I need optimal hash functions for integer values. An article I found shows integer hash functions:

For one case in the article, the "32 bit Mix function", it takes 11 CPU cycles on an HP9000, which is a relatively old platform. In addition, those hash functions can use parallel operations, which can make them faster.

The point is that a hash function for integers is not crazily expensive. Since it has no branch operations, it should be faster than a bunch of "else if".

Another downside is that hash values may collide for different keys, and then additional steps are needed to resolve the collision.

For this downside, we can use the idea of a "perfect hash table". A perfect hash table is a hash table that has no collisions at all. An easy way to think of a perfect hash table is a hash table whose reserved size is very big while the number of values in it is small.

Therefore we can avoid the collision problem by using a perfect hash table. In most cases, a perfect hash table is not good at inserting and deleting in the middle of processing. In other words, we need to know every value that is going to be in the table when we create the perfect hash table. Fortunately, C/C++ requires the values on "case" labels to be constants, so every value is known at compile time; in other words, the compiler knows every value at compile time.
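Here is a minimal C++ sketch of a perfect (not minimal) hash table used as a switch replacement for the sparse cases 1, 2, 3, and 40000. The hash is a multiplicative hash (one multiply and one shift, no branches), and the table is built by simply trying odd multipliers until the known keys land in distinct slots. Both choices are my own for illustration; this is not how gperf, CMPH, or BDZ build their tables. The handlers return int only so the sketch can be checked, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Handlers return an int only so this sketch can be checked; the post uses void.
using CB = int (*)();

// Hypothetical handlers for the sparse case values.
int doSomething1() { return 1; }
int doSomething2() { return 2; }
int doSomething3() { return 3; }
int doSomething40000() { return 40000; }

// Multiplicative hash: one multiply and one shift, no branches.
std::uint32_t hashKey(std::uint32_t key, std::uint32_t mult, unsigned bits) {
    return (key * mult) >> (32 - bits);
}

struct PerfectTable {
    std::uint32_t mult = 0;
    unsigned bits = 0;
    std::vector<int> keys;  // key stored in each slot, -1 when empty
    std::vector<CB> funcs;
};

// All case values are known up front, so we can try odd multipliers until
// every key lands in its own slot (no collisions).
bool buildPerfectTable(const std::vector<int>& ks, const std::vector<CB>& fs,
                       unsigned bits, PerfectTable& out) {
    assert(ks.size() == fs.size() && bits >= 1 && bits <= 16);
    std::uint32_t m = 2654435761u;  // Knuth's golden-ratio constant as a start
    for (int attempt = 0; attempt < 100000; ++attempt, m += 2) {
        PerfectTable t;
        t.mult = m;
        t.bits = bits;
        t.keys.assign(std::size_t{1} << bits, -1);
        t.funcs.assign(std::size_t{1} << bits, nullptr);
        bool ok = true;
        for (std::size_t i = 0; i < ks.size() && ok; ++i) {
            assert(ks[i] >= 0);  // -1 marks an empty slot
            const std::uint32_t slot = hashKey(static_cast<std::uint32_t>(ks[i]), m, bits);
            if (t.keys[slot] != -1) {
                ok = false;  // collision: try the next multiplier
            } else {
                t.keys[slot] = ks[i];
                t.funcs[slot] = fs[i];
            }
        }
        if (ok) { out = t; return true; }
    }
    return false;
}

// One hash, one key compare (to reject values with no case), one call.
// Returns 0 for unmatched values, standing in for "default".
int dispatch(const PerfectTable& t, int a) {
    const std::uint32_t slot = hashKey(static_cast<std::uint32_t>(a), t.mult, t.bits);
    return t.keys[slot] == a ? t.funcs[slot]() : 0;
}
```

With 4 keys in 8 slots the table stays small no matter how far apart the case values are, and a compiler could do the multiplier search at compile time because the case values are constants.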

There is another concept, the "minimal perfect hash table".
A minimal perfect hash table is a hash table whose reserved size is the same as the number of keys in it, without any collisions.

It sounds very nice, but a minimal perfect hash function is at least 4 times as expensive as a normal hash function. One example of a minimal perfect hash function is the BDZ algorithm:

The basic idea of BDZ was surprisingly simple to me, and it seems very useful for data compression. But it evaluates 3 normal hash functions to get a perfect hash table and does additional calculation to achieve a minimal perfect hash table.

In brief, a perfect hash table saves memory compared to a jump table and performs faster than a bunch of "else if". On the other hand, it is slower than a jump table, and it may (or may not) take more space than the "else if" way. Therefore, using a perfect hash table for a "switch" statement is an alternative in between the "else if" way and the jump table way. It can certainly perform better in cases where a jump table does not fit. In addition, this evaluation can be done by the compiler at compile time, so I believe a better compiler should consider this option for internal optimization.

Mar 4, 2010

Features for next generation game engine.

In my mind, there are several unsatisfied demands for game engines, because most game engines evolved from code written long ago, for example Gamebryo, Unreal, Quake, and Half-Life. Those game engines did not have a chance to adopt new concepts.

The features that I want to have are listed here:
  1. Interface and contract driven design.
  2. Support for Unit-Testing.
  3. Pre-lighting as well as deferred shading and forward shading.
Additionally and personally I prefer these features as well:
  1. Progressive rendering.
  2. SPU support on PS3.
  3. XNA support through C++ DLL.
  4. No preprocessor code such as #ifdef.
Other external tool support, such as Max/Maya plug-ins, UI tools, and sound systems, is a second-level issue. Once we have a reliable cornerstone, people will join the project.

Truck number or lucky number.

The truck number is the minimum number of people whose loss can stop a project. The higher, the better.

For example, if the truck number of a project is 1, then when one person gets hit by a truck, the project cannot go on anymore.

If that example is too cruel, we can also call it the lucky number. For example, if a person wins big money in a lottery and quits the company, and the project cannot go on anymore, the lucky number is one. Our company now has one HR person. He has been out of the country for more than 30 days. I need to talk to him about my visa, but I cannot proceed; I'm wondering whether he actually won a lottery.

One may think a higher lucky number means less efficiency, because of too much redundancy. Well, that may be true. But insurance is for reducing risk, not for increasing efficiency, and the risk may mess up your company or your life.

My company is small, as a matter of fact. But I do not think it is so small that it can afford only one HR person. I do believe it is rather a matter of experience. People who don't have various experiences tend not to prepare for risky situations.