After a lot of thought, I came up with another idea for soft shadow maps.
The basic idea is that we render objects in wire-frame mode into the shadow map as a fourth channel. This wire-frame information marks the edges. Then, when we retrieve depth from the shadow map, we multi-sample at pixels where the wire-frame was rendered; otherwise, we take only one sample.
The idea is based on multi-sampling shadow mapping techniques such as percentage-closer filtering (PCF). It would be very costly to multi-sample at every pixel; we want to multi-sample only at edges. However, edge detection on the shadow map by image processing is very hard. My idea is to use the 3D model information to get those edges by rendering wire-frame lines.
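To make the lookup step concrete, here is a minimal CPU-side sketch in Python of what the shader would do per screen pixel: a single depth comparison where the edge mask is clear, and a 3x3 PCF kernel where the wire-frame was rendered. All names, the kernel size, and the bias value are illustrative assumptions, not from a real implementation.

```python
# Sketch of the adaptive lookup. The shadow map stores light-space depth;
# edge_mask is the "fourth channel" where wire-frame lines were rendered.
# Hypothetical names and parameters chosen for illustration only.

def shadow_factor(shadow_map, edge_mask, x, y, receiver_depth, bias=0.001):
    """Return a shadow factor in [0, 1] for one screen pixel.

    shadow_map: 2D list of stored depths (light space)
    edge_mask:  2D list, 1 where a wire-frame line was rendered, else 0
    """
    h, w = len(shadow_map), len(shadow_map[0])
    if edge_mask[y][x]:
        # Edge pixel: 3x3 percentage-closer filter (multi-sample).
        lit, total = 0, 0
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                sx = min(max(x + dx, 0), w - 1)
                sy = min(max(y + dy, 0), h - 1)
                lit += receiver_depth <= shadow_map[sy][sx] + bias
                total += 1
        return lit / total
    # Non-edge pixel: a single sample is enough.
    return 1.0 if receiver_depth <= shadow_map[y][x] + bias else 0.0
```

On a GPU this branch would ideally be coherent, since wire-frame pixels cluster along silhouettes, so the divergence cost should stay low.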
If we can control the width of the wire-frame lines, we can increase the size of the penumbra region that requires multi-sampling. It would be better if we could use different widths at different depths, but sadly that does not seem possible. If a fixed width gives bad results, we may need to blur the lines, possibly at a lower resolution.
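One cheap stand-in for thicker lines or a blur pass is to dilate the edge mask by a fixed radius, so a wider band around each line gets multi-sampled. This is a sketch of that idea, not the author's method; the function name and radius parameter are made up for illustration.

```python
def widen_edge_mask(edge_mask, radius):
    """Dilate the wire-frame edge mask so a wider band of pixels around
    each line is flagged for multi-sampling (i.e. a wider penumbra).
    A square structuring element keeps the sketch simple."""
    h, w = len(edge_mask), len(edge_mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if edge_mask[y][x]:
                # Mark every pixel within `radius` of this edge pixel.
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
    return out
```

Doing this at a lower resolution, as suggested above, would cut the cost roughly by the downscale factor squared at the price of a coarser penumbra boundary.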
It is also possible to use the stencil buffer instead of rendering actual lines into a color buffer. If we are already using a depth buffer for the shadow map on console machines, we can reuse the stencil channel for free.
I haven't seen any rendering tricks that use wire-frame information yet, which makes me wonder whether this idea is feasible. And although I assume that wire-frame rendering is cheaper than normal rendering, that may not be true; I need to test it.
I kept looking for ways to eliminate the multi-sampling cost entirely by pre-processing the wire-frame image, for example by blurring or pre-calculating. However, I found that it may not be worthwhile, because pixels on the shadow map do not correspond one-to-one with pixels on the screen. In other words, some pixels on the shadow map will be used several times while others may not be used at all. In addition, image processing on the shadow depth map is relatively expensive, because each pixel needs to fetch at least two, and normally five, texels.
Edge detection with wire-frame rendering will give us a lot of false positives. It would be nicer if we could eliminate those wrong edge lines. One way to do this is to use normal information, but I'm not sure whether I can use shader programs in wire-frame mode.
It is also possible to use a derivative image of the shadow map. The derivative image can eliminate those false-positive edges. Taking the derivative requires at least three texture fetches per pixel, which is expensive. But if edge pixels would otherwise be multi-sampled more than three times each, taking the derivative can be an option as a pre-process. After updating the stencil buffer by rendering the wire-frame, we can reduce the cost of this 2D image processing with early stencil cull.
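The culling step above might look something like this sketch, assuming forward differences (three fetches: the center texel plus its right and lower neighbors). A wire-frame pixel is kept only if the depth actually jumps there; the threshold value and all names are hypothetical.

```python
def cull_false_edges(shadow_map, edge_mask, threshold=0.05):
    """Keep a wire-frame edge pixel only where the shadow-map depth has a
    real discontinuity; flat interior wire-frame lines are false positives.
    Forward differences cost three fetches per pixel (center, right, down)."""
    h, w = len(shadow_map), len(shadow_map[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not edge_mask[y][x]:
                continue  # early stencil cull would skip these on the GPU
            c = shadow_map[y][x]
            dx = shadow_map[y][min(x + 1, w - 1)] - c
            dy = shadow_map[min(y + 1, h - 1)][x] - c
            if abs(dx) > threshold or abs(dy) > threshold:
                out[y][x] = 1  # real depth discontinuity: keep this edge
    return out
```

Because the stencil already restricts this pass to wire-frame pixels, the three fetches are paid only where an edge candidate exists, which is what makes it cheaper than a full-screen derivative pass.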
Another thing I want to mention is that a lot of information necessary for softening the penumbra region is missing from the shadow map. Even after costly pre-calculation on the shadow map, the reconstructed inner-penumbra information is not correct; by inner-penumbra I mean the penumbra region inside the shadow, while the outer-penumbra is the penumbra region outside the shadow. Thus I think it is better to multi-sample later with the actual depth.
I will try to implement this idea to see whether it is feasible.