Wednesday, 20 February 2013

[BGE+GLSL] Finding interest points

Interest Points
I learned a bit in a class called "Computer Vision".
One thing was "finding interest points". 
Basically, when you have a picture and give it to your computer, it doesn't know what to do with it. It's just plain boring data. Every point is as interesting as every other. On some images you might have the same feeling. On others you recognize things, which gives them more value than just color. 

But what makes the difference?
How do you see something?

Contrast is a good starting point. If the picture is mostly one color, everything blurry, with low contrast, then it's hard to see anything. With high contrast, especially between objects, it's much easier to spot objects.
Those places with high contrast are probably also good interest points. That's why we try to find them, or better: let the computer find them so it knows some interesting points on the image.

The following algorithm bits are part of SIFT.
At first you have your image. You make at least 4 blurred copies of it, each with a different blur strength. Then you take two neighboring copies and compute their difference. As a result you have 3 difference pictures (1-2, 2-3, 3-4). In the last step you virtually stack those 3 difference pictures on top of each other. Walking over each pixel of the middle picture (excluding the border) you check whether the pixel is the absolute biggest or absolute smallest among its 26 neighboring pixels (3x3x3 minus the center: 9 above, 9 below, 8 on the same layer). If it is, you put a 1 on an extra final image at that place, else a 0. All those "1" spots now mark interest points.
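The steps above can be sketched in plain Python. This is a minimal sketch: it assumes the four blurred grayscale copies already exist as 2D lists, and it skips the border pixels of the middle picture.

```python
def difference(a, b):
    """Per-pixel difference of two equally sized 2D grayscale images."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def find_extrema(dogs):
    """Mark pixels of the middle difference image that are strictly bigger
    or smaller than all 26 neighbors in the 3x3x3 block (border excluded)."""
    d0, d1, d2 = dogs
    h, w = len(d1), len(d1[0])
    marks = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = d1[y][x]
            neighbors = [img[y + dy][x + dx]
                         for img in (d0, d1, d2)
                         for dy in (-1, 0, 1)
                         for dx in (-1, 0, 1)
                         if not (img is d1 and dy == 0 and dx == 0)]
            if c > max(neighbors) or c < min(neighbors):
                marks[y][x] = 1
    return marks

# blurred = [b1, b2, b3, b4]  (four blurred copies of the image)
# dogs = [difference(blurred[i], blurred[i + 1]) for i in range(3)]
# marks = find_extrema(dogs)
```

With four blurred copies you get three difference images, so only the middle one can be checked against a full 3x3x3 neighborhood; a real SIFT pipeline makes more copies to check more layers.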

Just one sample and one code
Just for the fun of it I tried implementing that inside the Blender Game Engine using GLSL code. The first implementation is probably the worst you can do. 
The image goes into the shader. In the fragment shader you just have the value of the pixel at the current place on the texture. Because I need 27 difference-of-blur values per pixel, I have to compute them on the fly. So I get the neighboring pixels, blur them, take the differences and then do the check. 
It's really bad because most of those difference-of-blur values are calculated again and again and …
But it works!
(With about 4000ms taken for calculating…)

Here are the files for it:

image to find interest points in
interest points of the image

Many samples and some code
Having written nearly the same algorithm in Java, which took around 400 ms to find those interest points, the GLSL results weren't pleasing. 
In Java I used different pictures for the different intermediate results. That way needed values aren't calculated again and again. I tried the same inside Blender.
The problems:
→ How do I get the image blurred?
→ How do I create a difference of two blurred images?
→ How do I compare three images?

If you can't wait, here are the files for it:

"Use Frame Rate" on

"Use Frame Rate" off
the setup
As you can already see, the setup for this approach is much bigger than for the first version. I used 16 planes, 9 cameras and about 9 images for the different textures. 

Blurring with GLSL
The first step: getting the image blurred with different strengths.
Two substeps are needed for that:
1] Get the image that should be blurred.
2] Blur it via GLSL.

I could have just used the texture I used for the main plane, but I chose a different way. A camera above the targeted plane is used to render the plane and create a texture from that.
The module code is rather simple:
    from bge import logic, texture

    cont = logic.getCurrentController()
    scene = logic.getCurrentScene()
    own = cont.owner
    camSrc = scene.objects[own['Cam']]
    matID = texture.materialID(own, 'IM' + own['Tex'] + own['ID'])
    own['RenderedTexture'] = texture.Texture(own, matID)
    own['RenderedTexture'].source = texture.ImageRender(scene, camSrc)
    own['RenderedTexture'].refresh(False)
With "texture.materialID" I query the ID of the material whose texture I want to render to. Then I create a new texture with "texture.Texture". The important part here is to save this new texture somewhere, or else it will be destroyed when leaving the module. "texture.ImageRender" sets up the render source; the call to "refresh" triggers the actual rendering.
Having the desired image, the next step is blurring it. GLSL is a fancy way of doing stuff to the texture and object. The object itself is irrelevant here; I just needed the texture changing. OK, it does not actually change the texture but the rendered view of it. There are ways to save the changes back to the texture itself via OpenGL and GLSL, but that's not part of this project.
The code is divided into 3 parts: 
→ python-code to set up the shader
→ a vertex shader
→ a fragment shader

You can write the shaders in the python file, e.g. as strings, and bind and use all that together. I like to split those into separate files. Having my shader code loaded, the setup code looks like this:
    # VertexShaderCode / FragmentShaderCode hold the loaded shader sources
    mesh = own.meshes[0]
    for mat in mesh.materials:
        shader = mat.getShader()
        if shader != None:
            if not shader.isValid():
                shader.setSource(VertexShaderCode, FragmentShaderCode, True)
            shader.setSampler('Image', 0)
            shader.setUniform1f('width', 1.0/mainCam['width'])
            shader.setUniform1f('height', 1.0/mainCam['height'])
            shader.setUniform1f('M_PI', math.pi)
            shader.setUniform1i('BlurFactor', own['BlurFactor'])
With "setSource" the code to use is set. "setSampler" is the method to tell the shader which texture to use. The string must match a sampler name in the shader code, the number is the index of the texture in your texture slots. Via "setUniform" some variables can be set. Here I tell the shader the step size for width and height, the value of PI and the blur factor. Normally the projection, view and model matrices should also be handed over at this point. But somehow I messed that up. I need to investigate it further at a later point.
Having set this up just the shader code is missing.
Like I said, I don't need any object changes. Therefore the vertex shader is as simple as this:
gl_TexCoord[0] = gl_MultiTexCoord0;
gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
It just sets the UV texture coordinates to use and the vertex position.
Now to the fragment-shader with the blurring.
To blur a pixel you also need some of the pixels around it. I used a Gaussian blur here. For all points within a diameter of 11 the value at each position is multiplied by its Gaussian weight, and everything is summed up. Because the Gaussian weights sum to at most 1, the sum can't be greater than 1. So we can use the result directly as the color value.
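In plain Python the Gaussian weights could be set up like this. A sketch: radius 5 gives the diameter of 11, sigma = 2.0 is just an assumed value, and explicitly normalizing the weights guarantees that they sum to exactly 1.

```python
import math

def gaussian_weights(radius=5, sigma=2.0):
    """1D Gaussian kernel of diameter 2*radius + 1, normalized to sum to 1."""
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma))
           for i in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def blur_row(row, weights):
    """Blur one row of grayscale values horizontally (border clamped)."""
    radius = len(weights) // 2
    out = []
    for x in range(len(row)):
        acc = 0.0
        for i, w in enumerate(weights):
            xi = min(max(x + i - radius, 0), len(row) - 1)
            acc += w * row[xi]
        out.append(acc)
    return out
```

Because the weights sum to 1, a blurred pixel of an image with values in [0, 1] stays in [0, 1], which is why the result can be written out as a color directly.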
Now we have the second row with all 4 blurred images.

As you can see in the setup picture above, I have two rows of planes. That is because the GLSL code doesn't actually change the texture; when I want to use the blurred images and just try using their textures, they aren't changed like you would think they should be. That's why I need to render the GLSL-changed planes to textures so I can use those.

Difference of blurred
The third row contains the images that are the difference of two blurred images. The GLSL code is similar to the one above. Because two images are needed as input, two samplers are given to the shader. Calculating the difference in the fragment shader is quite simple:
texture(Image0, st)-texture(Image1, st)
Normally that would be all.
And the main camera will show it just fine. But when you render it, e.g. to a texture, you will get problems. A color consists of 4 parts: Red, Green, Blue, Alpha. The Alpha will cause a problem here because "1 - 1 = 0". The input samples will probably each have an Alpha of 1, so the result will get an Alpha of 0. Not good. That's why you should take care and set the resulting Alpha to 1.

Maximum of 27
The last row is the initial image with the interest points in it. So you need 4 samplers in the shader: the initial image and the three difference images.
It's quite simple: take all 3x3x3 = 27 points and check if the center is the biggest. If it is, put a white dot at the current place, else just show the normal image.

As you might have already seen, the time to find/calculate the interest points is much lower than before. One thing I don't understand: turning the frame rate limiter off results in heavy load on the Rasterizer, about 13 times bigger, with about the same frame time. I guess this might be a bug in the BGE.
Another difference from the one-step calculation, besides being much faster, is that this isn't done in one step. More precisely: not in one frame. In the first frame the normal image is rendered; the other cameras just see black and render that. Then the camera over the main image sees something, and the planes for the not yet blurred images get a copy of the render. With that, the GLSL blur shader can do its work. A frame later the cameras over the blurred planes see things and create textures of those, which go to the difference planes. Later those get GLSL-processed, and then again rendered to texture. And then the maximum search. Done.
If you had just one frame's time to do it, it wouldn't work. But in the end we have about 60 render rounds per second. The result is much faster here than with the single approach from before.

Possible extensions
Having thought about the code a bit more, I would say that there is a lot of room for improvement left. To name some simple things:
→ The calculation is done on a black-and-white image. With 4 channels per color, 3 are unused. Worse: needlessly calculated. So e.g. all 4 blur images could be created within one texture. The problem here is with render-to-texture: the Alpha would kill you. A better way would be the right GLSL function to write the texture back, or to change the texture accordingly. Another problem is that you couldn't simply show the texture unless you write a shader to show just the correct channel.
The same could be done with the difference images and the maximum storage.
→ Use the texture of the initial image as input for the blur images. Don't render-to-texture it.
→ Having:
    → one initial image
    → one image with the four blurred versions
    → one image with the three differences and the maxima
    All of those could be put into one shader. So just one shader would have to be written to do it all. 
→ Having no image except the initial one. Just thinking: if a shader creates a global array, is that still there in the next render round? If so, all those extra images could be arrays, and you would just move within those arrays instead of the images. It should give a performance boost, I guess.
→ Avoid this kind of interest point detection altogether. There are faster ones out there; e.g. have a look at SURF.
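The channel-packing idea from the first point can be sketched in plain Python. The helpers are hypothetical and just show how four grayscale images fit into the four channels of one RGBA image:

```python
def pack_rgba(imgs):
    """Pack four equally sized grayscale images into one RGBA image,
    one source image per color channel."""
    i0, i1, i2, i3 = imgs
    return [[(a, b, c, d) for a, b, c, d in zip(r0, r1, r2, r3)]
            for r0, r1, r2, r3 in zip(i0, i1, i2, i3)]

def unpack_channel(rgba_img, channel):
    """Read one grayscale image back out of a packed RGBA image."""
    return [[px[channel] for px in row] for row in rgba_img]
```

In a shader the packing would look similar: the four blur strengths would be written to the R, G, B and A outputs of a single pass, which is exactly why render-to-texture with its alpha handling gets in the way.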

The End
Doing this I learned a lot about image filtering, GLSL and the BGE.
Maybe someone finds this round-up useful.
Ideas, comments, criticism etc. are welcome.

See Ya!