Category Archives: General

Dropshadows for Freebloks for Android

The latest update of Freebloks 3D for Android adds nice drop shadows to falling stones. Instead of “correct” shadows using shadow volumes in a stencil buffer, the Android version renders a pseudo drop shadow texture on the board. The shadows are not always correct, but it is much easier to add individual tinting, alpha or scale effects depending on the distance of the stones from the board. This adds a more realistic look and is easy on the hardware, because there is no need to recalculate the shadow volume each frame.
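To illustrate how such a pseudo shadow can react to the height of a stone, here is a minimal sketch in plain Java. The class name, the linear falloff and all constants are assumptions made up for illustration; this is not the actual Freebloks code.

```java
/** Hypothetical helper: derives drop shadow parameters from a stone's height above the board. */
final class ShadowParams {
    final float alpha;   // 0 = invisible, 1 = fully opaque shadow
    final float scale;   // size of the shadow quad relative to the stone
    final float offset;  // sideways shift of the shadow on the board

    private ShadowParams(float alpha, float scale, float offset) {
        this.alpha = alpha;
        this.scale = scale;
        this.offset = offset;
    }

    /**
     * @param height    current height of the stone above the board (>= 0)
     * @param maxHeight height at which the shadow has faded out completely (> 0)
     */
    static ShadowParams forHeight(float height, float maxHeight) {
        // normalise to 0..1 and clamp, so stones above maxHeight cast no shadow
        float t = Math.max(0f, Math.min(1f, height / maxHeight));
        float alpha = 0.6f * (1f - t);  // darker when the stone is close to the board
        float scale = 1f + 0.3f * t;    // the shadow quad grows with the distance
        float offset = 0.5f * height;   // assumed light from the side: the shadow drifts away
        return new ShadowParams(alpha, scale, offset);
    }
}
```

The resulting values would then be applied when drawing the shadow quad, for example as the alpha of the vertex color and as a scale and translation of the quad.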


Much improved AI speed for Freebloks for Android using JNI

For the port of Freebloks 3D to Android I rewrote all code from C to Java. While that was working fine and resulted in greatly simplified networking code, the speed of the AI was not so great. It took up to 10 seconds on a fairly powerful SGS 2 for the computer to find a good move.

I tried to move the CPU-intensive routines of the AI back to C, using JNI as a bridge between Java and C. The simple network routines were to stay in Java.

Transferring the relevant game data to C and back to Java turned out to be very ugly, yet the solution was incredibly simple:

The Freebloks code has always been split into two parts, the GUI/client part and the AI/server part, with client and server always communicating over network sockets. Yes, even the single player version starts a network server and connects to localhost. The original source code has always contained a package for running a dedicated server.
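The client/server split can be sketched in a few lines of plain Java. This is only an illustration of the pattern, assuming a made-up one-line text protocol and a hypothetical class name; the real Freebloks server is written in C and speaks its own protocol.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the game server: answers one request, then exits. */
final class LocalGameServer {

    /** Starts the server thread on a free loopback port and returns that port. */
    static int start() {
        final ServerSocket server;
        try {
            // port 0 lets the operating system pick any free port
            server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int port = server.getLocalPort();
        Thread thread = new Thread(() -> serveOneClient(server), "game-server");
        thread.setDaemon(true);
        thread.start();
        return port;
    }

    private static void serveOneClient(ServerSocket server) {
        try (ServerSocket s = server;
             Socket client = s.accept();
             BufferedReader in = reader(client);
             PrintWriter out = writer(client)) {
            // placeholder for the AI: a real server would compute a move here
            out.println("MOVE for " + in.readLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Client side: the GUI only ever talks to the server through the socket. */
    static String request(int port, String message) {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             BufferedReader in = reader(socket);
             PrintWriter out = writer(socket)) {
            out.println(message);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static BufferedReader reader(Socket socket) throws IOException {
        return new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket socket) throws IOException {
        // autoflush = true, so println() sends the line immediately
        return new PrintWriter(
                new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
    }
}
```

Because the client never calls into the server directly, the server half can be swapped for a different implementation (here: native code) without touching the client, as long as both sides speak the same protocol over the socket.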

It was incredibly easy to copy the dedicated server code into my project, compile the C code with the NDK and connect it to Java with only a single JNI call. It was running out of the box, with almost no change to the original C code at all! Since the server runs in a thread started from the native C code, no additional JNI calls are necessary and no data is transferred except through the sockets.

The average duration for the AI to calculate a complete game dropped from 87 sec to 28 sec on my SGS 2. Version 0.0.2 in the Google Play Store supports ARMv5, ARMv7 and x86. Grab it now! You may also download a free apk file here.

And please don’t forget to give feedback.


USB write performance drop on Fritz!Box 7170

I want to attach a USB stick to the AVM Fritz!Box 7170 to use it as USB storage and write to it using the integrated FTP server. When writing a bunch of files, the write performance drops to under 50 KB/s, while the stick can easily handle 512 KB/s. Why the bad performance, and why the drop?

I replaced the stock AVM firmware with Freetz but got similar results. What got my attention is a drop in performance around the fourth file that does not recover over time. The following tests were done using the Freetz modification with Linux kernel 2.6.13.1-ohio.

Performance drop when writing

Look at these numbers when copying a bunch of files to the stick using scp:

$ scp tmp* root@fritz.box:/var/media/ftp/uStor00/
tmp1                          100% 2048KB 682.7KB/s   00:03    
tmp2                          100% 2048KB 512.0KB/s   00:04    
tmp3                          100% 2048KB 512.0KB/s   00:04    
tmp4                          100% 2048KB  55.4KB/s   00:37    
tmp5                          100% 2048KB  38.6KB/s   00:53

Each subsequent transfer then runs at only about 55 KB/s. Issuing a sync command to flush out dirty buffers makes no difference, so the speed is not throttled by the USB stick being busy.

Let’s have a look at the VFS cache

The Linux kernel reveals some interesting cache and memory information in /proc/meminfo. These are numbers taken after a fresh boot:

# cat /proc/meminfo 
MemTotal:        30204 kB
MemFree:          9632 kB    # unused, completely free memory
Buffers:           280 kB
Cached:           6280 kB    # memory used for cached files
SwapCached:          0 kB
Active:           8652 kB
Inactive:         1524 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        30204 kB
LowFree:          9632 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:               0 kB    # memory waiting to be written to disk
Writeback:          0 kB    #  memory actively being written to disk
Mapped:           8040 kB
Slab:             6028 kB
CommitLimit:     15100 kB
Committed_AS:     5724 kB
PageTables:        240 kB
VmallocTotal:  1048560 kB
VmallocUsed:      4056 kB
VmallocChunk:  1043636 kB

While copying the first files, the highlighted numbers read like this:

MemFree:          1716 kB
Cached:          13704 kB
Active:           8976 kB
Inactive:         8928 kB
Dirty:            6836 kB    # lots of data waiting to be written
Writeback:         444 kB    # lots of data actively being written

We see that the cache quickly fills up with buffers that are marked to be written to the stick (marked dirty), and that the pdflush daemon has already started to write out chunks of consecutive data to the USB stick. Remember that USB sticks perform well when streaming out chunks of data that fit their physical structure, but badly when writing small chunks, because large parts of the flash memory are reread and rewritten again and again. Performance is good here because there are plenty of dirty buffers, so the kernel can optimize how it writes them out.
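To watch these values without reading the whole listing each time, the relevant fields can be extracted with a few lines of code. This is a minimal sketch in plain Java; the class and method names are made up for illustration, and on the box itself a shell one-liner would be the more realistic tool.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for reading values from /proc/meminfo style text. */
final class MemInfo {

    /** Parses lines of the form "Key:   value kB" into a map of key -> kilobytes. */
    static Map<String, Long> parse(String text) {
        Map<String, Long> values = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "Key: value" line
            }
            // the number is the first token after the colon; the unit and comments are ignored
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            try {
                values.put(line.substring(0, colon).trim(), Long.parseLong(parts[0]));
            } catch (NumberFormatException e) {
                // skip lines without a numeric value
            }
        }
        return values;
    }

    /** Data that still has to reach the medium: dirty pages plus pages under writeback. */
    static long pendingWriteKb(Map<String, Long> info) {
        return info.getOrDefault("Dirty", 0L) + info.getOrDefault("Writeback", 0L);
    }
}
```

Fed with the numbers above (Dirty 6836 kB, Writeback 444 kB), pendingWriteKb reports 7280 kB of data queued for the stick.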

Writing file ‘tmp1’

Let’s go back and look at the numbers exactly after tmp1 has been written (2048 kB):

MemFree:          7100 kB    # before: 9632 kB
Cached:           8456 kB    # before: 6280 kB
Dirty:               0 kB
Writeback:           0 kB

The buffers have all been flushed, so the stick is idle. Our cache grew by 2048 kB taken from the free memory, containing now also the file tmp1.

Writing file ‘tmp2’

Copying file tmp2 (2048 kB) is fast and the memory info after copying is no surprise:

MemFree:          5084 kB    # 2048 kB less
Cached:          10504 kB    # 2048 kB more
Dirty:               0 kB
Writeback:           0 kB

Neither is tmp3 (2048 kB), because there is still unused memory left. But now it’s getting interesting, because write performance with tmp4 drops drastically.

Writing file ‘tmp4’ with no free memory

While writing tmp4, with the performance dropping to 30 KB/s, the numbers look like this:

MemFree:          1148 kB
Cached:          13988 kB
Dirty:              12 kB
Writeback:          36 kB

Of course free memory is useless; we’d rather have everything go into the cache. The cache stays filled (we have tmp1, tmp2 and tmp3 in the cache), but the values for Dirty and Writeback are too low.

Before, the file to be written was loaded completely into the cache first and marked dirty. The pdflush daemon started with a delay and found plenty of cached data to write to disk.

The number of blocks marked dirty now never seems to exceed 50 kB. The pdflush daemon can only flush out small chunks of up to 36 kB at once (usually less), resulting in a lot of USB operations and overhead and low performance.

Clearing the cache helps

The Freetz kernel unfortunately does not expose /proc/sys/vm/drop_caches to drop all cached buffers. But what happens if we rm tmp1:

MemFree:          1604 kB
Cached:          14004 kB

Nothing. tmp1 is not in the cache anymore and most likely tmp4 has taken its place, because it is newer. But tmp2 is still in the cache, so let’s rm it:

MemFree:          3464 kB     # rm tmp2 frees up the cached memory
Cached:          12152 kB     # the rm'ed file is removed from cache

Now we have over 3 MB free and unused memory and the file is not in the cache anymore.

Writing tmp5

Now let’s copy tmp5 (2048 KB). These numbers were taken during the copy to show the values of Dirty and Writeback, so the file is only partly transferred at this point:

MemFree:          2204 kB
Cached:          12948 kB
Dirty:             152 kB
Writeback:         424 kB

We again see high numbers for Dirty and Writeback as parts of the copied file are moved to the cache and marked dirty. The pdflush daemon again gets huge chunks of buffers to stream to the medium, and we get a fairly high transfer rate.

Broken kernel behaviour

This is the fairly old Linux kernel 2.6.13.1-ohio from Freetz. The behaviour of the VFS and pdflush seems to be broken and thus results in very poor write performance:

  • When there is no free memory available, why doesn’t the kernel free more old cache memory for the new buffers to be marked dirty?
  • It seems the pdflush daemon is forced to write out as soon as there are dirty buffers and memory is low (= no free memory). Why does the kernel seem to prefer freeing memory by writing out dirty buffers over clearing the read cache to make room for more dirty buffers?
  • Allocating new buffers seems to stall while the pdflush daemon is freeing up dirty memory.
  • New buffers are still taken from old cached files, so after copying the whole file, it is completely in the cache. Why not put it completely in the cache before starting to write out, instead of stalling the allocation of new buffers?
  • rm’ing a file that is in the cache frees up the cache, resulting in a performance boost until that free memory is used by the cache again and the pdflush daemon writes out much smaller chunks. In practice that won’t happen, and a normal Linux system should never have large amounts of free memory.

This is a kernel bug preventing the Fritz!Box 7170 from ever achieving good write performance on my USB stick and other media.

Summary

  1. When copying files to the Fritz!Box, the kernel caches these files in its cache only when free memory is available.
  2. It writes out the cached files to storage with good performance, because there are big chunks to be written. The files remain in the cache in case they are read.
  3. With a full cache and no free memory, new files are no longer cached first but written directly to the medium, resulting in a lot of small writes with big overhead and bad performance. The file is still in the cache after writing is done.
  4. Clearing the cache results in free memory and a write performance boost until the cache is saturated again.
  5. Because the cache is only freed on unmount, free memory is almost never available, making writing data to USB sticks a pain.

External hard drives might work better because of fast integrated hardware caches that can absorb lots of small chunks. But on a USB stick without hardware cache, performance is killed by the small writes.

It is unlikely that this bug will be fixed by AVM or by Freetz for the Fritz!Box 7170 because it seems to be a flaw in the used Linux kernel and AVM does not update the 7170 firmware anymore.

Is this a known bug and is this fixed in newer kernels?

Out of Memory in GLSurfaceView on resume

The symptom

After publishing WordMix with the OpenGL accelerated 2D game view (using GLSurfaceView), I received weird crash reports from some devices, mostly out of memory from within the GL context:

android.opengl.GLException: out of memory
 at android.opengl.GLErrorWrapper.checkError(GLErrorWrapper.java:62)
 at android.opengl.GLErrorWrapper.glGenTextures(GLErrorWrapper.java:350)
 at [...]

From the very limited information the Google Play Developer Console gives me about crash reports, I assumed it only affects devices running Android version 3. Modifying the code only caused the out of memory exception to be thrown at random other places, even at GL10.glClear(…)!

I also found out that the crash only happens when the user finishes a subactivity and returns to the activity containing the GLSurfaceView. Users were complaining about the crash happening before starting a second game, which puzzled me, because all my rendering code seemed to be working fine on all devices running Android 4. Everything worked fine without the GLSurfaceView as well.

Looking at the source code of GLSurfaceView, nothing interesting changed between Android 3.2 and Android 4, so GLSurfaceView itself was hardly to blame; more likely it was the hardware, the drivers or the specific OpenGL implementation.

The problem

The actual problem was very hard to track down and took me several hours, particularly because I did not have an Android 3 tablet for debugging:

Up to Android 2.3, views were drawn in software and later composited using the hardware. Android 3 introduced an alternative hardware accelerated drawing engine for everything that uses Canvas classes. This alternative render path is disabled by default in Android 3 and supposedly enabled by default in Android 4 (previous blog post).

When I found out that the Samsung Galaxy S2 does not enable hardware acceleration by default, I set

<application android:hardwareAccelerated="true" ...>

in the AndroidManifest.xml for all activities that should support hardware acceleration. Using hardware acceleration for the activity with the GLSurfaceView, which is hardware accelerated anyway, did not make much of a difference. But accelerating the results or preferences activity, for example, gave a nice performance boost on my SGS2.

It turns out that the crash happens on Android when an activity that contains a GLSurfaceView is paused for a fullscreen activity that is hardware accelerated. When that hardware accelerated activity is finished, the underlying GLSurfaceView is screwed up and throws out of memory exceptions, even though the GL context is completely and correctly reinitialized.

The solution

Yes, I should have tested the effects of hardwareAccelerated=”true” more thoroughly.

Leaving that attribute entirely unset is recommended for Android 3, especially when you use a GLSurfaceView, and should not hurt Android 4 devices either. Setting a reasonable default value is then up to the manufacturers.

Summary

  • if you use a GLSurfaceView in an activity
  • and suspend that activity by starting another fullscreen activity
  • and that activity is hardwareAccelerated by setting so in the AndroidManifest.xml
  • and you target Android 3 devices
  • expect weird behaviour like out-of-memory exceptions

Welcome to fragmentation. Just let hardwareAccelerated be unset.

OpenGL antialiasing in Android and transparent textures

I tried to replace the legacy 2D rendering code of WordMix, which uses the native Android canvas methods, with an OpenGL renderer to allow for fancy effects and animations.

First attempt

simple texture with full bleed image, no border

Because the tiles are simple rectangles with round corners, I created a texture with GIMP and rendered a quad in OpenGL. The texture had no mipmaps and used linear filtering for both minification and magnification. When rotating that quad, I got the typical “staircase” lines, because I did not use anti-aliasing / multisampling. The result looks rather horrible:

no multisampling, no mipmaps, simple texture result in typical “staircase” borders

You can see two effects: one is the clear staircase borders, where the edge of the quad is not smoothed by linear filtering, and the other is a grayish border around the round corners of the texture, which I’ll explain in the next paragraphs.

Multisampling emulation to remove “staircase”

So how to achieve multisampling in OpenGL ES 1.1? The answer I found is quite simple and easy on the hardware: use a texture with a transparent border and linear texture interpolation will do the rest. So I modified the texture to include a transparent border and rendered the quads slightly bigger to fill the same amount of pixels.

texture image with transparent border

The result looked better, but I was not satisfied with the borders. The interpolation was visible, but there was still a very noticeable “staircase”. On top of that, the borders seemed to be blended with black, which can be seen on the overlapping tiles:

no mipmaps, texture with transparent border, still visible “staircase” and dark colored border

This is in fact due to my texture, in which the transparent pixels were assigned the color black. OpenGL’s interpolation simply averages two neighbouring pixels, which works out to

(argb(1, 1, 1, 1) + argb(0, 0, 0, 0)) / 2 = argb(0.5, 0.5, 0.5, 0.5)

which is a semi transparent gray color tone.
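The same arithmetic can be checked with 8 bit channels. The following plain Java sketch (the class name is made up for illustration) averages two non-premultiplied ARGB values channel by channel, which is what a linear filter does exactly halfway between two texels:

```java
/** Hypothetical helper that mimics linear filtering halfway between two texels. */
final class ArgbBlend {

    /** Averages two non-premultiplied ARGB colours channel by channel. */
    static int average(int c1, int c2) {
        int result = 0;
        // shift = 0: blue, 8: green, 16: red, 24: alpha
        for (int shift = 0; shift <= 24; shift += 8) {
            int a = (c1 >>> shift) & 0xFF;
            int b = (c2 >>> shift) & 0xFF;
            result |= ((a + b) / 2) << shift;
        }
        return result;
    }
}
```

Averaging opaque white (0xFFFFFFFF) with transparent black (0x00000000) gives 0x7F7F7F7F, a half transparent gray. With transparent white (0x00FFFFFF) as the neighbour, the color channels would stay at 0xFF and only the alpha would drop.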

Monkeying with GIMP for transparent pixels

So how do you create a texture in which the transparent pixels have the color white? GIMP seemed to screw up the color of transparent pixels, even though it offers to keep the color of transparent pixels when exporting the work as a PNG file.

The trick: combine all visible planes, create an alpha channel and change the color layer. If you have uncombined planes, the result is unpredictable and the colors are screwed up.

So now I had a texture with a white but fully transparent border (value 0x00FFFFFF) and I’d expect the calculation to be

(argb(1, 1, 1, 1) + argb(0, 1, 1, 1)) / 2 = argb(0.5, 1, 1, 1)

But I still got the same result:

texture with transparent + white border: still black border in Android

Bitmaps with transparent pixels in Android

So why is my border still black, while the texture has white transparent regions? I checked the loaded Bitmap with this code after loading the png resource:

Bitmap bmp = BitmapFactory.decodeResource(getResources(), R.drawable.stone);
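// Log.d() expects a String message, so format the pixel value as hex
Log.d("texture", String.format("0x%08X", bmp.getPixel(0, 0))); /* result: 0x00000000 */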

Why is the result 0? I’d expect 0x00FFFFFF, but either Android’s Bitmap loader premultiplies the alpha or the image file is recompressed at compile time, although I placed the image in the res/drawable-nodpi folder.

But apparently Bitmap and Canvas throw away all color information, when drawing with an alpha value of 0. This results in a fully transparent, but black canvas:

canvas.drawColor(Color.argb(0, 255, 255, 255), Mode.SRC);
Log.d("texture", String.format("0x%08X", bmp.getPixel(0, 0))); /* result: 0x00000000 */

while the following results in a white canvas that is almost transparent (alpha 1 of 255):

canvas.drawColor(Color.argb(1, 255, 255, 255), Mode.SRC);
Log.d("texture", String.format("0x%08X", bmp.getPixel(0, 0))); /* result: 0x01FFFFFF */

Good to know. So now I create my texture with a white border that is almost, but not completely, transparent (alpha value 1 of 255). It should be hardly visible, calculating like:

(argb(1, 1, 1, 1) + argb(0.004, 1, 1, 1)) / 2 = argb(0.502, 1, 1, 1)

I checked with the Log code above and indeed got the value 0x01FFFFFF. So at least the Bitmap was loaded correctly now. But I still got a black border and the result looked the same. Why?

Creating OpenGL textures with unmultiplied alpha

I found a post and a bug report saying that GLUtils.texImage2D() apparently messes with the alpha and colors too, creating texture values of 0x01010101, which get blended with the nearby white pixels on linear filtering. What the…?
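Where the value 0x01010101 comes from becomes clear when the premultiplication is written out. This is a minimal sketch in plain Java with a made-up class name; it assumes plain integer division, and the actual Android implementation may round differently.

```java
/** Hypothetical helper showing what alpha premultiplication does to a pixel. */
final class Premultiply {

    /** Multiplies each colour channel of a non-premultiplied ARGB value by alpha / 255. */
    static int premultiply(int argb) {
        int a = argb >>> 24;
        int r = ((argb >>> 16) & 0xFF) * a / 255;
        int g = ((argb >>> 8) & 0xFF) * a / 255;
        int b = (argb & 0xFF) * a / 255;
        return (a << 24) | (r << 16) | (g << 8) | b;
    }
}
```

For the almost transparent white 0x01FFFFFF, each color channel becomes 255 * 1 / 255 = 1, so the pixel turns into 0x01010101, which is practically black. A fully transparent 0x00FFFFFF collapses to 0 in the same way.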

The post suggests a workaround to not use GLUtils to load the Bitmap into an OpenGL texture but use the original GL10.glTexImage2D(). While the code in that post is not very efficient, it does result in nice and smooth blended borders. Of course the use of mipmaps helps too to make the texture smooth when minified:
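The core of such a workaround is converting the pixels yourself, so that nothing premultiplies them on the way. The following plain Java sketch shows only that conversion step; the class name is made up, it is not the code from the linked post, and the Android calls are mentioned in comments only because they cannot run outside a device.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper that prepares non-premultiplied pixel data for a texture upload. */
final class TexturePixels {

    /**
     * Converts ARGB ints (as returned by Bitmap.getPixels() on Android) into the
     * R, G, B, A byte order expected for GL_RGBA / GL_UNSIGNED_BYTE data.
     */
    static ByteBuffer toRgba(int[] argbPixels) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(argbPixels.length * 4);
        for (int argb : argbPixels) {
            buffer.put((byte) (argb >>> 16)); // red
            buffer.put((byte) (argb >>> 8));  // green
            buffer.put((byte) argb);          // blue
            buffer.put((byte) (argb >>> 24)); // alpha, colour channels stay untouched
        }
        buffer.position(0); // rewind so the consumer reads from the start
        // On Android, this buffer would then be passed to
        // gl.glTexImage2D(GL10.GL_TEXTURE_2D, 0, GL10.GL_RGBA, width, height, 0,
        //                 GL10.GL_RGBA, GL10.GL_UNSIGNED_BYTE, buffer);
        return buffer;
    }
}
```

The pixel 0x01FFFFFF ends up as the bytes FF FF FF 01, so the white color survives next to the tiny alpha value.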

texture with 0x01FFFFFF almost-transparent border and use of original GL10.glTexImage2D method and mipmaps

Summary

Several hurdles had to be cleared to make antialiasing work in an Android app using OpenGL ES 1.1:

  1. Create textures that have transparent borders, so that linear filtering emulates oversampling at polygon borders.
  2. Make sure the transparent border of your texture contains color values, which will “bleed” into the border pixels of the texture.
  3. If you use mipmaps, make sure you have enough transparent border pixels or set GL_TEXTURE_WRAP_S and GL_TEXTURE_WRAP_T to GL_CLAMP_TO_EDGE.
  4. Double check the result, because GIMP does screw up when there are multiple layers that are merged on export as a PNG image.
  5. Android’s Bitmap loader and Canvas code seem to zero out the color values when alpha is 0. The workaround to keep the color values on load: use colored pixels with an alpha value of 1 (of 255).
  6. The GLUtils.texImage2D implementation premultiplies color values with alpha values, resulting in a very dark color instead of the white I wanted. Use GL10.glTexImage2D directly (example in this post).

Using mipmaps and adding a nice shadow texture results in a screen, that looks very similar to the original, but which is much faster:

final result with all workarounds, mipmaps and shadows

WordMix learning Russian

The next WordMix and WordMix Pro release will include support for Russian, Portuguese and Dutch as dictionary languages. I had a lot of fun with the Cyrillic character encoding and especially with the word database, as I learned that a lot of Linux tools still do not handle multibyte character sequences correctly.

Mostly the tool tr kept me busy, when I tried to convert lower case letters to upper case. The normal approach of

tr '[:lower:]' '[:upper:]'

only seems to work for the ASCII character set. Listing the letters manually on UTF-8 data screws everything up even more, as in the command:

tr \
  абвгдеёжзийклмнопрстуфхцчшщъыьэюя \
  AБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ

The trick was to use tr on the original KOI8-R encoded data (which is an 8 bit encoding). For that I also had to pass KOI8-R encoded parameters to the tool, which was a pain inside an otherwise UTF-8 encoded shell script. So I read the KOI8-R encoded parameters from a file before passing them as arguments, so that I don’t screw up my shell script.
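For comparison, a character-aware conversion sidesteps the problem entirely. The following plain Java sketch is an alternative technique, not the tr based approach I used; the class name is made up. It also shows why a byte-wise tool trips over UTF-8: every Cyrillic letter takes two bytes there, while KOI8-R uses one byte per letter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper contrasting characters and bytes for Cyrillic text. */
final class CyrillicCase {

    /** Upper-cases by Unicode character, independent of the byte encoding. */
    static String toUpper(String word) {
        return word.toUpperCase(Locale.ROOT);
    }

    /** Number of bytes the text occupies when encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

The three letters "\u0430\u0431\u0432" (the first three letters of the Cyrillic alphabet) become "\u0410\u0411\u0412", and utf8Length reports 6 bytes for these 3 characters. To process the original KOI8-R word list this way, the bytes would first have to be decoded, for example with Charset.forName("KOI8-R"), provided the Java runtime ships that charset.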

It took me several hours and attempts to find that out and to get all the encodings right, so now a working Russian dictionary is available. 🙂 It won’t be shipped by default though, so it needs to be fetched from the Internet once by the game, on first use.

Of course the global ranklist is prepared for the new languages as well.

Manually enabling the deflate filter

Oh my, my web hoster has the deflate output filter, which enables gzip compression of outgoing content, disabled by default. Compression is important for huge XML/JSON data from web services that travels over mobile networks, and it easily reduces the used bandwidth to as little as 10%.

Putting this in my .htaccess did the trick:

AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript

You can analyze the traffic with Wireshark, Firebug or an online tool.

Make sure your mobile app sets the Accept-Encoding header of the request accordingly.
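How much there is to gain can be estimated offline before touching the server. This minimal sketch in plain Java compresses a made-up, ranklist-like JSON document with gzip and decompresses it again; the class name, method names and sample data are assumptions for illustration, and real savings depend on the actual payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper to estimate what gzip saves on a text payload. */
final class GzipDemo {

    static byte[] compress(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // complete only after the gzip stream was closed
    }

    static String decompress(byte[] data) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a repetitive JSON array, similar in shape to a ranklist response. */
    static String sampleJson(int entries) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < entries; i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append("{\"rank\":").append(i + 1)
                .append(",\"name\":\"player").append(i + 1)
                .append("\",\"score\":").append(1000 - i).append('}');
        }
        return json.append(']').toString();
    }
}
```

On the client side, an app using HttpURLConnection can ask for compressed content with setRequestProperty("Accept-Encoding", "gzip") and then has to wrap the response stream in a GZIPInputStream when the server answers with Content-Encoding: gzip.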

 

Hardware acceleration on SGS2 with Android 4.0

Starting from Android 3 (API level 11), there is a hardware renderer for 2D graphics, which drastically increases performance. The hardware acceleration was disabled by default and had to be enabled by the developer by declaring it in the AndroidManifest.xml file:

<application android:hardwareAccelerated="true" ...>

According to the Android documentation, that value changed to true starting with API level 14:

The default value is "true" if you’ve set either minSdkVersion or targetSdkVersion to "14" or higher; otherwise, it’s "false".

This is true for some devices (like the HTC Sensation with Android 4.0.3), but does NOT apply to the Samsung Galaxy S2 with official Android 4.0.3 and 4.0.4.

Applications without above attribute explicitly set to true are not hardware accelerated automatically on that device. On the HTC Sensation they are.

So don’t forget to declare that attribute in your AndroidManifest.xml file, if you want hardware acceleration on all devices.

Users can force-enable the use of GPU rendering in the developer options, which can be used as a workaround with the risk of incompatible applications yielding render errors.

WordMix 2D view must not use hardware acceleration

Currently, the 2D view of my WordMix game uses some features of Canvas that are incompatible with hardware acceleration and result in display bugs. These glitches did not occur on my Samsung Galaxy S2, because it was not hardware accelerated as stated above, but occurred on another device, an HTC Sensation with Android 4. It took me a while to figure out what exactly was going on, but declaring

<application android:hardwareAccelerated="false" ...>

fixed it for the time being, until I changed the render code to be hardware accelerated.

android.hardware.faketouch vs. android.hardware.touchscreen

By default, an Android application requires the feature android.hardware.touchscreen, which is then used for filtering in Google Play, so only devices that have touchscreen capabilities will be able to install that app.

Besides that, there is a more basic feature, android.hardware.faketouch; android docs state:

If your application requires basic point and click interaction (in other words, it won’t work with only a d-pad controller), you should declare this feature. Because this is the minimum level of touch interaction, your app will also be compatible with devices that offer more complex touch interfaces.

If the application does not require touchscreen features, it is recommended to mark android.hardware.touchscreen as not required and declare android.hardware.faketouch instead. I did this for WordMix, which should work with faketouch devices, too:

<uses-feature android:name="android.hardware.touchscreen" android:required="false" />
<uses-feature android:name="android.hardware.faketouch" android:required="true" />

If you do that, check the results on Google Play, which shows the number of supported devices:

  • touchscreen required, faketouch not required: 1500
  • touchscreen not required, faketouch required: 860
  • neither required: 1800

That is odd and not according to the documentation. For example a Samsung GT-S5360 seems to support touchscreen, but not faketouch. The Samsung Galaxy S2 supports both. You can include all devices by setting touchscreen to be not required, which then includes all faketouch devices, but also all devices that have even less input capabilities.

First playable version of Freebloks for Android

Freebloks-0.0.1_pre1.apk

I just pushed a first playable version of Freebloks for Android, a port of Freebloks 3D to the Android system. It’s a very early development version and very rough, but you can start a single player game and join a network game and place stones. The user interface still needs a lot of love, but I might push a preview version on Google Play soon. Please get involved on GitHub if you want to contribute.

Here are two screenshots:

 

New featured image of WordMix

I tried using the Blender modelling software to create a new featured image for WordMix in Google Play, but I just can’t wrap my head around it. The software has such a horrible UI that after several hours of trying, I was still unable to create a dice-like object. The learning curve is very steep; I wish I had more time.

This is the result with 3dsmax instead:

The previous image was created with GIMP, based on a screenshot of the game, which I took in a tablet Android emulator. It sure did look poorly modified and, compared to the screenshots in Google Play, it did not add anything: