Tag Archives: linux

Much improved AI speed for Freebloks for Android using jni

February 12, 2013 Sascha

For the port of Freebloks 3D to Android I rewrote all code from C to Java. While that was working fine and resulted in greatly simplified networking code, the speed of the AI was not so great. It took up to 10 seconds on a fairly powerful SGS 2 for the computer to find a good move.

I was trying to move the CPU intense routines of the AI to C again, using jni as a bridge between Java and C. The simple network routines should stay in Java.

But the transfer of relevant game data to C and back to Java turned out to be very ugly, yet the solution was incredibly simple:

The Freebloks code was always split in two parts, the GUI/client part and the AI/server part, with the client and server always communicating using network sockets. Yes, even the single player version starts a network server and connects to localhost. The original source code always contained a package for running a dedicated server.

It was incredibly easy to copy the dedicated server code into my project, compile the C code with the NDK and connect it to Java with only a single jni call. It was running out of the box, with almost no change of the original C code at all! Since the server is running in a thread started from the native C code, there is no additional jni call neccessary and no data transfers except for the sockets.

The average duration for the AI to calculate a complete game dropped from 87 sec to 28 sec on my SGS 2. The version 0.0.2 in the Google Play Store supports ARMv5, ARMv7 and x86. Grab it now! You may also download a free apk file here.

And please don’t forget to give feedback.

General

USB write performance drop on Fritz!Box 7170

November 26, 2012 Sascha

I want to attach a USB stick to the AVM Fritz!Box 7170 to use as USB storage and be able to write to it using the integrated ftp server. When writing a bunch of files, the write performance drops to under 50 kb/sec, while the stick can easily handle 512 kb/sec. Why the bad performance and why the drop?

I replaced the stock AVM firmware with Freetz but got similar results. What got my attention is a drop in performance after copying 4 files, that does not recover after time. The following tests were done using the Freetz modification with Linux kernel 2.6.13.1-ohio.

Performance drop when writing

Look at these numbers when copying a bunch of files to the stick using scp:

$ scp tmp* root@fritz.box:/var/media/ftp/uStor00/
tmp1                          100% 2048KB 682.7KB/s   00:03    
tmp2                          100% 2048KB 512.0KB/s   00:04    
tmp3                          100% 2048KB 512.0KB/s   00:04    
tmp4                          100% 2048KB  55.4KB/s   00:37    
tmp5                          100% 2048KB  38.6KB/s   00:53

Each following transfer would then be at only 55KB/s. Issuing a sync command to flush out dirty buffers makes no difference, so the speed is not throttled by the USB stick being busy.

Let’s have a look at the VFS cache

The Linux kernel reveals some interesting cache and memory information in /proc/meminfo. These are numbers taken after a fresh boot:

# cat /proc/meminfo 
MemTotal:        30204 kB
MemFree:          9632 kB    # unused, completely free memory
Buffers:           280 kB
Cached:           6280 kB    # memory used for cached files
SwapCached:          0 kB
Active:           8652 kB
Inactive:         1524 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        30204 kB
LowFree:          9632 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:               0 kB    # memory waiting to be written to disk
Writeback:          0 kB    #  memory actively being written to disk
Mapped:           8040 kB
Slab:             6028 kB
CommitLimit:     15100 kB
Committed_AS:     5724 kB
PageTables:        240 kB
VmallocTotal:  1048560 kB
VmallocUsed:      4056 kB
VmallocChunk:  1043636 kB

While copying the first files, the highlighted numbers read like this:

MemFree:          1716 kB
Cached:          13704 kB
Active:           8976 kB
Inactive:         8928 kB
Dirty:            6836 kB     # lots of data waiting to be written
Writeback:         444 kB    # lots of data being actively writting

We see that the cache is filled up quickly with buffers also marked to be written on the stick (marked dirty) and that the pdflush daemon already started to write out chunks of consecutive data to the usb stick. Remember that usb sticks have good performance when streaming out data chunks that fit into the physical structure but bad performance, when writing out small chunks because a lot of the flash memory keeps being reread and overwritten. The performance is good here, because there are a lot of dirty buffers the kernel can optimize the writing out.

Writing file ‘tmp1’

Let’s go back and look at the numbers exactly after tmp1 has been written (2048 kB):

MemFree:          7100 kB    # before: 9632 kB
Cached:           8456 kB    # before: 6280 kB
Dirty:               0 kB
Writeback:           0 kB

The buffers have all been flushed, so the stick is idle. Our cache grew by 2048 kB taken from the free memory, containing now also the file tmp1.

Writing file ‘tmp2’

Copying file tmp2 (2048 kB) is fast and the memory info after copying is no surprise:

MemFree:          5084 kB    # 2048 kB less
Cached:          10504 kB    # 2048 kB more
Dirty:               0 kB
Writeback:           0 kB

Neither is tmp3 (2048 kB), because there is still unused memory left. But now it’s getting interesting, because write performance with tmp4 drops drastically.

Writing file ‘tmp4’ with no free memory

While writing tmp4, and the performance dropping to 30 KB/sec, the numbers look like this:

MemFree:          1148 kB
Cached:          13988 kB
Dirty:              12 kB
Writeback:          36 kB

Of course free memory is useless, we’d rather have everying to into the cache. The cache stays filled (we have tmp1, tmp2 and tmp3 in the cache), but the values for Dirty and Writeback are too low.

Before, the file to be written was completely loaded into the cache first and marked dirty.The pdflush daemon was started deferred and found rich caches to be written to disk.

The number of blocks marked dirty now never seems to exceed 50 kB. The pdflush daemon can only flush out small chunks of up to 36 kB at once (usually less), resulting in a lot of USB operations and overhead and low performance.

Clearing the cache helps

The Freetz kernel unfortunately does not expose /proc/sys/vm/drop_caches to drop all cached buffers. But what happens, if we rm tmp1:

MemFree:          1604 kB
Cached:          14004 kB

Nothing. tmp1 is not in the cache anymore and most likely tmp4 has taken it’s place, because it is newer. But tmp2 is still in the cache, so let’s rm it:

MemFree:          3464 kB     # rm tmp2 frees up the cached memory
Cached:          12152 kB     # the rm'ed file is removed from cache

Now we have over 3 MB free and unused memory and the file is not in the cache anymore.

Writing tmp5

Now let’s copy tmp5 (2048 KB). These are numbers from during the copy to see the values of Dirty and Writeback, so the file is only partly transfered yet:

MemFree:          2204 kB
Cached:          12948 kB
Dirty:             152 kB
Writeback:         424 kB

We again see high numbers for Dirty and Writeback as parts of the copied file are moved to the cache and dirty. The pdflush daemon gets huge chunks of buffers again to be streamed to the medium and we get a fairly high transfer rate.

Broken kernel behaviour

This is the fairly old Linux kernel 2.6.13.1-ohio from Freetz. The behaviour of the VFS and pdflush seems to be broken and thus result in very poor write performance:

when there is no free memory available, why doesn’t the kernel free more old cache memory for the new buffers to be marked dirty?
it seems, the pdflush daemon is forced to write out as soon as there are dirty buffers and memory is low (= no free memory). Why does the kernel seem to prefer to free memory by writing out dirty buffers instead of clearing the read cache to make room for more dirty buffers?
allocating new buffers seems to stall while pdflush daemon is freeing up dirty memory
new buffers are still taken from old cached files, so after copying the whole file, it is completely in the cache. why not put if completely in the cache before starting to write out and stall allocation of new buffers?
rm’ing a file that is in the cache, frees up the cache, resulting in performance boosts, until that free memory is used by the cache again and the pdflush daemon writes out much smaller chunks. Practically that won’t happen and a normal Linux system should never have large amounts of free memory.

This is a kernel bug preventing Fritz!Box 7170 from ever achieving good write performance on my USB stick and other mediums.

Summary

When copying files to the Fritz!Box, the kernel caches these files in it’s cache only when free memory is available.
It writes out the cached files to storage, with good performance because there are big chunks to be written. The files remain in the cache in case they are read.
With a full cache and no free memory, new files aren’t cached anymore but directly written to the medium, resulting in a lot of small writes with big overhead and bad performance. The file is still in the cache after writing is done.
Clearing up the cache results in free memory and write performance boosts until the cache is saturated again.
Because the cache is only freed up on unmount, the situation almost never happens, making writing data to USB sticks a pain.

External harddrives might work better, because of fast integrated hardware caches that can take lots of small chunks. But on a USB stick without hardware cache, performance is killed by the small writes.

It is unlikely that this bug will be fixed by AVM or by Freetz for the Fritz!Box 7170 because it seems to be a flaw in the used Linux kernel and AVM does not update the 7170 firmware anymore.

Is this a known bug and is this fixed in newer kernels?

General

isatapd hits the portage tree

November 11, 2012 Sascha

My ISATAP client daemon for Linux finally hits the Gentoo portage tree.

When originally released, the tool quickly got packaged into Ubuntu (10.04), Debian 6 (Squeeze) and derivates, but it needed another 2 years to get integrated in Gentoo. At last. 🙂

# emerge isatapd

Source code is available on http://github.com/shlusiak/isatapd

General

WordMix learning Russian

October 18, 2012 Sascha

The next WordMix and WordMix Pro release will include support for Russian, Portuguese and Dutch as dictionary languages. I had a lot of fun with the Cyrillic encoding of characters and especially the database for the words as I learned that a lot of Linux tools are still not ready for handling multi byte character sequences correctly.

Mostly the tool tr kept me busy, when I tried to convert lower case letters to upper case. The normal approach of

tr [:lower:] [:upper:]

only seems to work for the ASCII character set. If manually used on UTF-8 data, it screws everything up even more, like in the command:

tr \
  абвгдеёжзийклмнопрстуфхцчшщъыьэюя \
  AБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ

The trick was to use tr on the original KOI8-R encoded data (which is 8 bit), for which I also had to pass KOI8-R encoded parameters to the tool, which was a pain inside an otherwise UTF-8 encoded shell script. So I tried to read the KOI8-R encoded parameters from a file before passing it as arguments so I don’t screw up my shell script.

It took me several hours and attempts to find that out and to get all the encodings right, so now a working Russian dictionary is available. 🙂 It won’t be shipped by default though, so it needs to be fetched from the Internet once by the game, on first use.

Of course the global ranklist is prepared for the new languages as well.

General

Back from XDC2012

September 24, 2012 Sascha

Finally back from XDC2012, which was at the SuSe headquarters in Nürnberg, Germany this year. It was an awesome conference with a lot of interesting talks and people.

Because it was my first time at an X.Org Developer Conference, I took the chance to give a talk about the joystick input module, I created and am maintaining ever since:

General

isatapd githubbed

August 12, 2012 Sascha

The ISATAP client daemon utility, that I wrote as part of my master thesis, is now available on GitHub.

isatapd is the userspace utility for Linux, neccessary to function as an ISATAP client.

While the tool always was open source and free under the terms of the GPLv2, making it available on GitHub makes it much easier to contribute or collaborate.

Fork isatapd on GitHub: https://github.com/shlusiak/isatapd

General

Creating a custom LSF email router using exim

July 28, 2012 Sascha

The University of Applied Sciences Weingarten, as many others, uses an installation of LSF (Lehre Studium Forschung) to manage students, departments and courses.When working at the datacenter of the university, I was part of a project to make the data of the LSF data available to the email system, to make LSF the primary system to connect user accounts with departments or classes. While accounts were primarily managed using LDAP, the system was not flexible enough to hold information about enrolled courses or roles in LSF, like professor, student, assistant, member of department X, etc.

Connecting LSF to the email system

News article: LSF email router

A custom webservice within the LSF framework however, replied a query with a nice plaintext list of user account names, which easily can be mapped to the canonical email adress of each given account. So we have the data, now how do we route the emails?

The datacenter ran two powerful Cisco Ironport appliances, used as MX, filter and first email router. I decided to put the lsf-email-router part on a separate Linux machine, the lists.hs-weingarten.de server, already running exim4 and an instance of mailman for the faculty. The Ironport appliances already routed every mail for lists.hs-weingarten.de subdomain to the lists server.

Special addresses

The exim4 router should detect special LSF email addresses (starting with lsf-), querying LSF for the data, and forward the email to the received list of user accounts, injecting the emails back into the email system to the main mail routers, that resolved account names to actual mailboxes.

Several adresses and queries were prepared, which can be easily extended, for example:

lsf-einrichtung-XXX@… (department XXX)
lsf-veranstaltung-xyz@… (all members of course xyz)
lsf-einrichtung-XXX-funktion-YYY@… (department XXX, role YYY)
lsf-veranstaltung-XYZ-20112@… (members of course xyz of winter semester 2011/12)
lsf-studiengang-XY@… (all students of the field XY)

A complete list of available dynamic addresses can be found in the service portal: http://rz-serviceportal.hs-weingarten.de/lsf-email-verteiler

The custom script

Two ninjas created a short perl script to be called from exim4, that did the following tasks:

Separate the parts of the email address and check for validity
Determine cache file for the special list and exit script, if cache file is younger than 6 hours. This is done to not query the webservice more often than necessary and using 6h old data was acceptable in this case.
If the cache file does not exist or is older, the LSF webservice is queried for the list of accounts. On success, the cache file is created or updated.
On failure, the LSF might be down, but the email system shouldn’t. If a cache file exists and is younger than 1 week, use it anyway.
If the cache file is too old, delete it and fail. Do not use cached file that is too old but rather fail routing the email, which will result in a mailer daemon message back to the sender.
The perl script exits with an error code 0 or 1 to indicate success or failure.

Now we have a cache file with valid email addresses, or we don’t.

The exim configuration

Integrating the perl script into exim, using exim4’s splitted config, was demeaningly easy, but was actually the tricky part (file /etc/exim4/conf.d/router/475-lsf-router):

lsf_router:
    driver = redirect
    local_part_prefix = lsf-
    file = ${run{/usr/bin/perl /etc/exim4/lsf-router.pl \
           --lsflist $local_part_prefix$local_part} \
           {/var/cache/lsf-router/$local_part_prefix$local_part} \
           {/var/cache/lsf-router/error}}

This router entry works as a redirect router, giving it a file to expand email adresses from. Like a boss, we used exim’s ${run} string expansion to run our script every time an email is routed that maches the lsf- local part prefix of the address. Depending on the success or failure of the ran command, the expansion returns one of two strings. On success, the used filename is the whole local part of the destination address, on failure it is a non-existing path, which will result in a permanent routing failure.

Voilà, our LSF email router is done and working. Sending an email to lsf-veranstaltung-XYZ@… results in the email being routed to the Linux box, the script will be run to refresh the cache for the special email address, querying LSF at most every 6 hours, expanding the data and exim will forward the email to all user accounts.

It’s like using mailman for managing university courses and departments – except that it’s great.

For further information, feel free to contact me or rechenzentrum@hs-weingarten.de

kwin-style-crystal repository on GitHub

July 7, 2012 Sascha

I prepared a public repository on GitHub with all the code of the crystal window decoration, from crystal-0.5 for KDE-3.2 to the current crystal-2.2.0 for KDE-4.9 beta.

Please feel free to contribute:

https://github.com/shlusiak/kwin-style-crystal

There are also ongoing discussions about integrating the crystal window decoration into the official KDE SC, moving all development into the offical KWin tree, next to the current default Oxygen deco.

Currently all major distributions ship optional binary crystal packages. If not, see the official crystal project page on kde-look.org.

General

Crystal-2.1.1 casting shadows ahead

July 1, 2012 Sascha

I just added shadow support in Crystal-2.1.1 (for KDE 4.8) and Crystal-2.2.0 (for KDE 4.9), as well as some minor fixes. Since KDE 4.8 the window decoration itself is responsible for rendering shadows when compositing. Because Crystal uses an unstable kwin API to support tabbing, versions before 2.2.0 are unable to compile, which has the exact same features than 2.1.1.

The shadows aren’t the prettiest but better than none. Hopefully I’ll have time to do further improvements, refactoring, increased speed, stability and graphical improvements.

I hope to get a stable crystal release, which would blend in well with the rest of KDE SC.

Get crystal from the project page on kde-look.org or wait for your distribution to catch up:

Crystal-2.2.0 on KDE-Look.org

Sascha Hlusiak

Tag Archives: linux

Much improved AI speed for Freebloks for Android using jni

USB write performance drop on Fritz!Box 7170

Performance drop when writing

Let’s have a look at the VFS cache

Writing file ‘tmp1’

Writing file ‘tmp2’

Writing file ‘tmp4’ with no free memory

Clearing the cache helps

Writing tmp5

Broken kernel behaviour

Summary

isatapd hits the portage tree

WordMix learning Russian

Back from XDC2012

isatapd githubbed

Creating a custom LSF email router using exim

Connecting LSF to the email system

Special addresses

The custom script

The exim configuration

See also:

kwin-style-crystal repository on GitHub

Crystal-2.1.1 casting shadows ahead

what once was lost