All posts by Sascha

iOS VPN ondemand probes and caching

In our company we push out a VPN configuration to iOS devices that contains on demand rules to only connect to the VPN when not in the company network, based on various criteria.

TL;DR: unexpected caching behaviour of NEOnDemandRuleDisconnect.probeURL

We discovered that iOS (12.3.1) appears to aggressively cache the response of the probeURL when using NEOnDemandRuleDisconnect on demand rules in a VPN configuration.

The response is cached in certain situations, causing the rule to match when it shouldn’t and not to match when it should. This cache was only reset on rebooting the device and not when switching networks. We did not even see new DNS requests for the probeURL when switching between networks, which was even more problematic as we use Split-DNS for the resource that was probed.

The VPN on demand rules were evaluated every time the network changed, but the probe wasn’t sent and the rules still matched on the cached response.

  • When returning a 200 OK, make sure to add Cache-Control headers!!
  • Do not return 301 Moved Permanently!!

Checking for an HTTP resource

The “probe” resource was only available from within the company network, controlled by Split-DNS.

  // disconnect if "probe" returns 200 OK
  let rule1 = NEOnDemandRuleDisconnect()
  rule1.interfaceTypeMatch = .wiFi
  rule1.probeURL = URL(string: "")

  // otherwise connect
  let rule2 = NEOnDemandRuleConnect()
  rule2.interfaceTypeMatch = .wiFi

The probe returned a 200 OK when queried from within the company network. Publicly, the DNS entry pointed to an S3 bucket which did not contain the probed resource.

NEOnDemandRuleConnect and probeURL

We use the NEOnDemandRule subclasses that are attached to a VPN configuration. The documentation of the probeURL property is fairly sparse:

var probeURL: URL?

An HTTP or HTTPS URL. If a request sent to this URL results in a HTTP 200 OK response and all of the other conditions in the rule match, then the rule matches. If this property is nil (the default), then an HTTP request does not factor into the rule match.

Apple’s Configuration Profile reference adds:

A URL to probe. If this URL is successfully fetched (returning a 200 HTTP status code) without redirection, this rule matches.

Cache rules

With extensive testing and sniffing we discovered the following cache behaviour of iOS for probeURL:

  • 301 Moved Permanently is cached for ever
  • 200 OK is cached for ever, if it contains content, ETag and Last-Modified headers and no Cache-Control.

Once a response like this has been seen, iOS will not send another DNS or HTTP query and will use the result until the device is rebooted.

In the case of the 301, the rule will never match (as iOS will not follow redirects). In the case of the 200 response, the rule will always match, no matter what network the device is currently on.
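The two observations above can be turned into a small check, for example as part of a test for the probe endpoint. This is a minimal sketch in Java that only encodes the behaviour we observed on iOS 12.3.1. The names ProbeCacheRules and looksStickyOnIos are made up for illustration and are not part of any Apple API, and the check ignores whether the response has a body.

```java
import java.util.Map;
import java.util.TreeMap;

final class ProbeCacheRules {
    private ProbeCacheRules() {}

    /**
     * Returns true if a probe response matches one of the two patterns that
     * we saw iOS 12.3.1 keep until reboot. This is a heuristic derived from
     * observation, not documented behaviour.
     */
    static boolean looksStickyOnIos(int status, Map<String, String> headers) {
        // HTTP header names are case-insensitive, so normalise the lookup
        Map<String, String> h = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        h.putAll(headers);

        if (status == 301) {
            // observed: the redirect is remembered and never re-requested
            return true;
        }
        if (status == 200) {
            // observed: validators present but no Cache-Control header
            boolean hasValidators = h.containsKey("ETag") && h.containsKey("Last-Modified");
            boolean hasCacheControl = h.containsKey("Cache-Control");
            return hasValidators && !hasCacheControl;
        }
        return false;
    }
}
```

A response for which this returns true is one we would not want to serve from a probe URL.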

Unexpected caching behaviour

Obviously the result of an on-demand probe should never be cached beyond the network boundaries, as that renders the probe pointless.

Furthermore, we expected to see at least DNS requests for the resource when switching networks, as we use Split-DNS. Assuming that a response is still valid when DNS would return a different IP address is surprising.

When a resource returns ETag and Last-Modified headers and no further Cache-Control headers, we expected clients to validate whether the resource has been changed (304 Not Modified) rather than it being cached without validation.

Cache-Control headers

Adding the following header to our 200 OK response caused iOS to always request the resource when processing the on demand probes:

Cache-Control: no-cache, no-store, must-revalidate, max-age=0

returns 0 immediately without blocking


If using non-blocking IO and setting SelectionKey.interestOps(0), a Selector will wake up on POLLHUP | POLLERR (e.g. connection reset by peer), but the JDK / Android SDK will not be able to surface this condition to the caller and will return 0 instead, potentially causing a loop with 100% CPU utilisation.


I was recently investigating an issue with an Android Selector going into a busy loop where select() would return immediately without any keys being selected.

From the Android API documentation:

This method performs a blocking selection operation. It returns only after at least one channel is selected, this selector’s wakeup method is invoked, or the current thread is interrupted, whichever comes first.

I knew that the current thread had not been interrupted and that the selector had not been woken up, yet select() returned with an empty selected key set, consuming 100% CPU in the resulting busy loop.

There are a few reports of similar problems, e.g.

In particular, that first link suggests that other implementations have encountered this when using the JDK and found workarounds like recreating the entire selector based on a heuristic.
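A heuristic of that kind could look roughly like the sketch below: count how often select() returns 0 almost immediately and treat a long run of such returns as a spinning selector. This is not taken from any of those reports. The class name SpinDetector, the threshold and the time limit are assumptions chosen for illustration.

```java
final class SpinDetector {
    private final int threshold;      // consecutive suspicious returns before reporting
    private final long minBlockNanos; // a select() shorter than this did not really block
    private int suspicious;

    SpinDetector(int threshold, long minBlockNanos) {
        this.threshold = threshold;
        this.minBlockNanos = minBlockNanos;
    }

    /**
     * Record the outcome of one select() call.
     * Returns true once the selector looks like it is spinning.
     */
    boolean record(int selectedKeys, long elapsedNanos) {
        if (selectedKeys == 0 && elapsedNanos < minBlockNanos) {
            suspicious++;
        } else {
            // a real event or a real wait: start counting again
            suspicious = 0;
        }
        return suspicious >= threshold;
    }
}

// Possible use inside the selector loop (sketch):
//   long start = System.nanoTime();
//   int n = selector.select();
//   if (detector.record(n, System.nanoTime() - start)) {
//       // log the condition, inspect the registered channels or rebuild the selector
//   }
```

Deliberate wakeup() calls also make select() return 0 quickly, so a loop that uses them would have to skip those iterations when recording.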

After trying many things and digging through all related Android/Java/JNI classes I have finally found the cause of the spurious wake ups, which appears to be a bug in the JDK or an illegal, yet undocumented, use case when using

Program flow pseudo code

Our selector loop detects when channels are ready for reading but would aim to offload the actual reading of the data to a secondary thread.

do {
  // block until any channel is ready
  int ret = selector.select();

  // for any key that is ready
  for (SelectionKey key : selector.selectedKeys()) {
    if (key.isReadable()) {
      // make sure we are not waking up
      // until reading has finished
      key.interestOps(0);

      // offload the reading to a worker
      // the worker will call
      // key.interestOps(OP_READ) when done
      // (worker and readAndRearm are placeholder names)
      worker.execute(() -> readAndRearm(key));
    }
  }

  // make sure all selected keys are cleared
  // because we've handled them all
  selector.selectedKeys().clear();
} while (true);

As illustrated, once a key has been detected to be ready, we clear its interest in all operations and offload the read work to a secondary thread. We did this by calling key.interestOps(0) and passing the channel to a worker thread. When the worker thread has completed the read, it registers interest in the channel again with key.interestOps(OP_READ).

We observed a situation where select() would constantly return 0, with the selectedKeys set being empty, causing 100% CPU load.

Missing support for interestOps(0) in the JDK

The JDK / Android SDK promises that select() will only return if a key is ready, the selector is woken up or the thread is interrupted, but none of these things had happened. The documentation does not mention any special handling of interestOps(0), so it can be assumed that this is a valid operation to perform.

Under the hood, Java and the Android SDK use poll(2) to block until a channel is ready for I/O. The pollfd struct takes an events field (which in our case is 0), and poll populates a revents field with the events that occurred on the file descriptor.

I found that poll(2) wakes up with revents being POLLHUP | POLLERR as a signal that the remote channel is closed. This is a valid case even when the registered events field is 0, and any read on such an fd would return -1.

The Android implementation, however, FILTERS the nioReadyOps() by the nioInterestOps():

, sk);
if ((sk.nioReadyOps() & sk.nioInterestOps()) != 0) {

So even if nioReadyOps() returned a non-zero value, it would be masked out by nioInterestOps(). Unfortunately, translateAndSetReadyOps() ensures that nioReadyOps() is 0 as well, because the ready ops are set to intOps (the interest ops, which are 0 in our case) in case of an error:

if ((ops & (Net.POLLERR | Net.POLLHUP)) != 0) {
  newOps = intOps;
  // No need to poll again in checkConnect,
  // the error will be detected there
  readyToConnect = true;
  return (newOps & ~oldOps) != 0;
}
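To make the combined effect of the two snippets concrete, here is a small standalone model of the logic. It is not the JDK code: the poll bits use the usual Linux values, and translateReadyOps and isSelected are simplified stand-ins written for this post that only model the error branch, a plain read event and the interest filter.

```java
import java.nio.channels.SelectionKey;

final class InterestMaskDemo {
    // poll(2) event bits as defined on Linux
    static final int POLLIN = 0x0001;
    static final int POLLERR = 0x0008;
    static final int POLLHUP = 0x0010;

    /** Simplified stand-in for the translation of poll events into ready ops. */
    static int translateReadyOps(int revents, int interestOps) {
        if ((revents & (POLLERR | POLLHUP)) != 0) {
            // error branch: report every operation the caller is interested in
            return interestOps;
        }
        int ready = 0;
        if ((revents & POLLIN) != 0 && (interestOps & SelectionKey.OP_READ) != 0) {
            ready |= SelectionKey.OP_READ;
        }
        return ready;
    }

    /** Stand-in for the selector's filter: a key is selected only if ready AND interested. */
    static boolean isSelected(int revents, int interestOps) {
        int readyOps = translateReadyOps(revents, interestOps);
        return (readyOps & interestOps) != 0;
    }

    public static void main(String[] args) {
        int hangup = POLLHUP | POLLERR;
        // with OP_READ registered the hang-up is reported as a readable key
        System.out.println(isSelected(hangup, SelectionKey.OP_READ)); // true
        // with interestOps(0) the same poll result selects nothing
        System.out.println(isSelected(hangup, 0)); // false
    }
}
```

In the second case poll has already returned, so the caller gets control back with nothing selected.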

Endless loop on interestOps(0) and POLLHUP | POLLERR

A selector will wake up if a registered channel has been disconnected even if interestOps has been set to 0!

However, there is no way to access this condition from userland because it has been masked out by the zero of interestOps. The selector returns an empty selectedKeys set instead, even though poll signalled a ready file descriptor.

Fly as a chicken!

Chicken Tournament for Android received some updates recently, with some great improvements. Chickens can now fly!! This took a while to get right, and I have replaced the very complex and realistic approach with much simplified mechanics (at the cost of loops and barrel rolls).

Use either the on-screen D-Pad or fly using the device sensors.

Now I just need to teach the Computer chicken how to fly. 🙂

Check out the promo video below

As always only available in the Google Play Store:

Chicken Tournament Demo on Google Play