Selector.select() returns 0 immediately without blocking

TL;DR

If you use non-blocking I/O and set SelectionKey.interestOps(0), a Selector will still wake up on POLLHUP | POLLERR (e.g. connection reset by peer), but the JDK / Android SDK cannot surface this condition to the caller and returns 0 instead. This can cause a busy loop with 100% CPU utilisation.

Details

I was recently investigating an issue with an Android Selector going into a busy loop where select() would return immediately without any keys being selected.

From the Android API documentation:

This method performs a blocking selection operation. It returns only after at least one channel is selected, this selector’s wakeup method is invoked, or the current thread is interrupted, whichever comes first.

I knew that the current thread had not been interrupted and that the selector had not been woken up, yet select() returned with an empty selected-key set, and the resulting busy loop consumed 100% CPU.

There are a few reports of similar problems, e.g.

In particular, the first of these suggests that other implementations using the JDK have encountered this and found workarounds such as recreating the entire selector based on a heuristic.
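A heuristic workaround of that kind could look roughly like the following minimal sketch: count consecutive select() calls that return 0 keys without a wakeup() or interrupt, and once a threshold is reached, move every valid key to a freshly opened Selector. SpinDetector, SelectorRebuilder and the threshold value are hypothetical names of my own, not a JDK or Android API, and the caller is assumed to track whether it called wakeup() itself.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/** Hypothetical helper: flags a selector that keeps returning 0 for no reason. */
final class SpinDetector {
    private final int threshold;   // hypothetical tuning knob
    private int prematureReturns = 0;

    SpinDetector(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Records one select() result. Returns true once `threshold` consecutive
     * calls returned 0 keys without a wakeup()/interrupt explaining it.
     */
    boolean record(int selectedCount, boolean wokenUpOrInterrupted) {
        if (selectedCount > 0 || wokenUpOrInterrupted) {
            prematureReturns = 0;   // a legitimate return: start counting again
            return false;
        }
        prematureReturns++;
        return prematureReturns >= threshold;
    }

    void reset() {
        prematureReturns = 0;
    }
}

/** Hypothetical helper: recreates a selector with the same registrations. */
final class SelectorRebuilder {
    // Assumed to be called from the selector thread, never concurrently with select().
    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (!key.isValid()) {
                continue;   // already cancelled or channel closed
            }
            int ops = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();   // drop the registration with the old selector
            // A channel may be registered with several selectors, so this is legal
            key.channel().register(fresh, ops, attachment);
        }
        old.close();
        return fresh;
    }
}
```

One caveat of this design: any worker still holding a SelectionKey from the old selector would get a CancelledKeyException when it calls interestOps(OP_READ), so it would have to look up the channel's new key first.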

After trying many things and digging through all the related Android/Java/JNI classes, I finally found the cause of the spurious wake-ups. It appears to be either a bug in the JDK or an illegal, yet undocumented, use case when calling
selectionKey.interestOps(0);

Program flow pseudo code

Our selector loop detects when channels are ready for reading but offloads the actual reading of the data to a secondary thread.

do {
  // block until at least one channel is ready
  int ret = selector.select();

  // for every key that is ready
  for (SelectionKey key : selector.selectedKeys()) {
    if (key.isReadable()) {
      // make sure we do not wake up for this channel
      // again until reading has finished
      key.interestOps(0);

      // offload the reading to a worker;
      // the worker will call
      // key.interestOps(SelectionKey.OP_READ) when done
      workQueue.add(key.channel());
    }
  }

  // make sure all selected keys are cleared
  // because we've handled them all
  selector.selectedKeys().clear();
} while (true);

As illustrated, once a key has been detected as ready, we clear its interest in all operations and offload the read work to a secondary thread. We do this by calling key.interestOps(0) and passing the channel to a worker thread. When the worker thread has completed the read, it registers interest in reading again with key.interestOps(OP_READ).
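To make the hand-off concrete, here is a minimal, runnable sketch of the same pattern on the desktop JDK. It uses a java.nio.channels.Pipe instead of a socket so that it is self-contained; HandoffDemo, the single-thread executor and the 4-byte read buffer are my own illustrative choices, not our production code. After re-arming, the worker calls selector.wakeup() so that a blocked select() returns and the next one uses the new interest set. This only exercises the normal path; it does not reproduce the POLLHUP spin.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class HandoffDemo {
    // Hypothetical demo: disarm with interestOps(0), read on a worker, re-arm.
    static String run(String message) {
        try {
            return runChecked(message);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static String runChecked(String message) throws Exception {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream received = new ByteArrayOutputStream(); // synchronized writes
        AtomicInteger receivedCount = new AtomicInteger();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SinkChannel sink = pipe.sink();
             Pipe.SourceChannel source = pipe.source()) {
            try {
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);
                sink.write(ByteBuffer.wrap(payload)); // small payload fits the pipe buffer

                long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
                while (receivedCount.get() < payload.length && System.nanoTime() < deadline) {
                    selector.select(100); // timeout guards against a missed wakeup
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (key.isValid() && key.isReadable()) {
                            key.interestOps(0); // disarm until the worker has read
                            worker.submit(() -> {
                                ByteBuffer buf = ByteBuffer.allocate(4);
                                int n = ((Pipe.SourceChannel) key.channel()).read(buf);
                                if (n > 0) {
                                    received.write(buf.array(), 0, n);
                                    receivedCount.addAndGet(n);
                                }
                                key.interestOps(SelectionKey.OP_READ); // re-arm
                                selector.wakeup(); // let the selector thread notice
                                return null;
                            });
                        }
                    }
                }
            } finally {
                // finish worker tasks before the selector and channels are closed
                worker.shutdown();
                worker.awaitTermination(5, TimeUnit.SECONDS);
            }
        }
        return new String(received.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

For example, HandoffDemo.run("hello, selector") reads the message back in 4-byte chunks, one worker task per readiness event.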

We observed a situation where select() would constantly return 0, with the selectedKeys set being empty, causing 100% CPU load.

Missing support for interestOps(0) in the JDK

The JDK / Android SDK promises that select() will only return once a key is ready, the selector is woken up, or the thread is interrupted, but none of these things had happened. The documentation does not mention any special handling of interestOps(0), so it is reasonable to assume that this is a valid operation to perform.

Under the hood, the Java and Android SDK selectors in question use poll(2) to block until I/O is ready. Each pollfd struct has an events field (which in our case is 0), and poll(2) fills in its revents field with the conditions that are ready on that file descriptor.

I found that poll(2) wakes up with revents set to POLLHUP | POLLERR to signal that the remote end of the channel is closed. This is valid even when the registered events mask is 0, because poll(2) always reports POLLHUP and POLLERR whether or not they were requested, and any read on such an fd would return -1.
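The poll(2) rule at work can be captured in a small, simplified model. PollModel is a hypothetical helper, not part of any SDK, and the bit values are the Linux &lt;poll.h&gt; constants used purely for illustration: requested conditions appear in revents only if they were asked for in events, whereas POLLERR, POLLHUP and POLLNVAL are reported regardless.

```java
/** Hypothetical, simplified model of how poll(2) fills in revents. */
final class PollModel {
    // Linux <poll.h> bit values, for illustration only
    static final int POLLIN = 0x001;
    static final int POLLOUT = 0x004;
    static final int POLLERR = 0x008;
    static final int POLLHUP = 0x010;
    static final int POLLNVAL = 0x020;

    /**
     * revents for one pollfd, given the requested `events` and the
     * conditions currently true on the file descriptor.
     */
    static int revents(int events, int currentConditions) {
        // these three are reported even if they are not in `events`
        int alwaysReported = POLLERR | POLLHUP | POLLNVAL;
        return (currentConditions & events) | (currentConditions & alwaysReported);
    }

    /** poll(2) returns as soon as any pollfd has a non-zero revents. */
    static boolean wakesUp(int events, int currentConditions) {
        return revents(events, currentConditions) != 0;
    }
}
```

With events = 0, pending data (POLLIN) does not wake poll(2) up, but a hang-up does: revents(0, POLLHUP | POLLERR) is POLLHUP | POLLERR.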

However, Android's AbstractPollSelectorImpl.java FILTERS the ready operations, nioReadyOps(), by the interest set, nioInterestOps():

sk.channel.translateAndSetReadyOps(rOps, sk);
if ((sk.nioReadyOps() & sk.nioInterestOps())!=0) {
  selectedKeys.add(sk);
  numKeysUpdated++;
}

So even if nioReadyOps() returned a non-zero value, it would be masked out by nioInterestOps(). Worse, translateAndSetReadyOps() in SocketChannelImpl.java ensures that nioReadyOps() is itself 0, because in case of error it sets the ready set to intOps, i.e. the interest set, which is 0 here:

if ((ops & (Net.POLLERR | Net.POLLHUP)) != 0) {
  newOps = intOps;
  sk.nioReadyOps(newOps);
  // No need to poll again in checkConnect,
  // the error will be detected there
  readyToConnect = true;
  return (newOps & ~oldOps) != 0;
}
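Putting the two quoted snippets together, the following hypothetical model traces one key through a single selection pass. It is a simplification of my own: only OP_READ is modelled, and the POLLNVAL and connection-state checks of the real code are left out. The POLL* values are again the Linux constants, used for illustration.

```java
import java.nio.channels.SelectionKey;

/** Hypothetical model of one poll-based selection pass for a single key. */
final class SelectPassModel {
    static final int POLLIN = 0x001;  // Linux values, illustration only
    static final int POLLERR = 0x008;
    static final int POLLHUP = 0x010;

    /** Mirrors the error branch and the POLLIN case of translateAndSetReadyOps(). */
    static int translateReadyOps(int revents, int interestOps) {
        if ((revents & (POLLERR | POLLHUP)) != 0) {
            return interestOps;   // newOps = intOps: 0 if nothing is registered
        }
        int newOps = 0;
        if ((revents & POLLIN) != 0 && (interestOps & SelectionKey.OP_READ) != 0) {
            newOps |= SelectionKey.OP_READ;
        }
        return newOps;
    }

    /** How many keys the pass adds to selectedKeys (0 or 1), i.e. what select() returns. */
    static int keysUpdated(int revents, int interestOps) {
        int readyOps = translateReadyOps(revents, interestOps);
        // the selector only keeps the key if ready & interest is non-zero
        return (readyOps & interestOps) != 0 ? 1 : 0;
    }
}
```

With an interest set of 0 and revents of POLLHUP | POLLERR, keysUpdated() is 0, so select() reports nothing even though poll(2) woke up; had OP_READ still been registered, the same revents would have produced one selected key.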

Endless loop on interestOps(0) and POLLHUP | POLLERR

A selector will wake up if a registered channel has been disconnected even if interestOps has been set to 0!

However, there is no way to observe this condition from user code, because it has been masked out by the zero interest set. The selector returns an empty selectedKeys set instead, even though poll(2) signalled a ready file descriptor. Since the hang-up condition persists, the next select() wakes up immediately again, and the loop never blocks.