
RPC calls and mysterious 40ms delay


I wrote a Java client that reads DMX data from the OLA application. OLA provides a nice Java wrapper class that uses Google’s protobuf library (Protocol Buffers are a way of encoding structured data in an efficient yet extensible format; Google uses them for almost all of its internal RPC protocols and file formats). The RPC channel runs over a TCP socket; these three steps show how a call works (a minimal client-side sketch follows the list):

1. Client sends an RPC call to the server
2. Server responds with an answer (4 bytes) containing the payload size
3. Server sends the payload
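
To make the read side concrete, here is a minimal sketch of those steps in plain Java. It assumes a blocking Socket, that olad listens on localhost:9010, and that the 4-byte answer is simply a big-endian int holding the payload size – the real OLA wire format packs its header differently, so treat this as an illustration only:

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.Socket;

public class OlaRpcReadSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint: olad's RPC port is assumed to be 9010 here.
        try (Socket socket = new Socket("localhost", 9010)) {
            DataOutputStream out = new DataOutputStream(socket.getOutputStream());
            DataInputStream in = new DataInputStream(socket.getInputStream());

            // 1) client sends the serialized protobuf RPC call (placeholder bytes here)
            byte[] rpcCall = new byte[] { /* serialized request would go here */ };
            out.write(rpcCall);
            out.flush();

            // 2) server answers with 4 bytes; assumed here to be a big-endian int
            //    holding the payload size (the real OLA header differs in detail)
            int payloadSize = in.readInt();

            // 3) server sends the payload - this second read is where the
            //    mysterious ~40ms pause showed up on Linux
            byte[] payload = new byte[payloadSize];
            in.readFully(payload);
            System.out.println("payload bytes: " + payload.length);
        }
    }
}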

I tested my Java application on my Raspberry Pi – and somehow it was very slow. I added some debug output and saw that the first two communication steps (sending the RPC call and getting the payload size) were fast, but it took around 40ms until my application got the payload. I tested the application on a regular Linux system (Ubuntu) with the same result, so it’s not an RPi limitation. On my OS X MacBook Air, however, the RPC call needed only 3ms!

I was a bit clueless, so I rewrote the Java client to use Java’s NIO SocketChannels. The result: it now took 80ms! After several hours of research I found a really nice explanation on Stack Overflow:

40ms is the TCP ACK delay on Linux, which indicates that you are likely encountering a bad interaction between delayed acks and the Nagle algorithm. The best way to address this is to send all of your data using a single call to send() or sendmsg(), before waiting for a response. If that is not possible then certain TCP socket options including TCP_QUICKACK (on the receiving side), TCP_CORK (sending side), and TCP_NODELAY (sending side) can help, but can also hurt if used improperly. TCP_NODELAY simply disables the Nagle algorithm and is a one-time setting on the socket, whereas the other two must be set at the appropriate times during the life of the connection and can therefore be trickier to use.
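
For completeness: on the Java client side, TCP_NODELAY is exposed through java.net.Socket#setTcpNoDelay. A minimal sketch (the host and port are assumptions, as above):

import java.net.Socket;

public class NoDelayClientSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; disable Nagle's algorithm before sending anything
        try (Socket socket = new Socket("localhost", 9010)) {
            socket.setTcpNoDelay(true);
            // ... send the RPC call and read the answer as before ...
        }
    }
}

In this case, though, the two small back-to-back writes (the 4-byte answer followed by the payload) come from the server side, not from my client.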

So I patched my local OLA application to set the TCP_NODELAY option:

int flag = 1;
int result = setsockopt(sd,              /* socket affected */
                        IPPROTO_TCP,     /* set option at TCP level */
                        TCP_NODELAY,     /* name of option */
                        (char *) &flag,  /* the cast is historical cruft */
                        sizeof(int));    /* length of option value */
if (result < 0) {
    OLA_WARN << "can't set nodelay for " << sd << ", " << strerror(errno);
    close(sd);
    return false;
}

Problem solved! The RPC call now finishes in around 6ms.

This technical paper from IBM explains the problem in detail. TL;DR: disable Nagle’s algorithm (TCP_NODELAY) on the server side.

