Hi,
We've hit an issue when running lava-master on Debian Jessie and lava-slave on Debian Stretch: after a few minutes, the slave stops working. After some investigation, it turned out to be due to a difference between the libzmq versions in Jessie (4.0.5+dfsg-2) and Stretch (4.2.1-4), which causes protocol errors.
The line that detects the error in Stretch is:
https://github.com/zeromq/libzmq/blob/7005f22726d4a6ca527f27560a0a132394fdbb...
This appears to be due to how the nonce counter gets written into memory and into the zmq packets: the libzmq version from Jessie uses memcpy, whereas the one in Stretch calls put_uint64. As a result, the byte order has changed from little-endian to big-endian. The packets work until the nonce reaches 255, which translates into 0xff << 56; after that it overflows to 0 and causes the error.
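
To illustrate the mismatch, here is a minimal Python sketch, not the libzmq code itself. It assumes the Jessie host is little-endian (so memcpy produces little-endian bytes) and that the Stretch side rejects any counter that is not strictly greater than the previous one. The function names are made up for this example.

```python
import struct


def sender_bytes(counter):
    # Assumption: memcpy of a 64-bit counter on a little-endian host.
    return struct.pack("<Q", counter)


def receiver_value(raw):
    # Assumption: the peer decodes the same 8 bytes as big-endian,
    # matching what put_uint64 would have written.
    return struct.unpack(">Q", raw)[0]


def first_rejected(limit=1000):
    # Assumption: the receiver requires a strictly increasing counter.
    last = 0
    for counter in range(1, limit):
        seen = receiver_value(sender_bytes(counter))
        if seen <= last:
            return counter, seen, last
        last = seen
    return None
```

Under these assumptions, counters 1 to 255 are seen as 1 << 56 up to 0xff << 56, which keeps increasing. At 256 the low byte wraps to 0 and the receiver sees 1 << 48, which is smaller than the previous value, so first_rejected() returns 256 as the first counter that would be refused.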
This is not a LAVA bug as such, rather a libzmq one, but it impacts interoperability between Jessie and Stretch for LAVA, so it may need to be documented or resolved somehow. We've installed the newer version of libzmq on our Jessie servers to align them with Stretch, and this fixes the problem.
Best wishes, Guillaume