New subject: [Lava-users] Usb bus circulated issue for android.

10 Sep 2020

Hi guys,
We have found a blocking issue in our Android tests. The story is as follows:
1. Job #1 with device #1 runs for about 12 hours and restarts the board many times during the run. Each restart gives the device a new USB node, so the path goes e.g. from /dev/bus/usb/003/001 to /dev/bus/usb/003/002, then /dev/bus/usb/003/003, ... and finally /dev/bus/usb/003/127.
The maximum device number is 127, so if the device resets again, the number wraps back to 001.
Adb devices in container 1:
$ adb devices
List of devices attached
040c41d4d72d7393        device
2. Job #2 with device #2 starts while job #1 is still running. LAVA then runs mknod for e.g. /dev/bus/usb/003/016 in another docker-test-shell container and also adds the cgroup permission.
But /dev/bus/usb/003/016 was once used by job #1, and that node is never deleted from job #1's docker-test-shell container.
So, with high probability, device #2 shows up in job #1's docker-test-shell container (checked with adb devices).
Now, adb devices in container 2:
$ adb devices
List of devices attached
After that, adb devices in container 1:
$ adb devices
List of devices attached
040c41d4d72d7393        device
23305a0a5c85d936        device
This is a big issue for our parallel Android tests.
In fact, back in the old LXC days we saw similar issues, so we made a local workaround:
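
The collision described in steps 1 and 2 can be sketched as a small simulation. This is a simplified model, not LAVA code: all function names are hypothetical, the device-number allocation ignores numbers that are still in use, and the major number 189 and the minor formula reflect my understanding of how Linux numbers /dev/bus/usb nodes.

```python
# Simplified model of how a stale node in container 1 ends up matching device #2.
USB_DEVICE_MAJOR = 189  # assumed: Linux character major for /dev/bus/usb nodes
MAX_DEVNUM = 127        # highest device number on one bus


def next_devnum(current):
    """Next device number on a bus, wrapping from 127 back to 1 (simplified)."""
    return current % MAX_DEVNUM + 1


def usb_minor(busnum, devnum):
    """Assumed minor number of /dev/bus/usb/<busnum>/<devnum>."""
    return (busnum - 1) * 128 + (devnum - 1)


def node_path(busnum, devnum):
    return "/dev/bus/usb/%03d/%03d" % (busnum, devnum)


# Container 1: every reset of device #1 adds a node and none is ever removed.
container1 = {}
devnum = 1
for _ in range(130):  # more resets than there are device numbers
    container1[node_path(3, devnum)] = (USB_DEVICE_MAJOR, usb_minor(3, devnum))
    devnum = next_devnum(devnum)

# Device #2 later enumerates as 003/016 and is meant for container 2 only.
device2 = (USB_DEVICE_MAJOR, usb_minor(3, 16))
stale = [path for path, dev in container1.items() if dev == device2]
print(stale)  # ['/dev/bus/usb/003/016']
```

Because the leftover node in container 1 carries the same major/minor pair as device #2's new node, and the earlier cgroup permission for that pair was never revoked, adb in container 1 can open device #2.
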
https://github.com/atline/lava-docker-slave/blob/66f15d9da88912fc929fef52136...
In that patch we also match on "remove" together with ENV{ID_SERIAL_SHORT}, that is, "if a USB device leaves, delete its node".
But, for reasons I don't know, in the current version (2020.08) I can only match "remove" in udev; matching "remove" plus ENV{ID_SERIAL_SHORT} no longer works correctly.
So, to get our local Android runs working quickly, we made the following patch:
# diff __init__.py.bak __init__.py
157,158c157,158
<             "mkdir -p %s && mknod %s c %d %d || true"
<             % (os.path.dirname(node), node, major, minor),
---
>             "mkdir -p %s && rm -f %s/* && mknod %s c %d %d || true"
>             % (os.path.dirname(node), os.path.dirname(node), node, major, minor),

With this patch the issue no longer happens on our side, but it does not look like a universal fix.
So my questions are: have you ever seen this issue? And what are your thoughts on it?