Hi,
In most cases, we don't need a multinode job, as we can control an AOSP DUT from the lxc via adb over USB. However, here is the use case.
The CTS/VTS tradefed shell's --shards option supports splitting tests and running them on multiple devices in parallel. To leverage this feature in LAVA, we need a multinode job, right? And in a multinode job, the master node's lxc needs access to the DUTs of the slave nodes via adb over tcpip, right? Karsten shared a job example here[1]. This is probably the most advanced usage of LAVA, and probably also not encouraged? To make it clearer, the connectivity should look like this.
master.lxc <----adb over usb-----> master.dut
master.lxc <----adb over tcpip---> slave1.dut
master.lxc <----adb over tcpip---> slave2.dut
....
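As a rough illustration of the layout above, the following sketch builds the adb command lines involved. It assumes adbd on each slave DUT has first been switched to TCP mode over USB (for example from that slave's own lxc) and that the master lxc can reach the slave IP addresses. The serial number, IP addresses, and port 5555 are placeholders, not values from the real job.

```python
"""Sketch: adb command lines for the master/slave layout.

All serials, IP addresses and the port are hypothetical placeholders.
"""

ADB_TCP_PORT = 5555  # commonly used adb-over-tcpip port; an assumption here


def enable_tcpip_cmd(serial, port=ADB_TCP_PORT):
    """Command that restarts adbd on a USB-attached DUT in TCP mode."""
    return ["adb", "-s", serial, "tcpip", str(port)]


def connect_cmd(ip, port=ADB_TCP_PORT):
    """Command the master lxc runs to attach to one slave DUT over TCP."""
    return ["adb", "connect", f"{ip}:{port}"]


def master_lxc_plan(slave_ips, port=ADB_TCP_PORT):
    """One 'adb connect' command per slave DUT, in the given order."""
    return [connect_cmd(ip, port) for ip in slave_ips]


if __name__ == "__main__":
    # Run on each slave side first (placeholder serial):
    print(" ".join(enable_tcpip_cmd("SLAVE1SERIAL")))
    # Then run on the master lxc (placeholder addresses):
    for cmd in master_lxc_plan(["192.0.2.11", "192.0.2.12"]):
        print(" ".join(cmd))
```

The commands are only built and printed here; actually running them depends on the network path between the master lxc and the slave DUTs, which is the open question in the options below.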
I see two options for adb over tcpip.
Option #1: WiFi. adb over WiFi can be enabled easily by issuing adb commands from the lxc. I am not using it for two reasons.
* WiFi isn't reliable for long CTS/VTS test runs.
* In the Cambridge lab, the WiFi sub-network isn't accessible from the lxc network. Because of security concerns, there is no plan to change that.
Option #2: Wired Ethernet. On devices like hikey, we need to run 'pre-os-command' in the boot action to power off the OTG port so that the USB Ethernet dongle works. Once the OTG port is off, the lxc has no access to the DUT, so the test definition has to be executed on the DUT itself, right? I am also running into the following problems with this approach.
* Without context overriding, the overlay tarball is applied to the '/system' directory and the test job reports "/system/bin/sh: /lava-247856/bin/lava-test-runner: not found"[2].
* With the following job context, LAVA still runs '/lava-24/bin/lava-test-runner /lava-24/0' and hangs there. This was tested on my local LAVA instance; the test job definition and test log are attached. Maybe my understanding of context overriding is wrong: I thought LAVA should execute '/system/lava-24/bin/lava-test-runner /system/lava-24/0' instead. Any suggestions would be appreciated.
context:
  lava_test_sh_cmd: '/system/bin/sh'
  lava_test_results_dir: '/system/lava-%s'
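To make the expectation concrete, here is a minimal sketch of the substitution described above: the '%s' in lava_test_results_dir replaced by the job ID, and the runner invoked from that directory. This is not LAVA's implementation; the function name and the stage argument are hypothetical and only reproduce the command line quoted in this mail.

```python
def expected_runner_command(results_dir_template, job_id, stage=0):
    """Build the runner command line expected after the context override.

    Hypothetical helper: it only mirrors the paths quoted in this mail.
    """
    results_dir = results_dir_template % job_id
    return f"{results_dir}/bin/lava-test-runner {results_dir}/{stage}"


if __name__ == "__main__":
    print(expected_runner_command("/system/lava-%s", 24))
    # prints: /system/lava-24/bin/lava-test-runner /system/lava-24/0
```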
I checked on the DUT directly: the '/system/lava-%s' directory exists, but I cannot actually run lava-test-runner. The shebang line seems to be the problem.
--- hacking ---
hikey:/system/lava-24/bin # ./lava-test-runner
/system/bin/sh: ./lava-test-runner: No such file or directory
hikey:/system/lava-24/bin # cat lava-test-runner
#!/bin/bash
#!/bin/sh
....
# /system/bin/sh lava-test-runner
lava-test-runner[18]: .: /lava/../bin/lava-common-functions: No such file or directory
--- ends ---
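The first error in the transcript is consistent with a missing interpreter: when the program named on a script's shebang line does not exist, executing the script fails with "No such file or directory" even though the script file itself is present. The sketch below shows one way to check for that. It is a generic illustration with hypothetical function names, not part of LAVA, and it would need a Python interpreter wherever it is run.

```python
import os


def parse_shebang(first_line):
    """Return the interpreter path from a shebang line, or None."""
    line = first_line.strip()
    if not line.startswith("#!"):
        return None
    parts = line[2:].split()
    return parts[0] if parts else None


def missing_interpreter(script_path, exists=os.path.exists):
    """Return the shebang interpreter if it is absent on this system."""
    with open(script_path, "r", errors="replace") as script:
        interpreter = parse_shebang(script.readline())
    if interpreter is not None and not exists(interpreter):
        return interpreter
    return None


if __name__ == "__main__":
    # Either interpreter path from the transcript would be reported as
    # missing on a system that only ships /system/bin/sh.
    print(parse_shebang("#!/bin/bash"))
```

Invoking the script through an explicit shell, as in the second command of the transcript, bypasses the shebang line, which is why that attempt gets further and then fails on the '/lava/../bin/lava-common-functions' path instead.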
I had a discussion with Milosz. He proposed a third option, which would probably be the most reliable one, but it is not supported in LAVA yet. Here is the idea. Milosz, feel free to explain more.
**Option #3**: Add support for accessing multiple DUTs in a single-node job.
* Physically, we need the DUTs connected via USB cable to the same dispatcher.
* In a single-node job, LAVA needs to add the DUTs, either specified (somehow) or assigned randomly (let's say with both the device type and the number of devices defined), to the same lxc container. Test definitions can take over from there.
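The selection step in the second bullet could be sketched roughly as follows. This is purely illustrative and not an existing LAVA API: the pool mapping, the function name, and the serials are hypothetical, and the sketch assumes the dispatcher knows the device type of every USB-attached DUT.

```python
import random


def pick_duts(pool, device_type, count, requested=None, rng=random):
    """Choose DUT serials to attach to one lxc container.

    pool: dict mapping serial -> device type (hypothetical inventory).
    requested: explicit serials to use instead of a random choice.
    """
    candidates = sorted(s for s, t in pool.items() if t == device_type)
    if requested:
        unknown = [s for s in requested if s not in candidates]
        if unknown:
            raise ValueError(f"not available as {device_type}: {unknown}")
        return list(requested)
    if len(candidates) < count:
        raise ValueError(
            f"need {count} {device_type} devices, found {len(candidates)}"
        )
    return rng.sample(candidates, count)


if __name__ == "__main__":
    # Placeholder inventory for one dispatcher.
    pool = {"SERIAL_A": "hikey", "SERIAL_B": "hikey", "SERIAL_C": "x15"}
    print(pick_duts(pool, "hikey", 2))
```

The returned serials would then be the ones exposed inside the container, so the test definition could pass them on to tradefed.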
Can this be done in LAVA? Can I request the feature? Any suggestions on possible implementations?
Thanks,
Chase
[1] https://review.linaro.org/#/c/qa/test-definitions/+/29417/4/automated/androi...
[2] https://staging.validation.linaro.org/scheduler/job/247856#L1888