Hi all,
I'm facing a pretty frustrating issue when running CTS/VTS with LAVA. I'm using Linaro's tradefed test definition: https://git.linaro.org/qa/test-definitions.git/tree/automated/android/tradef...
During some runs, the adb connection is lost, leading to an incomplete test job. Is this a known, common behavior, or is it a bad configuration on my side? Does anyone know a way to keep a reliable adb connection to the target?
Best regards, Axel
Hi Axel,
We have similar issues on our setup and there seem to be various root causes. In some cases, CTS/VTS test cases cause devices to get lost (rebooting, device becoming unresponsive etc.). In other cases, it's the infrastructure causing issues (unstable adb or USB connection etc.).
For your setup, is there a way to get the devices back without physically interacting with them? `adb kill-server` sometimes works wonders. Do you know what exactly prevents the devices from being accessible via adb? Do they show up at all in `adb devices` or `lsusb`?
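The check described above can be scripted so that it also runs when nobody is watching the job. The following is only a minimal sketch: the helper names are hypothetical, it assumes `adb` is on the PATH, and it covers only the case where restarting the adb server is enough to bring the device back.

```python
import subprocess


def parse_adb_devices(output):
    """Return {serial: state} parsed from the text printed by `adb devices`."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the "List of devices attached" header,
        # and "* daemon ..." start-up messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def device_is_online(serial, output):
    """True only if the serial is listed with state "device"."""
    return parse_adb_devices(output).get(serial) == "device"


def recover_device(serial, run=subprocess.run):
    """Restart the adb server, then report whether the device is listed again.

    `run` is injectable so the logic can be exercised without real hardware.
    """
    run(["adb", "kill-server"], check=False)
    run(["adb", "start-server"], check=False)
    result = run(["adb", "devices"], capture_output=True, text=True, check=False)
    return device_is_online(serial, result.stdout)
```

If `recover_device` returns False, the device is either missing from the list or in a state such as `offline` or `unauthorized`, and checking `lsusb` is the next step.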
In general, CTS, VTS, etc. expect you to rerun your test sessions until you end up with a stable number of failures. From that perspective, it is considered normal to end up with incomplete modules or false positives after a single run. Regarding running CTS in LAVA, I implemented a variant of the Tradefed runner that works around some of the causes of false positives and lost devices. It is more fault tolerant when it comes to lost (but hopefully recovering) devices. It also automates the Tradefed retry mechanism. Have a look here, there are also example jobs: https://git.linaro.org/qa/test-definitions.git/tree/automated/android/multin... It does not implement VTS yet, but that should be reasonably simple to add.
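The "rerun until the number of failures is stable" idea can be sketched as a small loop. This is not the code of the linked runner: `run_session` is a hypothetical callable that executes one session (the initial run or a retry) and returns the number of failed tests, and the thresholds are arbitrary assumptions.

```python
def run_until_stable(run_session, max_runs=5, stable_runs=2):
    """Call run_session() until the failure count repeats `stable_runs` times in a row.

    run_session: hypothetical callable that runs one test session and
                 returns the number of failed tests as an int.
    max_runs:    upper bound on the number of sessions (assumed value).
    Returns the list of failure counts observed, in order.
    """
    history = []
    for _ in range(max_runs):
        history.append(run_session())
        tail = history[-stable_runs:]
        # Stop once the last `stable_runs` sessions all report the same count.
        if len(tail) == stable_runs and len(set(tail)) == 1:
            break
    return history
```

With failure counts of 12, 7, 5, 5 the loop stops after the fourth session, because the last two counts match.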
Besides, the linked runner also implements sharding of test runs across multiple devices by combining LAVA MultiNode jobs with adb TCP/IP connections. As it relies on the DUTs having a network connection to the container running the Tradefed shell, it is not appropriate for all setups. Also, the network connection does not play well with some CTS modules (e.g., network tests, or modules whose tests reboot devices, such as CtsAppSecurityHostTestCases). However, you can use the runner on a single USB-attached DUT by setting the count of the "worker" role to 0 (in the example job YAML).
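For readers unfamiliar with adb over TCP/IP, the usual command sequence can be sketched as below. This only illustrates the standard adb commands and is not taken from the linked runner; the function name, the serial, the IP address, and the default port 5555 are assumptions for the example.

```python
def tcpip_commands(serial, ip_address, port=5555):
    """Build the adb command lines that move a USB-attached device to TCP/IP."""
    target = f"{ip_address}:{port}"
    return [
        # Ask adbd on the device to restart and listen on the given TCP port.
        ["adb", "-s", serial, "tcpip", str(port)],
        # Connect to the device over the network from the host or container.
        ["adb", "connect", target],
        # Block until the network-attached device is usable.
        ["adb", "-s", target, "wait-for-device"],
    ]
```

Each entry can be passed to `subprocess.run`. The DUT must be reachable from the machine that runs the Tradefed shell, which is the constraint mentioned above.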
Karsten Tausche | Software Engineer Jollemanhof 17, 1019 GW Amsterdam, The Netherlands www.fairphone.com https://www.fairphone.com/en/
On Thu, Jun 13, 2019 at 4:37 PM Axel Lebourhis axel.lebourhis@linaro.org wrote:
> Hi all,
> I'm facing a pretty frustrating issue when running CTS/VTS with LAVA. I'm using Linaro's tradefed test definition: https://git.linaro.org/qa/test-definitions.git/tree/automated/android/tradef...
> During some runs, the adb connection is lost, leading to an incomplete test job. Is this a known, common behavior, or is it a bad configuration on my side? Does anyone know a way to keep a reliable adb connection to the target?
> Best regards, Axel
> _______________________________________________
> Lava-users mailing list
> Lava-users@lists.lavasoftware.org
> https://lists.lavasoftware.org/mailman/listinfo/lava-users
Axel,
I've been struggling with adb disconnects for a couple of months now. So far my only conclusion is that, in my case, it's most likely a hardware problem. The disconnect happens in VTS after several reboots. We narrowed it down to a single test: VtsKernelProcFileApi#testProcSysrqTrigger. If the test is executed repeatedly on a single board outside of LAVA, the board eventually shuts down completely without any messages on the console. If there is a better explanation, I'm very interested to hear it.
milosz
Hi Karsten,
On Thu, 13 Jun 2019 at 17:05, Karsten Tausche karsten@fairphone.com wrote:
> Hi Axel,
> We have similar issues on our setup and there seem to be various root causes. In some cases, CTS/VTS test cases cause devices to get lost (rebooting, device becoming unresponsive etc.). In other cases, it's the infrastructure causing issues (unstable adb or USB connection etc.).
> For your setup, is there a way to get the devices back without physically interacting with them? `adb kill-server` sometimes works wonders. Do you know what exactly prevents the devices from being accessible via adb? Do they show up at all in `adb devices` or `lsusb`?
That kind of thing is hard for me to check, because we usually run xTS over the weekend, and even when we run it during the week, the job usually fails during the night. In some cases a specific module causes the board to become unresponsive without any logs, only the "Closed by foreign host" message from the telnet connection. In those cases, I exclude the "dangerous" modules and run them on their own, which is even weirder, because when I run them alone the device is not lost.
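The split described here (one run without the suspicious modules, then one run per suspicious module) can be expressed as Tradefed console commands. The sketch below only builds the command strings; the function name is hypothetical, and it assumes the suite console accepts `--exclude-filter` and `--module`, which should be verified against the CTS/VTS version in use.

```python
def split_plan(suite, risky_modules):
    """Return (main_command, isolated_commands) as Tradefed console command strings.

    suite:         plan name such as "cts" or "vts".
    risky_modules: module names to keep out of the main run.
    """
    # Main run: the whole suite minus every module suspected of killing the device.
    main = "run " + suite + "".join(f" --exclude-filter {m}" for m in risky_modules)
    # Isolated runs: one session per suspicious module.
    isolated = [f"run {suite} --module {m}" for m in risky_modules]
    return main, isolated
```

For example, `split_plan("cts", ["CtsAppSecurityHostTestCases"])` yields one main command that excludes the module and one command that runs only that module.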
> In general, CTS, VTS, etc. expect you to rerun your test sessions until you end up with a stable number of failures. From that perspective, it is considered normal to end up with incomplete modules or false positives after a single run. Regarding running CTS in LAVA, I implemented a variant of the Tradefed runner that works around some of the causes of false positives and lost devices. It is more fault tolerant when it comes to lost (but hopefully recovering) devices. It also automates the Tradefed retry mechanism. Have a look here, there are also example jobs: https://git.linaro.org/qa/test-definitions.git/tree/automated/android/multin... It does not implement VTS yet, but that should be reasonably simple to add.
Thanks for the tip, I will have a look at it. The Tradefed retry mechanism is very interesting!
> Besides, the linked runner also implements sharding of test runs across multiple devices by combining LAVA MultiNode jobs with adb TCP/IP connections. As it relies on the DUTs having a network connection to the container running the Tradefed shell, it is not appropriate for all setups. Also, the network connection does not play well with some CTS modules (e.g., network tests, or modules whose tests reboot devices, such as CtsAppSecurityHostTestCases). However, you can use the runner on a single USB-attached DUT by setting the count of the "worker" role to 0 (in the example job YAML).
> Karsten Tausche | Software Engineer Jollemanhof 17, 1019 GW Amsterdam, The Netherlands www.fairphone.com https://www.fairphone.com/en/
Hi Milosz,
On Thu, 13 Jun 2019 at 18:33, Milosz Wasilewski <milosz.wasilewski@linaro.org> wrote:
> Axel,
> I've been struggling with adb disconnects for a couple of months now. So far my only conclusion is that, in my case, it's most likely a hardware problem. The disconnect happens in VTS after several reboots. We narrowed it down to a single test: VtsKernelProcFileApi#testProcSysrqTrigger. If the test is executed repeatedly on a single board outside of LAVA, the board eventually shuts down completely without any messages on the console. If there is a better explanation, I'm very interested to hear it.
Yes, I observed the same behavior, but with different modules. As I told Karsten, I run the "dangerous" modules on their own, and then the run usually completes.
> milosz