Hi Karsten,

On Thu, 13 Jun 2019 at 17:05, Karsten Tausche <karsten@fairphone.com> wrote:
Hi Axel,

We have similar issues on our setup and there seem to be various root causes. In some cases, CTS/VTS test cases cause devices to get lost (rebooting, device becoming unresponsive etc.). In other cases, it's the infrastructure causing issues (unstable adb or USB connection etc.).

For your setup, is there a way to get the devices back without physically interacting with them? `adb kill-server` sometimes works magic. Do you know what exactly prevents the devices from being accessible via adb? Do they show up at all in `adb devices` or `lsusb`?
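
A rough sketch of the `adb devices` check, in case it helps to automate it. The helper name and the sample serials are made up for illustration; the parsing assumes the usual output of a header line followed by one "serial, whitespace, state" line per device.

```python
from typing import Dict


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map each serial listed in `adb devices` output to its reported state."""
    devices: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip the header, blank lines, and daemon start-up notices ("* daemon ...").
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


# Hypothetical sample output: one healthy USB device, one offline TCP/IP device.
sample = (
    "List of devices attached\n"
    "0123456789ABCDEF\tdevice\n"
    "192.168.0.10:5555\toffline\n"
    "\n"
)
states = parse_adb_devices(sample)
# Anything not in the "device" state is not usable by Tradefed right now.
missing = [serial for serial, state in states.items() if state != "device"]
print(states, missing)
```

On a real host, the input could come from `subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout`. A serial that is absent from the result altogether points more towards USB or power than towards adb itself.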
 
For me it's hard to check that kind of thing because we usually run xTS over the weekend, and even when we run it during the week, the job usually fails during the night. In some cases, a specific module causes the board to become unresponsive without any logs, only the "Closed by foreign host" message from the telnet connection. In those cases, I exclude the "dangerous" modules and run them alone, which is even weirder, because when I run them alone the device is not lost...
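
To illustrate the "exclude, then run alone" approach, here is a minimal sketch that only builds the Tradefed console arguments. It assumes the suite options `--exclude-filter` and `--module`; the function name, the `plan` parameter, and the choice of module are mine, for illustration only.

```python
from typing import List


def plan_runs(dangerous_modules: List[str], plan: str = "cts") -> List[List[str]]:
    """Return argument lists: one main run without the dangerous modules,
    followed by one isolated run per dangerous module."""
    main_run = ["run", plan]
    for module in dangerous_modules:
        main_run += ["--exclude-filter", module]
    isolated_runs = [["run", plan, "--module", module] for module in dangerous_modules]
    return [main_run] + isolated_runs


# Module name taken from this thread; which modules count as "dangerous"
# depends on the board.
runs = plan_runs(["CtsAppSecurityHostTestCases"])
for args in runs:
    print(" ".join(args))
```

Each argument list would then be passed to the Tradefed console of the suite in use (cts-tradefed or vts-tradefed).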
 

In general, CTS/VTS etc. expect you to rerun your test sessions until you end up with a stable number of failures. From that perspective, it is considered normal to end up with incomplete modules or false positives after a single run.
Regarding running CTS in LAVA, I implemented a variant of the Tradefed runner that works around some of the reasons for false positives and lost devices. It is more fault tolerant when it comes to lost (but hopefully recovering) devices. Also it automates the Tradefed retry mechanism. Have a look here, there are also example jobs: https://git.linaro.org/qa/test-definitions.git/tree/automated/android/multinode/tradefed
It does not implement VTS yet, but that should be reasonably simple to add.
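
The "rerun until the failure count is stable" idea can be sketched roughly as below. This is not how the linked runner implements it; `run_session` is a hypothetical callable that runs (or retries) one session and returns its failure count, and the stopping rule of two equal counts in a row is an assumption.

```python
from typing import Callable, List


def rerun_until_stable(
    run_session: Callable[[], int],
    max_runs: int = 5,
    stable_runs: int = 2,
) -> List[int]:
    """Call run_session() until the failure count repeats stable_runs times
    in a row, or until max_runs sessions have been executed."""
    history: List[int] = []
    for _ in range(max_runs):
        history.append(run_session())
        tail = history[-stable_runs:]
        if len(tail) == stable_runs and len(set(tail)) == 1:
            break
    return history


# Made-up failure counts standing in for real Tradefed sessions.
fake_counts = iter([40, 12, 7, 7, 7])
history = rerun_until_stable(lambda: next(fake_counts))
print(history)
```

With the made-up counts above, the loop stops after the fourth session, once the count has been 7 twice in a row.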

Thanks for the tip, I will take a look at it. The Tradefed retry mechanism is very interesting!

Besides, the linked runner also implements sharding test runs across multiple devices by combining LAVA MultiNode jobs with adb TCP/IP connections. As it relies on the DUTs having a network connection to the container running the Tradefed shell, it is not appropriate in all setups. Also, the network connection does not play well with some CTS modules (e.g., network tests, or modules whose tests reboot devices, such as CtsAppSecurityHostTestCases). However, you can use the runner on a single USB-attached DUT by setting the count of the "worker" role to 0 (in the example job YAML).

Karsten Tausche | Software Engineer
Jollemanhof 17, 1019 GW Amsterdam, The Netherlands


On Thu, Jun 13, 2019 at 4:37 PM Axel Lebourhis <axel.lebourhis@linaro.org> wrote:
Hi all,

I'm facing a pretty frustrating issue when running CTS/VTS with LAVA.

During some runs, the adb connection is lost, leading to an incomplete test job.
Do you know if this behavior is known and fairly common? Or is it a bad configuration on my side?
Maybe someone knows a way to keep a reliable adb connection to the target?

Best regards,
Axel
_______________________________________________
Lava-users mailing list
Lava-users@lists.lavasoftware.org
https://lists.lavasoftware.org/mailman/listinfo/lava-users

Hi Milosz,

On Thu, 13 Jun 2019 at 18:33, Milosz Wasilewski <milosz.wasilewski@linaro.org> wrote:
Axel,

I've been struggling with adb disconnects for a couple of months now. So
far the only conclusion is that it's (in my case) most likely some
problem with the hardware. The disconnect happens in VTS after several
reboots. We narrowed it down to a single test:
VtsKernelProcFileApi#testProcSysrqTrigger
If the test is executed repeatedly on a single board outside of LAVA,
eventually the board shuts down completely without any messages on the
console. If there is a better explanation I'm very interested to hear
it.

Yes, I observed the same behavior, but on different modules.
As I told Karsten, I run the "dangerous" modules alone, and I observed that the run usually completes.

 

milosz


On Thu, 13 Jun 2019 at 15:36, Axel Lebourhis <axel.lebourhis@linaro.org> wrote:
>
> Hi all,
>
> I'm facing a pretty frustrating issue when running CTS/VTS with LAVA.
> I'm using Linaro's tradefed test definition: https://git.linaro.org/qa/test-definitions.git/tree/automated/android/tradefed
>
> During some runs, the adb connection is lost, leading to an incomplete test job.
> Do you know if this behavior is known and fairly common? Or is it a bad configuration on my side?
> Maybe someone knows a way to keep a reliable adb connection to the target?
>
> Best regards,
> Axel