Next question: timeouts.
I'm using two different machine types: QEMU that boots very quickly, and a real board that takes forever to boot.
If I start both tests with a "lava-sync ready", the timeout is 360 seconds. QEMU boots, waits for 360 seconds and fails, all before the board has finished flashing its SD card.
Is there a LAVA method for handling this, or should I use a long "sleep" shell command to make QEMU hang around before calling the "lava-sync"?
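For reference, the "sleep" workaround could look roughly like the sketch below. It is only an illustration, not a recommended LAVA mechanism: the helper name delayed_sync, the SYNC_CMD override and the 600 second figure are all made up here, and the delay would still need to fit inside whatever timeout the enclosing test action has.

```shell
# Sketch only: hold the fast role back before it joins the barrier.
# delayed_sync and SYNC_CMD are hypothetical names, not part of LAVA.
# Usage: delayed_sync DELAY_SECONDS MESSAGE_ID
delayed_sync() {
    delay=$1
    message=$2
    # Give the slow board time to flash and boot before syncing.
    sleep "$delay"
    # lava-sync only exists inside a LAVA multinode test shell;
    # SYNC_CMD lets the function be tried elsewhere with a stand-in.
    "${SYNC_CMD:-lava-sync}" "$message"
}

# On the QEMU role, for example (600 is an assumed figure):
#   delayed_sync 600 ready
```

The real board would keep calling "lava-sync ready" directly, so only the QEMU side pays the extra wait.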
On Wed, 22 Jan 2020 at 14:54, Ryan Harkin ryan.harkin@linaro.org wrote:
Hi Remi,
Thanks for the quick response.
On Wed, 22 Jan 2020 at 14:12, Remi Duraffort remi.duraffort@linaro.org wrote:
Hello Ryan,
On Wed, 22 Jan 2020 at 14:02, Ryan Harkin ryan.harkin@linaro.org wrote:
Hi folks,
I'm struggling to create a working multinode job and wondered if someone could help?
Here's the sample job I've submitted to try to get a handle on how multinode works:
https://validation.linaro.org/scheduler/job/1963418.1#L1745
The main problem happens when LAVA executes this command on the shell:
/lava-1963419/bin/lava-test-runner /lava-1963419/1
But there is no "/lava-1963419/1" directory, so the runner can't find lava-common-functions: it sources that file through a path relative to the missing directory, and fails like this:
/lava-1963419/bin/lava-test-runner: line 18: /lava-1963419/1/../bin/lava-common-functions: No such file or directory
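The failure can be reproduced outside LAVA with a small sketch. The temporary directory below is only a stand-in for the real overlay and check_common is a made-up helper. A path that goes through ".." only resolves if the directory before the ".." exists, so index 0 works and index 1 does not:

```shell
# Stand-in for the overlay directory; the real one is /lava-<job id>.
overlay=$(mktemp -d)
mkdir -p "$overlay/bin" "$overlay/0"
: > "$overlay/bin/lava-common-functions"

# check_common INDEX: report whether the common functions file is
# reachable through INDEX/../bin, the way the runner reaches it.
check_common() {
    if [ -e "$overlay/$1/../bin/lava-common-functions" ]; then
        echo "found"
    else
        echo "missing"
    fi
}

result_0=$(check_common 0)   # directory 0 exists
result_1=$(check_common 1)   # directory 1 was never created
echo "index 0: $result_0"
echo "index 1: $result_1"

rm -rf "$overlay"
```

This prints "index 0: found" and "index 1: missing", even though bin/lava-common-functions is present in both cases.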
When I run a similar definition on a single node, it uses zero for the lava sub-dir, not 1, and it works:
That's maybe a bug in LAVA. The number (0 or 1 or ...) allows LAVA to have many test actions in the same overlay. So usually it should start by calling 0 and then 1, ... I will have a look.
Ah, ok. This makes some sense! In my definition, I have 3 test actions. The test action that uses lava-test-runner is in the 2nd block, eg:
test: interactive:
test: definitions:
test: interactive:
When I remove the first "interactive" block, so that the lava-test-runner action is the first test action, it uses 0 and it works. So I have a workaround - thanks! :)
This mix of interactive and definitions works fine on my single node jobs. But it's not a problem for me to use only definitions first. Of course, there is always the possibility that I'm doing something I shouldn't be...
I tried to work out what was running here, to see if I could debug the scripts to work out where the "1" comes from. I cloned the LAVA repo and I cannot see the script "environment", and the lava-test-runner sources lava-common-functions from line 16, not 18. So now I'm curious where LAVA gets its scripts from.
https://git.lavasoftware.org/lava/lava/tree/master/lava_dispatcher/lava_test... ?
Hmmm, that's where I looked. The files there are slightly different than those running on the board, but they are so similar, I think they've been modified somehow while being transferred. I set my job to "cat lava-test-runner", but the output isn't conclusive:
https://validation.linaro.org/scheduler/job/1963433.1#L1920
But it's ok, I have what I need now, so don't need to debug any more.
Another minor point, I'm following the instructions on this link:
https://docs.lavasoftware.org/lava/writing-multinode.html
And there are some dead links in there, eg.
https://docs.lavasoftware.org/lava/examples/test-jobs/first-multinode-job.ya...
https://docs.lavasoftware.org/lava/examples/test-jobs/second-multinode-job.y...
https://docs.lavasoftware.org/lava/examples/test-jobs/bbb-lxc-ssh-guest.yaml
Thanks for noticing this. In fact, every file under examples/ is missing :( But not in the embedded documentation ( https://staging.validation.linaro.org/static/docs/v2/writing-multinode.html ).
Perfect! Thanks again.
Cheers, Ryan.
Cheers
-- Rémi Duraffort