On 22 March 2018 at 16:21, Karsten Tausche <karsten@fairphone.com> wrote:

Hi Folks,

I'm experimenting with Multinode to distribute tests across multiple Android DUTs (with a view to using the CTS shards option at some point). The problem is that the devices are rebooted to fastboot after the test, even though reboot_to_fastboot: false is specified in the test parameters. Apparently this parameter is not passed from the multinode job to the LxcProtocol.

Any idea on how to fix this?

I attached a basic test shell definition that demonstrates the problem.

A side question: if I set the count of the worker role to anything larger than 1, one of the job instances ends as Incomplete with "error_msg: Invalid job data: ["Missing protocol 'lava-lxc'"]", and the other two time out at "Multinode wait/sync". Am I missing something here, or is this a limitation of the multinode/lxc protocol combination?

We have a unit test designed to provide coverage for both elements of this support:

test_multinode_hikey (lava_scheduler_app.tests.test_pipeline.TestYamlMultinode) ... ok

$ ./lava_server/manage.py test -v2 --noinput lava_scheduler_app.tests.test_pipeline.TestYamlMultinode.test_multinode_hikey

https://git.linaro.org/lava/lava-server.git/tree/lava_scheduler_app/tests/test_pipeline.py#n730

That uses this sample job: https://git.linaro.org/lava/lava-server.git/tree/lava_scheduler_app/tests/sample_jobs/hikey_multinode.yaml

That should help you find the cause of both issues in the step where the MultiNode submission is split into the sub jobs:

https://git.linaro.org/lava/lava-server.git/tree/lava_scheduler_app/utils.py#n133
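To illustrate one way a per-role split can lose a protocol block only when count is greater than 1, here is a minimal, hypothetical Python sketch. It assumes a simplified job dict (a lava-multinode roles table plus role-keyed lava-lxc blocks) and is not the code in utils.py; split_naive, split_fixed and the sample job are illustrative names only. The naive version consumes the role's lava-lxc block the first time that role is expanded, so only the first worker instance keeps it, which is the same kind of symptom as the "Missing protocol 'lava-lxc'" error above.

```python
import copy
from typing import Any, Dict, List

Job = Dict[str, Any]


def _base(job: Job, role: str) -> Job:
    """Deep-copy everything except 'protocols' and tag the sub job's role."""
    sub = copy.deepcopy({k: v for k, v in job.items() if k != "protocols"})
    sub["protocols"] = {"lava-multinode": {"role": role}}
    return sub


def split_naive(job: Job) -> List[Job]:
    """Buggy: pops the role's lava-lxc block, so only the first instance
    of each role receives it (and the input job is mutated)."""
    roles = job["protocols"]["lava-multinode"]["roles"]
    lxc_by_role = job["protocols"].get("lava-lxc", {})
    subjobs = []
    for role, spec in roles.items():
        for _ in range(spec["count"]):
            sub = _base(job, role)
            block = lxc_by_role.pop(role, None)  # BUG: consumed on first use
            if block is not None:
                sub["protocols"]["lava-lxc"] = block
            subjobs.append(sub)
    return subjobs


def split_fixed(job: Job) -> List[Job]:
    """Each instance of a role gets its own deep copy of that role's block."""
    roles = job["protocols"]["lava-multinode"]["roles"]
    lxc_by_role = job["protocols"].get("lava-lxc", {})
    subjobs = []
    for role, spec in roles.items():
        for _ in range(spec["count"]):
            sub = _base(job, role)
            if role in lxc_by_role:
                sub["protocols"]["lava-lxc"] = copy.deepcopy(lxc_by_role[role])
            subjobs.append(sub)
    return subjobs


# Hypothetical submission: one host, two workers; only workers use LXC.
job = {
    "job_name": "multinode-lxc-example",
    "protocols": {
        "lava-multinode": {
            "roles": {
                "host": {"count": 1},
                "worker": {"count": 2},
            },
        },
        "lava-lxc": {"worker": {"name": "lxc-hikey"}},
    },
}

for name, split in (("naive", split_naive), ("fixed", split_fixed)):
    subs = split(copy.deepcopy(job))
    print(name, ["lava-lxc" in s["protocols"] for s in subs])
# prints:
# naive [False, True, False]
# fixed [False, True, True]
```

Whether the real split has this particular problem is not established here; the point of the sketch is that every instance of a role needs its own copy of that role's protocol block.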

Compare the sub job submissions generated for worker: {count: 1} with those generated for worker: {count: 2}.
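A small helper can make that comparison mechanical. The sketch below is a hypothetical Python function, missing_paths, that walks two nested dicts (for example, the parsed YAML of a worker sub job from the count 1 run and one from the count 2 run) and reports key paths present in the first but absent from the second. The sample dicts are assumptions, not real LAVA output. The same check would flag a key such as reboot_to_fastboot if it is in the original submission but missing from a generated sub job. Only dicts are descended into; lists and scalar values are not compared.

```python
from typing import Any, Iterator, Tuple


def missing_paths(
    expected: Any, actual: Any, path: Tuple[str, ...] = ()
) -> Iterator[Tuple[str, ...]]:
    """Yield key paths present in `expected` but missing from `actual`.

    Only nested dicts are walked; lists and scalars are not compared.
    """
    if not isinstance(expected, dict):
        return
    for key, value in expected.items():
        here = path + (str(key),)
        if not isinstance(actual, dict) or key not in actual:
            yield here  # the whole subtree under this key is missing
        else:
            yield from missing_paths(value, actual[key], here)


# Hypothetical worker sub jobs, e.g. as loaded with yaml.safe_load().
count1_worker = {
    "protocols": {
        "lava-multinode": {"role": "worker"},
        "lava-lxc": {"name": "lxc-hikey"},
    },
}
count2_worker = {
    "protocols": {
        "lava-multinode": {"role": "worker"},
    },
}

for p in missing_paths(count1_worker, count2_worker):
    print("/".join(p))  # prints: protocols/lava-lxc
```

Running it in both directions (count 1 against count 2, then count 2 against count 1) shows keys that were dropped as well as keys that were unexpectedly added.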

Neil Williams
=============
neil.williams@linaro.org
http://www.linux.codehelp.co.uk/