Hi Folks,
I'm experimenting with Multinode for distributing tests across multiple Android DUTs (with a view to using the CTS sharding option at some point). The problem is that the devices are rebooted to fastboot after the test, even though reboot_to_fastboot: false is specified in the test parameters. Apparently this parameter is not passed on from the multinode job to the LxcProtocol.
Any idea on how to fix this?
I attached a basic test shell definition that demonstrates the problem.
A side question here: if I set the count of the worker role to something larger than 1, one of the job instances ends as Incomplete with "error_msg: Invalid job data: ["Missing protocol 'lava-lxc'"]" (http://localhost/results/testcase/817), and the other two time out at "Multinode wait/sync". Am I missing something here, or is this a limitation of the multinode/LXC protocol combination?
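To illustrate the setup (the attachment has the real definition), here is a stripped-down sketch of such a job; the device type, container settings, repository and role names below are only placeholders, not the exact attachment, and the exact nesting of the lava-lxc block in a multinode job may differ:

protocols:
  lava-lxc:
    name: lxc-android-test        # placeholder container, details trimmed
    template: debian
    distribution: debian
    release: stretch
  lava-multinode:
    roles:
      worker:
        device_type: example-android-dut   # placeholder
        count: 1                            # raising this triggers the side question above

actions:
- deploy:
    to: lxc
    # ... LXC deploy steps trimmed ...
- deploy:
    to: fastboot
    # ... images trimmed ...
- boot:
    # ... boot the DUT ...
- test:
    reboot_to_fastboot: false    # set in the test parameters, but the DUT still ends up in fastboot
    definitions:
    - repository: https://example.com/test-definitions.git   # placeholder
      from: git
      path: android/basic.yaml
      name: basic-android-test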
Thank you, Karsten
On 22 March 2018 at 16:21, Karsten Tausche karsten@fairphone.com wrote:
We have a unit test designed to provide coverage for both elements of this support:
test_multinode_hikey (lava_scheduler_app.tests.test_pipeline.TestYamlMultinode) ... ok
$ ./lava_server/manage.py test -v2 --noinput lava_scheduler_app.tests.test_pipeline.TestYamlMultinode.test_multinode_hikey
https://git.linaro.org/lava/lava-server.git/tree/lava_scheduler_app/tests/test_pipeline.py#n730
That uses this sample job: https://git.linaro.org/lava/lava-server.git/tree/lava_scheduler_app/tests/sample_jobs/hikey_multinode.yaml
That should provide a way to discover the cause of both issues during the process where the MultiNode submission is split into the sub-jobs: https://git.linaro.org/lava/lava-server.git/tree/lava_scheduler_app/utils.py#n133
Compare the test job submissions of the sub-jobs for worker: {count: 1} with worker: {count: 2}.
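Concretely, keep the job identical and only vary the count in the roles block, for example (role and device-type names here are just examples):

protocols:
  lava-multinode:
    roles:
      worker:
        device_type: hi6220-hikey
        count: 1

versus:

protocols:
  lava-multinode:
    roles:
      worker:
        device_type: hi6220-hikey
        count: 2

Then inspect the definitions of the generated sub-jobs: each sub-job should still carry its protocols: lava-lxc block, and the "Missing protocol 'lava-lxc'" error suggests that at least one of them does not.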
Hi Neil,
thanks for the pointer, it was indeed very helpful. I created a patch for both issues here: https://review.linaro.org/#/c/24517/

Regarding the tests: I tested my changes against the 2018.2 release tag on Debian Stretch (using only Python packages from the Debian repo). When trying to run the tests on current master I got errors such as "TypeError: unhashable type: 'TestYamlMultinode'". Is there anything else needed to set up the test environment that is not documented yet?
Best regards, Karsten
On 3 April 2018 at 15:18, Karsten Tausche karsten@fairphone.com wrote:
You need python3-django-testscenarios from stretch-backports to run the full unit tests:
$ sudo apt-get -q -t stretch-backports install python3-django-testscenarios
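If backports is not yet enabled on the Stretch machine, something along these lines should be enough (mirror and file name are only examples), after which the unit test mentioned earlier can be run:

$ echo 'deb http://deb.debian.org/debian stretch-backports main' | sudo tee /etc/apt/sources.list.d/stretch-backports.list
$ sudo apt-get update
$ sudo apt-get -q -t stretch-backports install python3-django-testscenarios
$ ./lava_server/manage.py test -v2 --noinput lava_scheduler_app.tests.test_pipeline.TestYamlMultinode.test_multinode_hikey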
--
Neil Williams
neil.williams@linaro.org http://www.linux.codehelp.co.uk/