Hi Folks,
I'm experimenting with Multinode for distributing tests across multiple Android DUTs (with a view to using the CTS sharding option at some point). The problem is that the devices are rebooted to fastboot after the test, even though reboot_to_fastboot: false is specified in the test parameters. Apparently this parameter is not passed on from the multinode job to the LxcProtocol.
Any idea on how to fix this?
I attached a basic test shell definition that demonstrates the problem.
A side question here: if I set the count of the worker role to something larger than 1, one of the job instances ends as Incomplete with "error_msg: Invalid job data: ["Missing protocol 'lava-lxc'"]" (http://localhost/results/testcase/817), and the other two time out at "Multinode wait/sync". Am I missing something here, or is this a limitation of the multinode/LXC protocol combination?
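To illustrate the setup (the attachment has the real definition), here is a stripped-down sketch of such a job; the device type, container settings, repository and role names below are only placeholders, not the exact attachment, and the exact nesting of the lava-lxc block in a multinode job may differ:

protocols:
  lava-lxc:
    name: lxc-android-test        # placeholder container, details trimmed
    template: debian
    distribution: debian
    release: stretch
  lava-multinode:
    roles:
      worker:
        device_type: example-android-dut   # placeholder
        count: 1                            # raising this triggers the side question above

actions:
- deploy:
    to: lxc
    # ... LXC deploy steps trimmed ...
- deploy:
    to: fastboot
    # ... images trimmed ...
- boot:
    # ... boot the DUT ...
- test:
    reboot_to_fastboot: false    # set in the test parameters, but the DUT still ends up in fastboot
    definitions:
    - repository: https://example.com/test-definitions.git   # placeholder
      from: git
      path: android/basic.yaml
      name: basic-android-test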
Thank you, Karsten
On 22 March 2018 at 16:21, Karsten Tausche karsten@fairphone.com wrote:
We have a unit test designed to provide coverage for both elements of this support:
test_multinode_hikey (lava_scheduler_app.tests.test_pipeline.TestYamlMultinode) ... ok
$ ./lava_server/manage.py test -v2 --noinput lava_scheduler_app.tests.test_pipeline.TestYamlMultinode.test_multinode_hikey
https://git.linaro.org/lava/lava-server.git/tree/lava_scheduler_app/tests/test_pipeline.py#n730
That uses this sample job: https://git.linaro.org/lava/lava-server.git/tree/lava_scheduler_app/tests/sample_jobs/hikey_multinode.yaml
That should provide a way to discover the cause of both issues during the process where the MultiNode submission is split into the sub-jobs: https://git.linaro.org/lava/lava-server.git/tree/lava_scheduler_app/utils.py#n133
Compare the test job submissions of the sub-jobs for worker: {count: 1} with worker: {count: 2}.
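Concretely, keep the job identical and only vary the count in the roles block, for example (role and device-type names here are just examples):

protocols:
  lava-multinode:
    roles:
      worker:
        device_type: hi6220-hikey
        count: 1

versus:

protocols:
  lava-multinode:
    roles:
      worker:
        device_type: hi6220-hikey
        count: 2

Then inspect the definitions of the generated sub-jobs: each sub-job should still carry its protocols: lava-lxc block, and the "Missing protocol 'lava-lxc'" error suggests that at least one of them does not.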
Hi Neil,
thanks for the pointer, it was indeed very helpful. I created a patch for both issues here: https://review.linaro.org/#/c/24517/

Regarding the tests: I tested my changes against the 2018.2 release tag on Debian Stretch (using only Python packages from the Debian repo). When trying to run the tests on current master I got errors such as "TypeError: unhashable type: 'TestYamlMultinode'". Is there anything else needed to set up the test environment that is not documented yet?
Best regards, Karsten
On 3 April 2018 at 15:18, Karsten Tausche karsten@fairphone.com wrote:
You need python3-django-testscenarios from stretch-backports to run the full unit tests:
$ sudo apt-get -q -t stretch-backports install python3-django-testscenarios
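If backports is not yet enabled on the Stretch machine, something along these lines should be enough (mirror and file name are only examples), after which the unit test mentioned earlier can be run:

$ echo 'deb http://deb.debian.org/debian stretch-backports main' | sudo tee /etc/apt/sources.list.d/stretch-backports.list
$ sudo apt-get update
$ sudo apt-get -q -t stretch-backports install python3-django-testscenarios
$ ./lava_server/manage.py test -v2 --noinput lava_scheduler_app.tests.test_pipeline.TestYamlMultinode.test_multinode_hikey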
--
Neil Williams
neil.williams@linaro.org http://www.linux.codehelp.co.uk/