Hi Guys:
While running a MultiNode test, I created a job definition like the one below. For example: sub-job 1 has finished booting and testing, but sub-job 2 is still booting. Sub-job 1 then removes the temporary file <lava_dispatcher>/tmp/overlay****, so sub-job 2 cannot download the overlay**** file and fails in the end. My question is: how do I synchronise the MultiNode sub-jobs in the job definition?
My job definition:
protocols:
  lava-multinode:
    roles:
      foo:
        tags:
        - board1
        device_type: **********
        context:
          grub_method: centos
          grub_installed_device: (hd1,gpt1)
        count: 1
      bar:
        tags:
        - board2
        device_type: **********
        context:
          grub_method: centos
          grub_installed_device: (hd2,gpt1)
        count: 1
    timeout:
      minutes: 6

job_name: centos openjdk test
timeouts:
  job:
    minutes: 1500
  action:
    minutes: 50
  connection:
    minutes: 30
priority: medium
visibility: public

actions:
- deploy:
    role:
    - foo
    - bar
    kernel:
      url: http://********
      type: zimage
    os: centos
    timeout:
      minutes: 80
    to: tftp

- boot:
    timeout:
      minutes: 40
    role:
    - bar
    method: grub
    commands: centos_installed
    auto_login:
      login_prompt: 'login:'
      username: root
      password_prompt: 'Password:'
      password: root
    prompts:
    - 'root@localhost ~'
    transfer_overlay:
      download_command: rm -f /root/overlay* ; ifconfig ; wget -S --progress=dot:giga
      unpack_command: tar -C / -xaf
    parameters:
      shutdown-message: "reboot: Restarting system"

- boot:
    timeout:
      minutes: 40
    role:
    - foo
    method: grub
    commands: centos_installed
    auto_login:
      login_prompt: 'login:'
      username: root
      password_prompt: 'Password:'
      password: root
    prompts:
    - 'root@localhost ~'
    transfer_overlay:
      download_command: rm -f /root/overlay* ; ifconfig ; wget -S --progress=dot:giga
      unpack_command: tar -C / -xaf
    parameters:
      shutdown-message: "reboot: Restarting system"

- test:
    role:
    - foo
    - bar
    timeout:
      minutes: 50
    definitions:
    - repository: ssh://**********/test-definitions
      from: git
      branch: **********
      path: automated/linux/openjdk/openjdk-smoke.yaml
      name: openjdk-smoke
Thanks B.R. Guoqi
This email is intended only for the named addressee. It may contain information that is confidential/private, legally privileged, or copyright-protected, and you should handle it accordingly. If you are not the intended recipient, you do not have legal rights to retain, copy, or distribute this email or its contents, and should promptly delete the email and all electronic copies in your system; do not retain copies in any media. If you have received this email in error, please notify the sender promptly. Thank you.
On 8 February 2018 at 08:26, Liao, Guoqi guoqi.liao@hxt-semitech.com wrote:
Synchronisation is done using the MultiNode API - the test shell simply calls lava-sync but for that to work, there needs to be a functional test shell in the first place.
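As a rough illustration of the lava-sync call mentioned above, a test definition could block at the start of its run steps until every device in the MultiNode group has reached the test shell. This is only a sketch: the file name, the definition name and the message ID "booted" are made up for the example, and it does not address the transfer_overlay problem, because lava-sync is only available once the overlay is already on the device.

```yaml
# Hypothetical test definition (for example sync-after-boot.yaml).
# The name, description and message ID are illustrative assumptions.
metadata:
  format: Lava-Test Test Definition 1.0
  name: sync-after-boot
  description: "Wait for all roles to reach the test shell before testing"
run:
  steps:
  # lava-sync sends the message ID and then waits until every other
  # device in the MultiNode group has sent the same message ID.
  - lava-sync booted
  - echo "all nodes have reached the test shell"
```

Every role that should take part in the synchronisation has to run the same lava-sync call with the same message ID, otherwise the waiting devices will only stop when the MultiNode timeout expires.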
However, this is a different problem, to do with multiple usage of transfer_overlay.
What version of LAVA are you running? We have fixes for this in the upcoming release.
https://projects.linaro.org/browse/LAVA-1202
However, I don't think we've explicitly tested with MultiNode using transfer_overlay.
This looks like a typical client:server MultiNode test job - it really does help if you describe the roles that way rather than using slang.
I'm assuming an internal device-type but it is worth exploring whether the device integration for this type can support adding the overlay to the rootfs in advance.
Transfer_overlay is not a solution to using the same rootfs for multiple test jobs - there are still issues of persistence which will affect the utilities executed by the test shell. It would be much better to deploy a fresh rootfs each time and then let LAVA add the overlay to that rootfs, avoiding the need for transfer_overlay support. The rootfs can have whatever dependencies are required by the base system pre-installed but a fresh rootfs each time means that the configuration is always the same at the start of each test job.
Keep things simple and only change one element at a time. Not deploying the rootfs each time means that the rootfs *can* change arbitrarily between test jobs. So not deploying the rootfs each time means that you are not only changing the kernel each test job, you are also inheriting unknown changes in the rootfs from the previous test job. The rootfs can be exactly the same tarball every time in every test job but that then means that all your results are reproducible - only the kernel is being changed in each test job. The small amount of time required to deploy a clean rootfs for each test job is tiny in comparison to the engineering time lost by trying to debug issues caused by a persistent rootfs.
Any configuration, package installation or setup done by that test definition will be persistent into the next test job and that is known to cause reliability issues, difficulty in triaging of failed results and other complications.
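To make the "fresh rootfs each time" suggestion concrete, a deploy action along the following lines would let LAVA unpack a clean root filesystem and add the overlay itself, so that transfer_overlay is not needed. This is a sketch under assumptions: it presumes the device type can boot with an NFS root filesystem, the URLs are placeholders, and the matching boot action and boot commands are not shown because they depend on the device integration.

```yaml
# Sketch only: assumes the device type supports booting from NFS.
# Both URLs are placeholders, not real locations.
actions:
- deploy:
    role:
    - foo
    - bar
    to: tftp
    kernel:
      url: http://example.com/path/to/kernel
      type: zimage
    # The same rootfs tarball is unpacked fresh for every test job and
    # LAVA adds the test overlay to it before the device boots.
    nfsrootfs:
      url: http://example.com/path/to/rootfs.tar.gz
      compression: gz
    os: centos
    timeout:
      minutes: 80
```

With this layout only the kernel changes between test jobs, while the rootfs starts from the same known state each time.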
Hi Williams:
Thanks for your support; my feedback is inline below.
B.R. Guoqi
From: Neil Williams [mailto:neil.williams@linaro.org]
Sent: 8 February 2018 16:52
To: Liao, Guoqi guoqi.liao@hxt-semitech.com
Cc: lava-users@lists.linaro.org
Subject: Re: [Lava-users] how to achieve sync among multi-node in job definition
What version of LAVA are you running? We have fixes for this in the upcoming release.
[Guoqi] Our LAVA version is 2017.12-1+stretch.
This looks like a typical client:server MultiNode test job - it really does help if you describe the roles that way rather than using slang.
[Guoqi] In fact, in this case I just want to start two boards to do the same testing; both boards have the same role. Each board takes a different amount of time to boot, which means the faster board deletes the shared file on the server.
I'm assuming an internal device-type but it is worth exploring whether the device integration for this type can support adding the overlay to the rootfs in advance.
Transfer_overlay is not a solution to using the same rootfs for multiple test jobs - there are still issues of persistence which will affect the utilities executed by the test shell. It would be much better to deploy a fresh rootfs each time and then let LAVA add the overlay to that rootfs, avoiding the need for transfer_overlay support. The rootfs can have whatever dependencies are required by the base system pre-installed but a fresh rootfs each time means that the configuration is always the same at the start of each test job.
[Guoqi] In my environment, the system is deployed on the hard disk, so we need to transfer the overlay to the devices every time.
--
Neil Williams
=============
neil.williams@linaro.org
http://www.linux.codehelp.co.uk/
On 8 February 2018 at 09:10, Liao, Guoqi guoqi.liao@hxt-semitech.com wrote:
The work for LAVA-1202 ensures that the overlay tarball isn't deleted after the first download; instead, it is copied to where it is needed and only removed when the test job finishes.
It is a very small change - you may be able to apply it directly: https://review.linaro.org/#/c/23674/
It would be possible for LAVA to deploy the system to the hard disk fresh on each test job.