Hello,
We upgraded from LAVA 2019.04 to 2019.10 and now have issues with prompt matching when the device boots.
We test 2 kinds of Linux images:
* Development image
  * it outputs data to the serial console (secure boot, U-Boot, kernel, systemd)
  * user: root, pass: empty
* Production image
  * it doesn't output anything and *only presents the login prompt*
  * user: root, pass: it could be anything

The workflow is the following:
* spin up an LXC container for recovery mode
* boot the DUT and wait for the prompt. We don't need to log in over the serial connection, but we need to wait for the prompt to know that the board has booted successfully
* run tests from the LXC container, using ssh to the DUT
In LAVA 2019.04 everything was working as expected:

...
start: 7.3 auto-login-action (timeout 00:05:57) [target]
Using line separator: #'\n'#
No login prompt set.
Parsing kernel messages
['-+\[ cut here \]-+\s+(.*\s+-+\[ end trace (\w*) \]-+)', '(Unhandled fault.*)\r\n', 'Kernel panic - (.*) end Kernel panic', 'Stack:\s+(.*\s+-+\[ end trace (\w*) \]-+)', 'mbed-linux-os-(.*) login:', 'Login timed out', 'Login incorrect']
[auto-login-action] Waiting for messages, (timeout 00:05:57)
Waiting using forced prompt support. 178.45381999015808s timeout
Trying ::1...
Connected to localhost.
Escape character is '^]'.
Mbed Linux OS mbl-warrior-dev-production_build64 mbed-linux-os-674 ttymxc5
Matched prompt #4: mbed-linux-os-(.*) login:
...
While in 2019.10 the job is failing with the following:

...
start: 7.3 auto-login-action (timeout 00:05:57) [target]
auto-login-action: Wait for prompt ['Linux version [0-9]'] (timeout 00:06:00)
Trying ::1...
Connected to localhost.
Escape character is '^]'.
�
Mbed Linux OS mbl-warrior-dev-production_build74 mbed-linux-os-1764 ttymxc5
...
Another thing we observed in LAVA 2019.10: with the dev images (which do produce output) it works without any problem.

Here is a snippet of our pipeline:
....
- boot:
    namespace: recovery
    timeout:
      minutes: 5
    method: recovery
    commands: recovery

- deploy:
    timeout:
      minutes: 10
    to: recovery
    namespace: recovery
    connection: lxc
    images:
      images to download.....
    os: debian

- test:
    namespace: lxc
    connection: lxc
    timeout:
      minutes: 10
    definitions:
    - from: inline
      name: flash-image
      path: inline/flash-image.yaml
      repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: flash-image
          description: "Flash image to board in recovery mode"
          os:
          - oe
        run:
          steps:
          - steps to recover the board

- boot:
    namespace: recovery
    timeout:
      minutes: 5
    method: recovery
    commands: exit

- boot:
    namespace: target
    method: minimal
    failure_retry: 3
    prompts:
    - "mbed-linux-os-(.*) login:"
    timeout:
      minutes: 6
....
Looking at the LAVA code we think the following commit changed the behaviour: https://git.lavasoftware.org/lava/lava/commit/6b5698a2d3ed23031e40aa1d861819...
How are we supposed to change the pipeline to make it work again?
Thanks
-- Diego Russo | Staff Software Engineer | Mbed Linux OS ARM Ltd. CPC1, Capital Park, Cambridge Road, Fulbourn, CB21 5XE, United Kingdom http://www.diegor.co.uk - https://os.mbed.com/docs/mbed-linux-os/
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Thu, 31 Oct 2019 at 11:27, Diego Russo Diego.Russo@arm.com wrote:
Another thing we observed in LAVA 2019.10: using the dev images (hence with output data) it works without any problem.
LAVA waits for the 'Linux version' string before it starts waiting for the login prompt. Matching this string tells LAVA that the bootloader stage has completed. This is why your development build still works. Do I understand correctly that the production image doesn't output anything until the login prompt? If so, the auto-login action needs to be tricked somehow. I'll check whether I can come up with a proper solution.
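The two-stage wait described above can be illustrated with a small toy model. This is only a sketch and not LAVA's actual code: the function name `wait_for_login` and the sample console lines are hypothetical, and the real auto-login action matches against a serial stream rather than a list of lines. The two patterns are the ones that appear in the logs and the job definition.

```python
import re

# Pattern shown in the 2019.10 log ("Wait for prompt ['Linux version [0-9]']").
KERNEL_START = r"Linux version [0-9]"
# Login prompt taken from the job definition.
LOGIN_PROMPT = r"mbed-linux-os-(.*) login:"

# Illustrative console output, not captured from a real board.
DEV_CONSOLE = [
    "U-Boot (illustrative bootloader line)",
    "Linux version 4 (illustrative kernel line)",
    "mbed-linux-os-674 login:",
]
PRODUCTION_CONSOLE = [
    "Mbed Linux OS mbl-warrior-dev-production_build74 mbed-linux-os-1764 ttymxc5",
    "mbed-linux-os-1764 login:",
]


def wait_for_login(lines, kernel_start_message=KERNEL_START):
    """Return the index of the line matching the login prompt, or None.

    Stage 1: wait for the kernel start message (skipped if it is empty).
    Stage 2: only then start looking for the login prompt.
    """
    stage_one_done = not kernel_start_message
    for index, line in enumerate(lines):
        if not stage_one_done:
            if re.search(kernel_start_message, line):
                stage_one_done = True
            continue
        if re.search(LOGIN_PROMPT, line):
            return index
    return None
```

In this model the development console reaches the login prompt, while the production console never leaves stage 1 because no 'Linux version' line is ever printed, so the wait would run until the timeout. Passing an empty kernel start message skips stage 1 and lets the production prompt match.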
milosz
Hello,
Thanks to Dean Birch's suggestion, we solved the issue by setting kernel_start_message to an empty string. See PR: https://github.com/ARMmbed/mbl-tools/pull/351/ This works for now, but in the future we might disable the login prompt altogether, so we will need an alternative solution. The idea is to have a polling process in the LXC container that checks whether the board has booted properly.
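As a rough sketch of what such a polling process could look like, assuming that "booted properly" can be approximated by the DUT accepting TCP connections on its ssh port: the helper name `wait_for_port`, the default port and the timing values below are illustrative assumptions, not something taken from our pipeline.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=360.0, interval=5.0):
    """Poll until a TCP connection to host:port succeeds.

    Returns True as soon as a connection is accepted, or False once
    `timeout` seconds have passed without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable or timed out: the board is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test step in the LXC namespace could call this with the DUT address and fail the job when it returns False. An open port only shows that the ssh daemon is listening, so a stricter check (for example running a command over ssh) may still be needed.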
Cheers
lava-users@lists.lavasoftware.org