Hi,
I have a question about the retry settings of the u-boot boot action. Our job is:
- boot:
    failure_retry: 2
    namespace: test_suite_1
    connection-namespace: burning-uboot_1
    method: u-boot
    commands: nfs
    auto_login:
      login_prompt: '(.*) login:'
      username: root
    prompts:
    - 'root@(.*):~#'
    timeout:
      minutes: 10
1. From the code:
“UBootAction” extends RetryAction, while its internal pipeline contains an action named “UBootRetry”, which also extends RetryAction.
If we define “failure_retry”, then when an exception is raised inside the pipeline, it first causes “UBootRetry” to retry, and then “UBootAction” to retry again.
This is confusing. Why do we need a nested retry here?
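To make the nesting concrete, here is a toy model of how I understand it. This is not LAVA code: run_with_retry and count_boot_attempts are hypothetical names, and I am assuming that both the outer and the inner action receive the same failure_retry value.

```python
def run_with_retry(body, max_retries):
    """Toy stand-in for a retrying action: call body up to max_retries times.

    Assumes max_retries >= 1.
    """
    last_error = None
    for _ in range(max_retries):
        try:
            return body()
        except RuntimeError as exc:
            last_error = exc
    raise last_error


def count_boot_attempts(failure_retry):
    """Count how often a boot that always fails is attempted with nested retries."""
    attempts = []

    def boot():
        attempts.append(1)
        raise RuntimeError("boot timed out")

    def inner():
        # Plays the role of "uboot-retry" inside "uboot-action" (assumption).
        return run_with_retry(boot, failure_retry)

    try:
        # Plays the role of "uboot-action" (assumption).
        run_with_retry(inner, failure_retry)
    except RuntimeError:
        pass
    return len(attempts)
```

Under these assumptions, failure_retry: 2 gives 2 x 2 = 4 boot attempts in total, which matches the four attempts described in point 2 below.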
2. The real issue for us is the following:
Suppose we define failure_retry: 2. Our situation is:
1) The first boot times out because of some random block issue.
2) Then it starts “Retrying: 4.4 uboot-retry (599 sec)”, but this times out again.
3) Then it starts “Retrying: 4 uboot-action (599 sec)”, but this times out again.
4) Then it starts “Retrying: 4.4 uboot-retry (599 sec)”. This time the boot succeeds, and it finishes the last action, “export-device-env”, in uboot-retry. But before we can be happy about it, the “UBootAction” timeout appears to be resumed and fires, so the lucky boot is wasted even though the device did in fact boot successfully.
The log is:
start: 4.4.5 expect-shell-connection (timeout 00:07:23) [test_suite_1]
Forcing a shell prompt, looking for ['root@(.*):~#']
root@imx8mnevk:~#
expect-shell-connection: Wait for prompt ['root@(.*):~#'] (timeout 00:10:00)
Waiting using forced prompt support. 299.9747439622879s timeout
end: 4.4.5 expect-shell-connection (duration 00:00:00) [test_suite_1]
start: 4.4.6 export-device-env (timeout 00:07:23) [test_suite_1]
end: 4.4.6 export-device-env (duration 00:00:00) [test_suite_1]
uboot-action timed out after 727 seconds
end: 4.4 uboot-retry (duration 00:02:07) [test_suite_1]
I’m not sure, but it looks like the second “uboot-action” contains two “uboot-retry” runs because of the retry. So by the time the “uboot-action” timeout is resumed, the remaining time has dropped to zero or below, which directly raises the timeout exception. Is this a bug, or am I misunderstanding it? The relevant code is:
duration = round(action_max_end_time - time.time())
if duration <= 0:
    signal.alarm(0)
    parent.timeout._timed_out(None, None)
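Here is a small sketch of the arithmetic I suspect is happening. It is a toy model, not the LAVA implementation: remaining_parent_time and parent_outcome are hypothetical helpers, and I am assuming the outer deadline is fixed once when “uboot-action” starts and is not extended for each inner retry.

```python
def remaining_parent_time(parent_start, parent_timeout, now):
    """Seconds left before the parent deadline, rounded like the excerpt above."""
    return round(parent_start + parent_timeout - now)


def parent_outcome(parent_start, parent_timeout, inner_attempt_durations):
    """Report whether the parent deadline has passed after all inner attempts.

    inner_attempt_durations lists how long each inner attempt ran, in seconds,
    whether it failed or succeeded.
    """
    now = parent_start
    for duration in inner_attempt_durations:
        now += duration
    remaining = remaining_parent_time(parent_start, parent_timeout, now)
    # Mirrors the "duration <= 0" check: no time left means the parent times out.
    return "timed out" if remaining <= 0 else "ok"
```

With a 600 second parent timeout (minutes: 10), one inner attempt that times out after 599 seconds and a second one that succeeds after 127 seconds (00:02:07 in the log), parent_outcome(0, 600, [599, 127]) gives "timed out", because 726 seconds have already elapsed against the 600 second budget. That is close to the 727 seconds reported in the log, which is why I suspect the successful second attempt is discarded.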
Any suggestions on this?