Hi,
I have a question about the retry settings of the u-boot boot action. Our job is:

    - boot:
        failure_retry: 2
        namespace: test_suite_1
        connection-namespace: burning-uboot_1
        method: u-boot
        commands: nfs
        auto_login:
          login_prompt: '(.*) login:'
          username: root
        prompts:
          - 'root@(.*):~#'
        timeout:
          minutes: 10
1. From the code:
"UBootAction" extends "RetryAction", and its internal pipeline contains an action named "UBootRetry", which also extends "RetryAction". If we set a retry count, an exception raised inside the inner pipeline first causes "UBootRetry" to retry, and then "UBootAction" retries again.
This seems confusing. What is the reason for having a nested retry here?
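To make the nesting concrete, here is a simplified model of two nested retry wrappers. It is not LAVA's "RetryAction"; the helper names are hypothetical, and it assumes each level makes at most `failure_retry` attempts in total, which matches the sequence described in point 2 below. With 2 at both levels, a boot that always times out is attempted 4 times.

```python
class AttemptFailed(Exception):
    """Stands in for a boot timeout or other recoverable error (hypothetical)."""


def retry(name, attempts, body, log):
    # Re-run body() until it succeeds or `attempts` runs have failed.
    for n in range(1, attempts + 1):
        log.append(f"{name} attempt {n}")
        try:
            return body()
        except AttemptFailed:
            if n == attempts:
                raise  # give up and let the enclosing level handle it


def boot_always_times_out():
    raise AttemptFailed("boot timed out")


log = []
try:
    # Outer level ("uboot-action") wraps inner level ("uboot-retry").
    retry("uboot-action", 2,
          lambda: retry("uboot-retry", 2, boot_always_times_out, log),
          log)
except AttemptFailed:
    pass

boot_attempts = sum(1 for entry in log if entry.startswith("uboot-retry"))
print(boot_attempts)  # 4
```

In this model the attempt counts multiply (2 outer x 2 inner), which is why the retries in the log appear at both the 4 and 4.4 levels.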
2. In fact, the real issue for us is the following. Suppose we define failure_retry: 2. What we see is:
   1) The first boot times out because of some random block issue.
   2) It then starts "Retrying: 4.4 uboot-retry (599 sec)", but times out again.
   3) It then starts "Retrying: 4 uboot-action (599 sec)", but times out again.
   4) It then starts "Retrying: 4.4 uboot-retry (599 sec)" again. This time the boot succeeds, but right after it finishes the last action in uboot-retry, "export-device-env", the "UBootAction" timeout seems to resume, and the boot is discarded even though it actually succeeded.
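The timeline above can be sketched as one outer attempt with a fixed time budget. This is a simplified model, not LAVA code: it assumes the outer deadline is fixed when the outer attempt starts, is not extended by inner retries, and is only checked when control returns to the outer level. The 600 s budget and the 600 s / 127 s inner durations are illustrative numbers chosen to roughly mirror the log (a 10-minute timeout, "duration 00:02:07", "timed out after 727 seconds").

```python
def run_outer_attempt(budget, inner_attempts):
    """Simulate one outer attempt (hypothetical helper).

    inner_attempts is a list of (duration_seconds, succeeded) pairs,
    one per inner retry. Returns (outcome, elapsed_seconds).
    """
    elapsed = 0
    succeeded = False
    for duration, ok in inner_attempts:
        elapsed += duration
        if ok:
            succeeded = True
            break
    # Only the elapsed time is compared with the outer budget here:
    # an inner success that finishes after the deadline still counts as a timeout.
    if elapsed >= budget:
        return "timeout", elapsed
    return ("success" if succeeded else "failed"), elapsed


# Steps 3) and 4): first inner boot hangs ~600 s, the second boots in 127 s.
print(run_outer_attempt(600, [(600, False), (127, True)]))  # ('timeout', 727)
# The same successful boot on its own fits easily within the budget.
print(run_outer_attempt(600, [(127, True)]))                # ('success', 127)
```

Under these assumptions, the successful inner attempt in step 4) cannot save the outer attempt, because the failed inner attempt in step 3) has already used up the outer budget.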
The log is:

    start: 4.4.5 expect-shell-connection (timeout 00:07:23) [test_suite_1]
    Forcing a shell prompt, looking for ['root@(.*):~#']
    root@imx8mnevk:~#
    expect-shell-connection: Wait for prompt ['root@(.*):~#'] (timeout 00:10:00)
    Waiting using forced prompt support.
    299.9747439622879s timeout
    end: 4.4.5 expect-shell-connection (duration 00:00:00) [test_suite_1]
    start: 4.4.6 export-device-env (timeout 00:07:23) [test_suite_1]
    end: 4.4.6 export-device-env (duration 00:00:00) [test_suite_1]
    uboot-action timed out after 727 seconds
    end: 4.4 uboot-retry (duration 00:02:07) [test_suite_1]
I'm not sure, but it looks like this: within the second "uboot-action" attempt there are two "uboot-retry" attempts because of the retry setting. So by the time the "uboot-action" timeout is restored, the remaining time is already below 0, and the following code raises the timeout immediately. Is this a bug, or am I misunderstanding something?
    duration = round(action_max_end_time - time.time())
    if duration <= 0:
        signal.alarm(0)
        parent.timeout._timed_out(None, None)
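The check in that snippet can be isolated as a pure function to see when it fires. The helper name `remaining_for_parent` is hypothetical, and `now` replaces `time.time()` so the arithmetic can be tried with fixed values; the rest of the surrounding LAVA code is not quoted here, so re-arming the alarm in the positive case is an assumption.

```python
def remaining_for_parent(action_max_end_time, now):
    """Mirror the quoted arithmetic (hypothetical helper, not LAVA code).

    Returns None when the parent timeout would fire immediately,
    otherwise the number of seconds left for the parent action.
    """
    duration = round(action_max_end_time - now)
    if duration <= 0:
        # The quoted code cancels the alarm and calls
        # parent.timeout._timed_out(None, None) at this point.
        return None
    # Presumably the parent alarm is re-armed with `duration` seconds here.
    return duration


start = 1000.0
print(remaining_for_parent(start + 600, start + 727))    # None: parent times out at once
print(remaining_for_parent(start + 600, start + 599.6))  # None: 0.4 s rounds to 0
print(remaining_for_parent(start + 600, start + 500))    # 100
```

Note that because of `round()`, the parent fires immediately not only when the deadline has passed but also when less than about half a second remains.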
Any suggestions?