Hi,
After the 2025.02 release, download timeouts are terribly broken. This should never have happened. LAVA is silently dividing the action timeout by the number of repetitions: https://gitlab.com/lava/lava/-/wikis/releases/2025.02#timeout-and-retries and https://gitlab.com/lava/lava/-/commit/07b13b5e4a2bb335dcfff155974ade81ef17b0...
This causes perfectly good jobs to fail. Example: https://lava.infra.foundries.io/scheduler/job/73929
If such a change is required, it should have been made through an explicit parameter set in the job definition. In its current form I consider it a bug. The only way to work around it is to set the timeout to some ridiculous number (3 times as long as is normally required). IMHO this should be reverted before the next release.
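To make the arithmetic concrete, here is a minimal Python sketch of the effect described above. It assumes a 5-minute action timeout and three retries (the "3 times" figure suggests three). `per_attempt_timeout` and `workaround_timeout` are hypothetical helpers that model the described behaviour; they are not LAVA's code.

```python
from datetime import timedelta


def per_attempt_timeout(action_timeout: timedelta, retries: int) -> timedelta:
    """Hypothetical model: the action timeout is split evenly across retries."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    return action_timeout / retries


def workaround_timeout(needed_per_attempt: timedelta, retries: int) -> timedelta:
    """Action timeout needed so that each attempt still gets `needed_per_attempt`."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    return needed_per_attempt * retries


if __name__ == "__main__":
    # A 5-minute action timeout with 3 retries leaves 1m40s per download attempt.
    print(per_attempt_timeout(timedelta(minutes=5), 3))  # 0:01:40
    # Keeping 3 minutes per attempt forces the action timeout up to 9 minutes.
    print(workaround_timeout(timedelta(minutes=3), 3))  # 0:09:00
```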
Best Regards, Milosz
Hi Milosz,
Per the discussion here https://gitlab.com/lava/lava/-/merge_requests/2800#note_2503048170, how about we set the default download retries back to 1? When retries are increased explicitly, the timeout should be set to 'retries * timeout'. Does that make sense?
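A rough Python sketch of this proposal, assuming the timeout written in the job is meant per attempt and the action's total budget is derived from it. `DEFAULT_DOWNLOAD_RETRIES` and `download_action_timeout` are hypothetical names, not existing LAVA settings.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical: the proposed default number of download attempts.
DEFAULT_DOWNLOAD_RETRIES = 1


def download_action_timeout(timeout: timedelta, retries: Optional[int] = None) -> timedelta:
    """Total budget for a download action under the proposal: retries * timeout.

    With the default of one attempt the configured timeout is used unchanged;
    only an explicit increase of retries scales the budget up.
    """
    attempts = DEFAULT_DOWNLOAD_RETRIES if retries is None else retries
    if attempts < 1:
        raise ValueError("retries must be at least 1")
    return timeout * attempts


if __name__ == "__main__":
    print(download_action_timeout(timedelta(minutes=3)))             # 0:03:00
    print(download_action_timeout(timedelta(minutes=3), retries=3))  # 0:09:00
```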
Cheers, Chase
On Fri, 9 May 2025 at 18:11, Milosz Wasilewski <milosz.wasilewski@foundries.io> wrote:
> After the 2025.02 release, download timeouts are terribly broken. [...]
_______________________________________________
lava-users mailing list -- lava-users@lists.lavasoftware.org
To unsubscribe send an email to lava-users-leave@lists.lavasoftware.org
On Mon, May 19, 2025 at 12:13 PM Chase Qi chase.qi@linaro.org wrote:
> Per the discussion here https://gitlab.com/lava/lava/-/merge_requests/2800#note_2503048170, how about we set the default download retries back to 1? When retries are increased explicitly, the timeout should be set to 'retries * timeout'. Does that make sense?
This is _a_ solution, and what is implemented currently must be changed. The problem with it is that it removes the possibility of setting the http-download action timeout separately: the timeout will always come from "dividing" the parent timeout by the number of repetitions, even if it's dividing by 1.
Another approach would be to use a new flag: divide_timeout_by_retries (or something like this). It should be set to false by default to preserve backward compatibility.
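A minimal sketch of the opt-in flag, assuming it would sit next to an action's retry count. `RetryPolicy` and `attempt_timeout` are hypothetical, and `divide_timeout_by_retries` is only the name suggested here, not an existing LAVA option. The default leaves the timeout undivided, assumed here to match the behaviour before 2025.02.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RetryPolicy:
    """Hypothetical retry settings for one action (not a LAVA class)."""

    retries: int = 1
    # Hypothetical opt-in flag, off by default as suggested above.
    divide_timeout_by_retries: bool = False


def attempt_timeout(action_timeout: timedelta, policy: RetryPolicy) -> timedelta:
    """Return the timeout applied to each attempt of the action."""
    if policy.retries < 1:
        raise ValueError("retries must be at least 1")
    if policy.divide_timeout_by_retries:
        # Opt-in: split the action timeout evenly across attempts.
        return action_timeout / policy.retries
    # Default: leave the timeout undivided (assumed pre-2025.02 behaviour).
    return action_timeout


if __name__ == "__main__":
    five = timedelta(minutes=5)
    print(attempt_timeout(five, RetryPolicy(retries=3)))  # 0:05:00
    print(attempt_timeout(five, RetryPolicy(retries=3, divide_timeout_by_retries=True)))  # 0:01:40
```

Keeping the flag off by default means existing job definitions keep their meaning; only jobs that opt in get the split budget.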
Best Regards, Milosz
On Mon, 19 May 2025 at 19:53, Milosz Wasilewski <milosz.wasilewski@foundries.io> wrote:
> The problem with it is that it removes the possibility of setting the http-download action timeout separately: the timeout will always come from "dividing" the parent timeout by the number of repetitions, even if it's dividing by 1.
I am not sure what is missing. The named action timeout below works fine for me.
```
- deploy:
    timeout:
      minutes: 5
    timeouts:
      http-download:
        minutes: 3
    to: downloads
    images:
      boot:
        url: http://192.168.18.190:8088/db410c/hc/boot-linaro-buster-dragonboard-410c-359.img.gz
        compression: gz
```
A named action timeout always has priority, and it isn't divided. When no named action timeout is provided, LAVA needs to set a timeout for the child actions itself.
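The resolution order described here can be sketched in Python as a simplified model, not LAVA's implementation. It assumes block-level `timeouts:` entries are keyed by action name, as in the YAML above, and that an unnamed child falls back to the 2025.02 division by retries. `resolve_child_timeout` is a hypothetical helper.

```python
from datetime import timedelta
from typing import Mapping


def resolve_child_timeout(
    action_name: str,
    parent_timeout: timedelta,
    retries: int,
    block_timeouts: Mapping[str, timedelta],
) -> timedelta:
    """Pick the timeout for a child action such as "http-download".

    1. A named timeout in the action block's `timeouts:` wins and is not divided.
    2. Otherwise the value is derived from the parent timeout (assumed: / retries).
    """
    if action_name in block_timeouts:
        return block_timeouts[action_name]
    if retries < 1:
        raise ValueError("retries must be at least 1")
    return parent_timeout / retries


if __name__ == "__main__":
    five = timedelta(minutes=5)
    named = {"http-download": timedelta(minutes=3)}
    print(resolve_child_timeout("http-download", five, 3, named))  # 0:03:00
    print(resolve_child_timeout("http-download", five, 3, {}))     # 0:01:40
```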
Cheers, Chase
On Tue, May 20, 2025 at 2:22 AM Chase Qi chase.qi@linaro.org wrote:
> A named action timeout always has priority, and it isn't divided. When no named action timeout is provided, LAVA needs to set a timeout for the child actions itself.
If you define it in the job-level timeouts instead:

    timeouts:
      actions:
        http-download:
          minutes: 3

It won't work.
Best Regards, Milosz
On Tue, 20 May 2025 at 13:33, Milosz Wasilewski <milosz.wasilewski@foundries.io> wrote:
> If you define it in the job-level timeouts instead:
>
>     timeouts:
>       actions:
>         http-download:
>           minutes: 3
>
> It won't work.
This is not related to 2025.02 or to the fix for action retries. It doesn't work with older releases like 2024.09 either; see https://validation.linaro.org/scheduler/job/4149138. IMO, the job-level named action timeout should be respected too if it is smaller than its parent action timeout and no named action timeout is provided at the action-block level. MRs are welcome.
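A sketch of the suggested rule, extending the same kind of simplified model with job-level `timeouts: actions:` entries. `resolve_with_job_timeouts` is hypothetical, and the final fallback (dividing the parent timeout by retries) is an assumption carried over from the 2025.02 behaviour discussed in this thread.

```python
from datetime import timedelta
from typing import Mapping


def resolve_with_job_timeouts(
    action_name: str,
    parent_timeout: timedelta,
    retries: int,
    block_timeouts: Mapping[str, timedelta],
    job_action_timeouts: Mapping[str, timedelta],
) -> timedelta:
    """Hypothetical resolution order for the change suggested above."""
    # 1. Action-block named timeout: highest priority, never divided.
    if action_name in block_timeouts:
        return block_timeouts[action_name]
    # 2. Job-level `timeouts: actions:` entry, honoured only if smaller than the parent.
    job_value = job_action_timeouts.get(action_name)
    if job_value is not None and job_value < parent_timeout:
        return job_value
    # 3. Fall back to deriving the value from the parent timeout.
    if retries < 1:
        raise ValueError("retries must be at least 1")
    return parent_timeout / retries


if __name__ == "__main__":
    five = timedelta(minutes=5)
    job = {"http-download": timedelta(minutes=3)}
    print(resolve_with_job_timeouts("http-download", five, 3, {}, job))  # 0:03:00
    too_big = {"http-download": timedelta(minutes=10)}
    print(resolve_with_job_timeouts("http-download", five, 3, {}, too_big))  # 0:01:40
```

Requiring the job-level value to be smaller than the parent keeps a child from outliving the action that contains it.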
Cheers, Chase
On Tue, May 20, 2025 at 8:38 AM Chase Qi chase.qi@linaro.org wrote:
> This is not related to 2025.02 or to the fix for action retries. It doesn't work with older releases like 2024.09 either; see https://validation.linaro.org/scheduler/job/4149138. IMO, the job-level named action timeout should be respected too if it is smaller than its parent action timeout and no named action timeout is provided at the action-block level. MRs are welcome.
Ah, so it's coincidentally another bug; good to know. I only tested on 2025.04 after the upgrade, so I didn't notice it didn't work earlier either. I would gladly send a patch if the time to review and merge it were reasonable. How can we fix this underlying issue?
Best Regards, Milosz
On Tue, 20 May 2025 at 15:46, Milosz Wasilewski <milosz.wasilewski@foundries.io> wrote:
> Ah, so it's coincidentally another bug; good to know. I only tested on 2025.04 after the upgrade, so I didn't notice it didn't work earlier either. I would gladly send a patch if the time to review and merge it were reasonable. How can we fix this underlying issue?
I would say MRs that follow a discussion on the channels, or that come with an issue, should be easier to review and merge.
Cheers, Chase
On Tue, May 20, 2025 at 9:21 AM Chase Qi chase.qi@linaro.org wrote:
> I would say MRs that follow a discussion on the channels, or that come with an issue, should be easier to review and merge.
Chase, the problem I'm trying to highlight here is the very long time it takes to get any MR reviewed or merged. To give an example, we discussed git submodules in an MR over 2 years ago: https://gitlab.com/lava/lava/-/merge_requests/1799
I totally forgot about it and recently submitted an almost identical patch: https://gitlab.com/lava/lava/-/merge_requests/2754
This one has been sitting in lava purgatory for 6 weeks now.
I really don't want to spend the next couple of months wondering whether my 1-line patch is merged or not, and why not. If you can propose a solution to this issue, I'm happy to submit patches.
Best Regards, Milosz
lava-users@lists.lavasoftware.org