LAVA test doesn't wait for command to finish running

Michael Peddie

13 Jun 2024

6:13 a.m.

It's just as the subject says. I am using lava-test-case to check whether this particular command finishes running and succeeds, because the test has always ended prematurely and not done what I needed it to do. When I run the command manually outside of a LAVA test job it works fine, with no errors, and doesn't end early. It does take a while to complete, but I don't think that's the issue, as I use wget to download some large files and those take several minutes longer than this one command is supposed to.

How do I get LAVA to wait for this command to end, or change how it checks for command failure? I get "Received signal: <ENDTC>" almost immediately after running the command, followed by the result=fail signal. I really don't know why it won't work, so if there is a change I can make to the files or a config somewhere, I would like to know.

Best regards, Michael

Michael Peddie

13 Jun

7:12 a.m.

I thought it might be worth mentioning that when the command is run without lava-test-case, this is the output following the command:

<LAVA_TEST_RUNNER EXIT>
ok: lava_test_shell seems to have completed
Marking unfinished test run as failed

Each appears on a separate line, coloured green, blue, and red respectively. I am not sure whether this information helps at all, but I thought I would provide it regardless.

Best regards, Michael

Milosz Wasilewski

3:20 p.m.

On Thu, Jun 13, 2024 at 12:13 AM Michael Peddie <michael.peddie@gallagher.com> wrote:

> Just thought it may be a good idea to mention that when the command is run without lava-test-case this is the output following the command:
>
> <LAVA_TEST_RUNNER EXIT>
> ok: lava_test_shell seems to have completed
> Marking unfinished test run as failed

It looks like the command doesn't return a 0 exit code, but it's really hard to tell without seeing the code and the job definition.

Best Regards, Milosz

_______________________________________________
lava-users mailing list -- lava-users@lists.lavasoftware.org
To unsubscribe send an email to lava-users-leave@lists.lavasoftware.org

Michael Peddie

14 Jun

5:29 a.m.

How do I find the exit code that was returned? I don't see anything in the web UI, and the job output log doesn't seem to have anything either. If it's the ENDTC and STARTTC in the log, then the ENDTC is 780 and the STARTTC is 778.

Here's the job definition:

device_type: controller
job_name: controller deploy test

context:
  lava_test_results_dir: /tmp/lava-%s

timeouts:
  job:
    minutes: 15
  action:
    minutes: 10
  connection:
    minutes: 5
priority: medium
visibility: public

actions:
- deploy:
    timeout:
      minutes: 5
    to: tftp
    kernel:
      url: file:///kernel.img
      type: uimage
    ramdisk:
      url: file:///ramdisk.gz
      compression: gz
    dtb:
      url: file:///u-boot.dtb

- boot:
    transfer_overlay:
      transfer_method: http
      download_command: cd / ; tar -C /tmp/ -xzf /data/wgetcurl.tar.gz ; cd /tmp/ ; ./wgetcurl/wget -nv
      unpack_command: tar -C /tmp -xzf
    timeout:
      minutes: 5
    method: minimal
    prompts: ["# $"]
    auto_login:
      login_prompt: "login:"
      username: root

- test:
    timeout:
      minutes: 10
    definitions:
    - repository: https://<pat>@github.com/MichaelPed/lava-tests.git
      from: git
      path: artifactory/artifactory.yaml
      name: artifactory
      history: False

Michael Peddie

10:11 a.m.

I went into the lava-test-case script on my worker and modified it to print the exit code retrieved from $? using printf, and it returned 255. A quick online search says that an exit code of 255 means the command did not exit as expected, though I'm not sure what LAVA expects here. The command itself doesn't print anything to the console and just exits cleanly if nothing goes wrong during its execution; otherwise it prints to the console when something isn't right (wrong arguments or similar).
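A minimal sketch, assuming the test steps run in a POSIX shell on the DUT: rather than editing lava-test-case on the worker, a wrapper in the test definition can run the command itself, print its exit status to the job log, and report the outcome with "lava-test-case <name> --result pass" or "--result fail". The helper name run_and_report is hypothetical, and the fallback echo only exists so the sketch can be tried outside LAVA.

```shell
#!/bin/sh
# run_and_report NAME CMD [ARGS...]
# Hypothetical helper (not part of LAVA): runs CMD, prints its exit status so
# it appears in the job log, and reports pass/fail for the test case NAME.
run_and_report() {
    name=$1
    shift
    rc=0
    "$@" || rc=$?            # capture the status without tripping 'set -e'
    echo "$name: exit code $rc"
    if [ "$rc" -eq 0 ]; then
        result=pass
    else
        result=fail
    fi
    # Inside a LAVA test shell lava-test-case is on the PATH; elsewhere just
    # print the result so the sketch can be tried on a workstation.
    if command -v lava-test-case >/dev/null 2>&1; then
        lava-test-case "$name" --result "$result"
    else
        echo "$name: $result"
    fi
    return "$rc"
}
```

In the run steps this could be called as run_and_report <case name> <cmd and params>. For a command that hands its work to a background child and then exits, the status reported here is still only that of the parent.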

Any idea what could be happening?

Best regards, Michael

Michael Peddie

11:37 a.m.

I did some manual testing after seeing that lava-test-case uses eval. Running the command through eval ended quite quickly, and when I checked the exit code it was 255. I don't think there is an alternative to eval, except maybe a subshell through echo, e.g. echo $(cmd)? But that wouldn't return the command's exit code, would it? I know eval is used because it runs the command passed as an argument and then makes its exit code available to determine whether the command worked.

I did a manual run of echo $(cmd); it ran for around the expected time, and the command did work. The remaining problem is determining whether it failed or succeeded using the existing infrastructure. I will do some testing with echo and subshells in the meantime to see what I can do.
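A possible explanation for the difference, offered as a sketch rather than a diagnosis: "$(...)" reads the command's standard output through a pipe and only returns once every process holding the write end of that pipe has closed it. A child that the command leaves running in the background inherits the pipe, so "echo $(cmd)" blocks until the child exits, while eval returns as soon as the parent does. The function detach below is a hypothetical stand-in for such a command, not the real one.

```shell
#!/bin/sh
# detach SECONDS: hypothetical stand-in for a command whose parent exits
# immediately while a child carries on working in the background.
detach() {
    ( sleep "$1" & )
}

# 1. Through eval, the call returns as soon as the parent has exited.
start=$(date +%s)
eval 'detach 2'
via_eval=$(( $(date +%s) - start ))

# 2. Inside $(...), the background child inherits the pipe that the
#    substitution reads from, so the shell keeps reading until the child
#    exits and the pipe is closed.
start=$(date +%s)
echo $(detach 2) >/dev/null
via_subst=$(( $(date +%s) - start ))

echo "eval returned after ${via_eval}s; \$(...) returned after ${via_subst}s"
```

If this is what is happening, echo $(cmd) is not a dependable way to wait: it stops waiting as soon as the child closes or redirects its standard output (for example to /dev/null), and the child's exit status is still not available to the script.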

Best regards, Michael

Michael Peddie

17 Jun

7:40 a.m.

Nothing I have tried has worked. The subshell idea only gave the same results as the existing method. I wrote and ran a script outside of LAVA to download the file and run the command, in case running it inside a script was the issue, but the command worked. I then added that script to my GitHub setup and ran a job using it, and then it no longer worked. So I am fairly confident the problem lies with LAVA, but I have no idea what exactly it could be.

Just for reference, the subshell I wrote and used looked like this (keep in mind I completely removed the eval command when running this):

rc=$(echo $($* ; echo $?))
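For reference, a short sketch of why that subshell cannot give a clean status: "$(...)" captures everything written to standard output, so rc ends up holding the command's output followed by the status digits, joined together by the outer echo. Running the command directly and reading $? straight afterwards (here via "|| rc=$?") keeps the two apart. demo_cmd is a hypothetical command that prints a line and exits with status 3.

```shell
#!/bin/sh
# demo_cmd: hypothetical command that prints a line and exits with status 3.
demo_cmd() {
    echo "working..."
    return 3
}

# The pattern from the message above: the command's output and the status
# end up in the same variable.
mixed=$(echo $(demo_cmd; echo $?))
echo "mixed: $mixed"    # prints "mixed: working... 3"

# Keeping them apart: let the output go to the log, and read the status
# as soon as the command finishes.
rc=0
demo_cmd || rc=$?       # prints "working..."
echo "rc: $rc"          # prints "rc: 3"
```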

Best regards, Michael

Michael Peddie

20 Jun

6:45 a.m.

Hi Milosz,

After some further looking and discussion with others, do you think it is possible that the way the command I'm running works could be the issue?

The command gets passed flags and arguments, and after processing that spawns a child thread, after which the parent thread from the initial call ends to let the child thread run. Is it possible that LAVA is picking up on the parent thread ending and not letting the child thread run through?

I am working on testing this with custom scripts, but that is the leading idea at the moment. Please let me know what you think.

Best regards, Michael

Milosz Wasilewski

3:45 p.m.

Michael,

On Wed, Jun 19, 2024 at 11:45 PM Michael Peddie <michael.peddie@gallagher.com> wrote:

> Hi Milosz,
>
> After some further looking and discussion with others, do you think it is possible that the way the command I'm running works could be the issue?
>
> The command gets passed flags and arguments, and after processing that spawns a child thread, after which the parent thread from the initial call ends to let the child thread run. Is it possible that LAVA is picking up on the parent thread ending and not letting the child thread run through?

This sounds plausible. I don't think LAVA would stop execution for the background task or thread. Test execution in LAVA is pretty simple: it calls a run.sh script that it creates from the test-definition steps [1][2]. Once the steps exit, LAVA test execution completes. Anything running in the background is ignored.
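To illustrate the behaviour described above, here is a small sketch, assuming the real command forks a worker and lets the parent exit: the step's shell only ever sees the parent, so eval returns with the parent's status, and "wait" has nothing to wait for because the worker is not a child of that shell (in shells such as bash, "wait <pid>" on such a process is rejected as "not a child of this shell"). detach is a hypothetical stand-in for the real command.

```shell
#!/bin/sh
# detach SECONDS: hypothetical stand-in for the real command. The parent
# returns immediately; a background child (here 'sleep') keeps running.
detach() {
    ( sleep "$1" & )
}

start=$(date +%s)
eval 'detach 3'
status=$?                 # status of the parent only
wait                      # this shell has no children left, so no waiting
took=$(( $(date +%s) - start ))
echo "step returned status $status after ${took}s; the child may still be running"
```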

I can't tell much without looking at the code. Your code is in a private repository, so there isn't much for me to comment on.

[1] https://gitlab.com/lava/lava/-/blob/master/lava_dispatcher/lava_test_shell/l...
[2] https://gitlab.com/lava/lava/-/blob/master/lava_dispatcher/lava_test_shell/l...

Best Regards, Milosz


Michael Peddie

21 Jun

5:29 a.m.

Hi,

Thank you for confirming. I have written some test code in C that creates a fork, which runs a function that sleeps for a while (less time than the command I actually want to run takes, but long enough for the parent thread to end). It can be found here: https://github.com/MichaelPed/test-scripts-lava

I have tried numerous ways to get the PID of the child process spawned by this code and wait until it finishes running. These fall into two categories: direct wait methods and polling. Neither has worked so far: they either don't run at all, return an error ("process is not a child..."), or keep "waiting" until well after the process has ended (sometimes never ending at all, so I have to cancel the job or let it time out).

This is strange, as they all work fine when run manually. To give a better idea of what the test job is actually doing (skipping the deploy and boot actions): the run steps in my YAML file essentially only run the command "lava-test-case test-script --shell ./test.sh". Any other commands run before this just make sure the script can be run (e.g. cd and chmod).

This test.sh has the following code in it:

./test
while kill -0 $(pgrep test); do sleep 0.1; done
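One thing that may be worth checking, offered as a guess rather than a diagnosis: "pgrep test" matches every process whose name contains "test", and inside a LAVA test shell that can include the script itself (test.sh) and LAVA helpers such as lava-test-case, so the "kill -0" check may keep succeeding after the worker has finished. procps pgrep offers -x for an exact name match; where that is not available on the DUT, the same comparison can be made against /proc/<pid>/comm, as in this sketch. The helper name pids_by_name is hypothetical, and the sketch assumes a Linux /proc filesystem.

```shell
#!/bin/sh
# pids_by_name NAME: print the PIDs of processes whose name is exactly NAME.
# Hypothetical helper; Linux-only, reads /proc/<pid>/comm. The kernel
# truncates comm to 15 characters, so compare against the truncated name.
pids_by_name() {
    for comm_file in /proc/[0-9]*/comm; do
        # A process may exit between the glob and the read; skip it if so.
        comm=$(cat "$comm_file" 2>/dev/null) || continue
        if [ "$comm" = "$1" ]; then
            pid=${comm_file#/proc/}
            echo "${pid%/comm}"
        fi
    done
}

# Example use in test.sh (assumes the worker process is named "test"):
#   ./test
#   for pid in $(pids_by_name test); do
#       while kill -0 "$pid" 2>/dev/null; do sleep 0.1; done
#   done
```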

That while loop is one of several variations I have tried; I've also tried starting the test code as a background process (appending & to the call). Others and I are quite stumped by this, since waiting for the spawned background process to end works fine when run manually, but in LAVA it refuses to work.

I hope some of this helps you to help me. Thank you for the help so far.

Best regards, Michael

Michael Peddie

1:14 p.m.

I settled on my bash script running the code below, and the waiting works both for the test code I sent in my previous reply and for the actual command I want to run. However, the command itself still does not seem to run correctly, as there is no sign of the results of its run at all.

Code:

eval "<cmd and params>"
tPID=$(pgrep <cmd without params>)
# using sed like this after ps instead of the --pid= option because the DUT doesn't support that
while [ -n "$(ps | sed "/${tPID}/!d; s/ //g; /sed/d; /${tPID}/q")" ]
do
    sleep 0.1
done
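As a possible simplification, sketched under the assumption that "kill -0" is available on the DUT (it is a builtin in most shells): "kill -0 PID" sends no signal and only reports whether the process still exists, which avoids parsing ps output. The sed filter above keeps any line containing the PID's digits anywhere, so a PID such as 12 would also match the line for PID 123, or a line where 12 appears in another column. The helper name wait_for_pid and the timeout value are illustrative; the timeout is there so the step gives up on its own before the test action's 10-minute timeout rather than hanging.

```shell
#!/bin/sh
# wait_for_pid PID TIMEOUT_SECONDS
# Hypothetical helper: polls until PID no longer exists. Returns 0 once the
# process is gone, or 1 if it is still running after TIMEOUT_SECONDS.
# Assumes the step runs as root (as in the job above), so 'kill -0' is not
# refused with a permission error for processes owned by other users.
wait_for_pid() {
    pid=$1
    deadline=$(( $(date +%s) + $2 ))
    # 'kill -0' sends no signal; it only checks that the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "timed out waiting for PID $pid" >&2
            return 1
        fi
        sleep 0.1
    done
    return 0
}

# Example use (placeholders as in the message above):
#   eval "<cmd and params>"
#   tPID=$(pgrep <cmd without params>)
#   wait_for_pid "$tPID" 540 || echo "still running after 9 minutes"
```

This only shows that the process has gone, not whether it succeeded: a shell can only collect the exit status of its own children, so the outcome of the update would have to be checked some other way, for example by inspecting whatever the command is meant to change.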

Any clues or ideas that may be of use would be much appreciated. Thank you.

Best regards, Michael

Michael Peddie

24 Jun

11:46 a.m.

I'm not sure how important this is, and apologies if it should have been mentioned from the start, but the command I am trying to run is supposed to update the FIT image on the DUT. Ideally, at the end it reboots the device (so the new FIT image takes effect) and then the actual test can run, but that isn't part of this thread.

Is it even possible for LAVA to run such a command? If so, how? If not, what architecture or service would you recommend for doing that, one that works with LAVA for testing?

Best regards, Michael

Michael Peddie

27 Jun

7:24 a.m.

Hi Milosz,

After some further testing (changing command options, switching the command used to connect from the worker to the DUT, and other things), I am quite certain the problem has something to do with the way LAVA interacts with the DUT and runs commands on it. I can run all of the scripts that do the job I want manually, and run each command individually, but when I try to automate it through LAVA it stops working immediately. I don't expect to pin down the exact issue or get a solution, but I would like help generating ideas and working towards something.

Best regards, Michael

Milosz Wasilewski

3:58 p.m.

On Thu, Jun 27, 2024 at 12:24 AM Michael Peddie <michael.peddie@gallagher.com> wrote:

> Hi Milosz,
>
> After some further testing (changing command options, switching connection command to DUT from worker and others), I am quite certain the problem is something to do with the way LAVA interacts with the DUT and runs commands on it. I can run any and all scripts that do the job I want manually, and run each command individually, but when I try to automate it through LAVA it stops working immediately. I don't expect to be able to get or come to a conclusion on what the exact issue is, nor get a solution, but I would like assistance in generating ideas and working towards something.

Michael,

I think I mentioned this before. It's impossible for me to help or even debug the problem without looking at the actual code. Since you're not sharing it, there is no chance of any meaningful help. Please reconsider your sharing policy.

Best Regards, Milosz

Michael Peddie

1 Jul

4:22 a.m.

Hi Milosz,

Apologies, I must have missed or misunderstood what you said. Unfortunately I cannot share the code as it is proprietary, but I appreciate the help thus far. I will continue to work towards a solution independently.

Best regards, Michael

Michael Peddie

13 Jun

11:12 a.m.

On an unrelated note: where can I modify the output shown in the web UI? I had found it previously, but after updating LAVA on my server and worker my changes were overwritten, and I forgot to note down where they all were.

Regards, Michael

lava-users@lists.lavasoftware.org

15 comments

participants (2)

Michael Peddie
Milosz Wasilewski