lava-users December 2023

lava-users@lists.lavasoftware.org

4 participants
8 discussions

How to add actions after a reboot triggered by a watchdog (for ex: in case of kernel panic) and continue the test definition

by sai.sathujoda＠toshiba-tsip.com

Hello everyone,

I am trying to implement a test scenario in which I need to confirm the partition scheme and environment variables after a reboot that is triggered by the watchdog I am using. So basically I need to be able to log in to the image successfully after the watchdog-triggered reboot. It seems that the link between the test job and the Linux image is lost after the reboot done by the watchdog: none of the actions placed after that reboot (boot or test actions) work.

Below is the template of my job definition:

    job details...
    device_type: qemu
    #### (timeouts)
    ### (priority)
    ## (notify, context etc.)
    actions
    - deploy
        images: ###
        firmware: ###
    - boot
        auto_login:
          login_prompt: "###"
          username: "##"
          password_prompt: "###"
          password: "##"
    - test:
        definitions: ###
          repository: ####
          metadata: ###
          run:
            steps:
            - swupdate -i ####
            - reboot
    -------------------------------------------------

At this stage I introduced a service that causes a kernel panic during the reboot (the one triggered by the explicit reboot command in the job above). After 'X' seconds of watchdog timeout, the watchdog triggers a second reboot. During that reboot the kernel booted successfully and all the services were set up fine, but the test stopped at the login stage. I think it did not get the "login_prompt" and "password_prompt" from the boot action I wrote after the above test action (i.e. after the reboot done by the watchdog):

    - boot
        auto_login:
          login_prompt: "###"
          username: "##"
          password_prompt: "###"
          password: "##"

So is there a way to add boot and test actions such that the test job can continue after a reboot done by a watchdog?

Note: When I explicitly put "reboot" in the steps section of a test action, the link did not break, and I was able to reboot, log in and run the test action steps successfully as many times as I wanted. This query is specifically about cases in which the reboot happens outside the test writer's control (i.e. a reboot triggered by a watchdog).

1 year, 7 months

Unable to detect the reboot triggered by watchdog or software updates in LAVA

by sai.sathujoda＠toshiba-tsip.com

Hello everyone,

I would like to open a discussion thread to understand LAVA test framework support for some use cases where I am facing issues. While testing a reboot scenario in CIP (https://gitlab.com/cip-project/cip-core/isar-cip-core) in which the reboot is triggered by the watchdog, LAVA is unable to complete the reboot successfully. The job definition is as follows:

    device_type: qemu
    job_name: qemu x86_64 software update testing
    timeouts:
      job:
        minutes: 20
      action:
        minutes: 10
      actions:
        power-off:
          seconds: 60
    priority: high
    visibility: public
    notify:
      criteria:
        status: finished
      recipients:
      - to:
          method: email
          email: sai.sathujoda(a)toshiba-tsip.com
    context:
      arch: x86_64
      lava_test_dir: '/home/lava-%s'
    # ACTION BLOCK
    actions:
    - deploy:
        timeout:
          minutes: 15
        to: tmpfs
        images:
          system:
            image_arg: '-drive file={system},discard=unmap,if=none,id=disk,format=raw -m 1G -serial mon:stdio -cpu qemu64 -smp 4 -machine q35,accel=tcg -global ICH9-LPC.noreboot=off -device ide-hd,drive=disk -nographic'
            url: ######.wic.xz
            compression: xz
          firmware:
            image_arg: '-drive if=pflash,format=raw,unit=0,readonly=on,file={firmware}'
            url: ######
    # BOOT BLOCK
    - boot:
        timeout:
          minutes: 5
        method: qemu
        media: tmpfs
        prompts: ["root@demo:~#"]
        auto_login:
          login_prompt: "demo login:"
          username: "root"
          password_prompt: "Password:"
          password: "root"
    # TEST_BLOCK
    - test:
        timeout:
          minutes: 5
        definitions:
        - repository:
            metadata:
              format: Lava-Test Test Definition 1.0
              name: sample-test
              description: "check reboot version"
            run:
              steps:
              - lava-test-case uname --shell uname -a
              - cd /home
              - wget --no-check-certificate ####
              - lsblk
              - swupdate -i cip-core-*
              - reboot
          from: inline
          name: sample-test-1
          path: inline/sample-test.yaml
    - boot:
        timeout:
          minutes: 5
        method: qemu
        media: tmpfs
        prompts: ["root@demo:"]
        auto_login:
          login_prompt: "demo login:"
          username: "root"
          password_prompt: "Password:"
          password: "root"
    - test:
        timeout:
          minutes: 5
        definitions:
        - repository:
            metadata:
              format: Lava-Test Test Definition 1.0
              name: sample-test
              description: "check partition switch"
            run:
              steps:
              - lsblk
          from: inline
          name: sample-test-2
          path: inline/sample-test.yaml
    context:
      arch: x86_64
      lava_test_results_dir: '/home/lava-%s'

Because of a failed case, the watchdog triggers a reboot after the reboot done in the test action. The watchdog-triggered reboot failed with a timeout error at the login stage, which suggests that the last boot action in the above job definition did not get the expected login prompts.

I have already received some opinions from the LAVA users community that LAVA does not support the board being rebooted outside its control (whether by a watchdog or a package). However, CIP uses LAVA extensively as a test framework to run many kinds of regression tests on CIP-supported hardware, and testing the watchdog is an important use case in CIP. Since LAVA is meant to be a test framework that can help test many types of hardware, we as CIP project members would like to understand the LAVA community's future plans to support this use case.

Thanks and Regards,
Sai Ashrith

1 year, 7 months

lava-publisher not publishing

by Milosz Wasilewski

Hi,

During the last couple of weeks I had at least 2 occasions when lava-publisher stopped working. There is nothing in the logs that suggests a failure. The only symptom was that no events were published. Restarting the service fixes the issue. Is this a known bug? I'm running the 2023.10 release.

Best Regards,
Milosz

1 year, 7 months

How to fail a test run on kernel warnings that happen after the boot action?

by Florian Bezdeka

Hi all,

I'm basically repeating [1] here as there has been no reaction for some months now. Maybe I used the wrong communication channel, let's see...

We have a test suite that is able to trigger an RCU WARNING inside the Linux kernel. My expectation was that whenever a kernel warning / oops / call stack dump / ... occurs, the LAVA job is marked as "failed". This assumption seems to be wrong. It took some time to realize that we have a real problem, as manual inspection of test logs only happens from time to time.

After scanning the code, my understanding is that the output of the connection (a serial connection in my case) is only parsed during kernel boot (until the login action takes over). That is not sufficient for detecting problems that happen during test execution.

Is there a way to scan the full log for the same patterns that are used by the boot action? If so, how do I configure that? Whenever a kernel problem occurs, my test run should be marked as "failed".

Any ideas? Did I overlook something?

Best regards,
Florian

[1] https://git.lavasoftware.org/lava/lava/-/issues/576
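The kind of full-log scan asked about above can be illustrated outside LAVA. The following is only a sketch of a post-processing step run on a saved job log; it is not a LAVA feature, and the pattern list and the function name `find_kernel_problems` are assumptions chosen for illustration. They are not the patterns LAVA's boot action uses.

```python
import re

# Assumed example patterns for common kernel problem messages.
# These are illustrative only and are NOT taken from LAVA's source.
KERNEL_PATTERNS = [
    re.compile(r"WARNING: CPU: \d+ PID: \d+"),
    re.compile(r"Kernel panic - not syncing"),
    re.compile(r"\bOops\b"),
    re.compile(r"Call Trace:"),
]


def find_kernel_problems(log_lines):
    """Return (line_number, line) pairs for lines matching any pattern.

    log_lines is any iterable of strings, e.g. an open log file.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if any(pattern.search(line) for pattern in KERNEL_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits
```

A wrapper script could call `find_kernel_problems` on the downloaded log of a finished job and exit non-zero when the returned list is not empty, so that a CI pipeline treats the run as failed even though LAVA reported it as complete.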

1 year, 7 months

How to set ssh password to autologin with ssh

by irreallich＠126.com

Hi,

Test job: https://gitlab.com/lava/lava/-/blob/master/tests/lava_dispatcher/sample_job…
Device type: https://gitlab.com/lava/lava/-/blob/master/etc/dispatcher-config/device-typ…
Device:

    {% extends 'ssh.jinja2' %}
    {% set ssh_host = 'localhost' %}
    {% set ssh_user = 'test' %}

LAVA version: 2023.08

The job failed because of a missing password. The log is as below:

    test@localhost: Permission denied (publickey,password).
    Connection closed
    end: 2.1 scp-deploy (duration 00:00:00) [common]

I would be very grateful if you could tell me how to auto-login to the SSH server with a user password.

1 year, 7 months

Test Jobs always going in submitted state for QEMU devices

by Ankit Gupta

Hi All,

I am trying to set up LAVA 2023.08 lava-server and lava-dispatcher on a single machine. The installation succeeded and I was able to add workers and a QEMU device. From the logs, it seems the workers are communicating with the LAVA server. I am trying to execute a simple QEMU sample job, and it always stays in the submitted state. I verified that the device dictionary is correct.

As per my understanding, if communication is happening between lava-server and the worker, then a device that is configured on the worker should be assigned. In my case, however, no device is assigned to the test job and it always stays in the submitted state.

Can someone let me know if any special settings are required for running a test job on a QEMU device, or share any docs links to resolve this? Any input will be appreciated.

My test job YAML: https://docs.lavasoftware.org/lava/examples/test-jobs/qemu-amd64-standard-s…

Thanks,
Ankit

1 year, 7 months

Resetting the ECU between test cases

by gemad＠outlook.com

Hello,

Is there a standard way to reboot the ECU between test cases by calling the reboot command defined in the device type (hard_reset_command), so that we are sure the tests are not impacting each other?

Thanks.

1 year, 7 months

[SSL: CERTIFICATE_VERIFY_FAILED] self signed certificate

by Ankit Gupta

Hi All,

I am setting up an HTTPS instance of LAVA. I can access the LAVA UI and am able to log in as well. I am using a self-signed SSL certificate. I have added URL="https://172.16.60.178/" to the /etc/lava-dispatcher/lava-worker file and restarted the service. It gives me the error below:

    2023-10-30 07:10:47,383 ERROR -> server error: code 503
    2023-10-30 07:10:47,383 DEBUG --> HTTPSConnectionPool(host='172.16.60.178', port=443): Max retries exceeded with url: /scheduler/internal/v1/workers/debian/?version=2023.10 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1123)')))

Do we need to configure some extra settings when using a self-signed certificate? Any help will be appreciated.

Thanks,
Ankit
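The error above comes from Python's TLS verification, which rejects a certificate that does not chain to a trusted CA. The following is a minimal sketch of how a Python client can be told to trust an extra CA file. The helper names `build_context` and `ca_bundle_from_env` are hypothetical, and whether lava-worker honours the `REQUESTS_CA_BUNDLE` environment variable in a given deployment is an assumption, not something confirmed in this thread.

```python
import os
import ssl


def build_context(ca_file=None):
    """Create a verifying TLS context, optionally trusting an extra CA file.

    ca_file is a path to a PEM file, e.g. the self-signed certificate itself.
    Verification stays enabled; only the set of trusted certificates grows.
    """
    context = ssl.create_default_context()
    if ca_file is not None:
        context.load_verify_locations(cafile=ca_file)
    return context


def ca_bundle_from_env(environ=os.environ):
    """Return the CA bundle path the requests library reads from the
    environment (REQUESTS_CA_BUNDLE, then CURL_CA_BUNDLE), or None."""
    return environ.get("REQUESTS_CA_BUNDLE") or environ.get("CURL_CA_BUNDLE")
```

Adding the certificate to the trust store keeps hostname and chain checks active, which is generally preferable to disabling verification altogether.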

1 year, 7 months
