Hello everyone,
I am trying to implement a test scenario in which I need to verify the partition scheme and environment variables after a reboot triggered by the watchdog I am using. So basically I need to be able to log in to the image successfully after the watchdog-triggered reboot.
It feels like the link between the test job and the Linux image is lost after the watchdog reboot. Any actions placed after that reboot (boot or test actions) do not work.
Below is the template of my Job definition:
job details...
device_type: qemu
#### (timeouts)
### (priority)
## (notify, context etc.)
actions:
- deploy:
    images: ###
    firmware: ###
- boot:
    auto_login:
      login_prompt: "###"
      username: "##"
      password_prompt: "###"
      password: "##"
- test:
    definitions: ###
      repository: ####
        metadata: ###
        run:
          steps:
          - swupdate -i ####
          - reboot
-------------------------------------------------
At this stage I introduced a service which causes a kernel panic during the reboot (the one triggered by the explicit reboot command in the job above). So after 'X' seconds of watchdog timeout, a second reboot is triggered by the watchdog. During that watchdog reboot, the kernel booted successfully and all the services started fine, but the test stopped at the login stage.
I think it did not get the "login_prompt" and "password_prompt" from the boot action I wrote after the above test action (i.e. after the reboot done by the watchdog).
- boot:
    auto_login:
      login_prompt: "###"
      username: "##"
      password_prompt: "###"
      password: "##"
So is there a way to add boot and test actions so that the test job can continue after a reboot done by a watchdog?
Note: When I explicitly put "reboot" in the steps section of the test action, the link did not break, and I was able to reboot, log in and run test action steps successfully as many times as I wanted.
This query is specifically about cases in which the reboot happens outside the test writer's control (e.g. a reboot triggered by a watchdog).
Hello everyone,
I would like to open a discussion thread to understand LAVA test framework support for some use cases where I'm facing issues.
While testing a reboot scenario in CIP (https://gitlab.com/cip-project/cip-core/isar-cip-core) where the reboot is triggered by a watchdog, LAVA is unable to complete the reboot successfully.
The job definition is as follows:
device_type: qemu
job_name: qemu x86_64 software update testing
timeouts:
  job:
    minutes: 20
  action:
    minutes: 10
  actions:
    power-off:
      seconds: 60
priority: high
visibility: public
notify:
  criteria:
    status: finished
  recipients:
  - to:
      method: email
      email: sai.sathujoda(a)toshiba-tsip.com
context:
  arch: x86_64
  lava_test_dir: '/home/lava-%s'
# ACTION BLOCK
actions:
- deploy:
    timeout:
      minutes: 15
    to: tmpfs
    images:
      system:
        image_arg: '-drive file={system},discard=unmap,if=none,id=disk,format=raw -m 1G -serial mon:stdio -cpu qemu64 -smp 4 -machine q35,accel=tcg -global ICH9-LPC.noreboot=off -device ide-hd,drive=disk -nographic'
        url: ######.wic.xz
        compression: xz
      firmware:
        image_arg: '-drive if=pflash,format=raw,unit=0,readonly=on,file={firmware}'
        url: ######
# BOOT BLOCK
- boot:
    timeout:
      minutes: 5
    method: qemu
    media: tmpfs
    prompts: ["root@demo:~#"]
    auto_login:
      login_prompt: "demo login:"
      username: "root"
      password_prompt: "Password:"
      password: "root"
# TEST_BLOCK
- test:
    timeout:
      minutes: 5
    definitions:
    - repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: sample-test
          description: "check reboot version"
        run:
          steps:
          - lava-test-case uname --shell uname -a
          - cd /home
          - wget --no-check-certificate ####
          - lsblk
          - swupdate -i cip-core-*
          - reboot
      from: inline
      name: sample-test-1
      path: inline/sample-test.yaml
- boot:
    timeout:
      minutes: 5
    method: qemu
    media: tmpfs
    prompts: ["root@demo:"]
    auto_login:
      login_prompt: "demo login:"
      username: "root"
      password_prompt: "Password:"
      password: "root"
- test:
    timeout:
      minutes: 5
    definitions:
    - repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: sample-test
          description: "check partition switch"
        run:
          steps:
          - lsblk
      from: inline
      name: sample-test-2
      path: inline/sample-test.yaml
context:
  arch: x86_64
  lava_test_results_dir: '/home/lava-%s'
The reboot done in the test action fails, so the watchdog triggers a second reboot. That watchdog-triggered reboot failed with a timeout error at the login stage, which suggests that the last boot action in the above job definition did not match the expected login prompts.
I have already heard from the LAVA users community that LAVA does not support the board being rebooted outside its control (whether by a watchdog or a package).
However, CIP uses LAVA extensively as its test framework to run many kinds of regression tests on CIP-supported hardware.
Testing the watchdog is an important use case in CIP, and LAVA is meant to be a test framework that can help test many types of hardware.
We, as CIP project members, would like to understand the LAVA community's future plans to support this use case.
Thanks and Regards,
Sai Ashrith
Hi,
During the last couple of weeks I had at least 2 occasions when
lava-publisher stopped working. There is nothing in the logs that
suggests a failure. The only symptom was that no events were published.
Restarting the service fixes the issue. Is this a known bug? I'm
running the 2023.10 release.
Best Regards,
Milosz
Hi all,
I'm basically repeating [1] here as there was no reaction for some
months now. Maybe I used the wrong communication channel, let's see...
We have a test suite that is able to trigger an RCU WARNING inside the
Linux kernel. My expectation was that whenever a kernel warning / oops
/ call stack dump / ... occurs the LAVA job is marked as "failed".
This assumption seems to be wrong. It took some time to realize that we
have a real problem as manual inspection of test logs only happens from
time to time.
After scanning the code my understanding is that the output of the
connection (serial connection in my case) is only parsed during kernel
boot (until the login action takes over). That is not sufficient for
detecting problems that happen during test execution.
Is there a way to scan the full log for the same patterns that are used
by the boot action? If so, how to configure that? Whenever a kernel
problem occurs my test run should be marked as "failed".
Any ideas? Did I overlook something?
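As a stopgap outside LAVA itself, the saved job log could be post-processed and the run flagged when a kernel problem shows up. The sketch below only illustrates that idea: the pattern list and the name `find_kernel_problems` are assumptions made for this example, not the patterns LAVA's boot action uses, and the sample log lines are made up.

```python
import re

# Hypothetical pattern list (an assumption, not LAVA's own list);
# adjust it to the messages your kernel actually emits.
KERNEL_PROBLEM_PATTERNS = [
    r"WARNING: CPU: \d+ PID: \d+",
    r"Kernel panic - not syncing",
    r"Oops(?:: | - )",
    r"BUG: ",
    r"Call Trace:",
]

_PROBLEM_RE = re.compile("|".join(f"(?:{p})" for p in KERNEL_PROBLEM_PATTERNS))


def find_kernel_problems(log_text):
    """Return (line_number, line) pairs for log lines matching any pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if _PROBLEM_RE.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Made-up sample log, for illustration only.
    sample = "\n".join([
        "[   12.000000] systemd[1]: Started example service.",
        "[   42.000000] WARNING: CPU: 1 PID: 234 at example.c:1 example_fn+0x0/0x0",
        "[   42.000100] Call Trace:",
        "example-test-case: pass",
    ])
    for number, line in find_kernel_problems(sample):
        print(f"line {number}: {line}")
```

A wrapper script could exit non-zero whenever the returned list is non-empty, which is enough for a CI step to mark the run as failed even though the LAVA job itself completed.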
Best regards,
Florian
[1] https://git.lavasoftware.org/lava/lava/-/issues/576
Hi All,
I am trying to set up LAVA 2023.08 lava-server and lava-dispatcher on a
single machine. I installed them successfully and was able to add workers and a
QEMU device.
From the logs, it seems the workers are communicating with the lava server. I
am trying to execute a simple QEMU sample job, but it always stays in the
submitted state. I verified that the device dictionary is correct.
As I understand it, if lava-server and the worker are communicating,
a device configured on the worker should be assigned to the job,
but in my case no device is assigned to the test job and it
always stays in the submitted state.
Can someone let me know if any special settings are required for running a
test job on a QEMU device, or share any doc links to resolve this? Any
input will be appreciated.
My Testjob YAML:
https://docs.lavasoftware.org/lava/examples/test-jobs/qemu-amd64-standard-s…
Thanks,
Ankit
Hello,
Is there a standard way to reboot the ECU between test cases by calling the reboot command defined in the device type (hard_reset_command),
so we are sure the tests are not impacting each other?
Thanks.
Hi All,
I am setting up an HTTPS instance of LAVA. I can access the LAVA UI and am
able to log in as well. I am using a self-signed SSL certificate.
I have added URL="https://172.16.60.178/" in the
/etc/lava-dispatcher/lava-worker file and restarted the service. It's
giving me the error below.
2023-10-30 07:10:47,383 ERROR -> server error: code 503
2023-10-30 07:10:47,383 DEBUG --> HTTPSConnectionPool(host='172.16.60.178', port=443): Max retries exceeded with url: /scheduler/internal/v1/workers/debian/?version=2023.10 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1123)')))
Do we need to configure some extra settings when using a
self-signed certificate? Any help will be appreciated.
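The CERTIFICATE_VERIFY_FAILED text in the error is raised by Python's TLS certificate verification. To check, independently of LAVA, whether a given PEM file makes the server verifiable, something like the following sketch could be used. The helper names `make_context` and `can_verify` are hypothetical, and how lava-worker itself is told to trust a custom certificate is not covered here.

```python
import socket
import ssl


def make_context(cafile=None):
    """Build a verifying TLS context; cafile adds one extra trusted PEM file."""
    context = ssl.create_default_context()
    if cafile is not None:
        context.load_verify_locations(cafile=cafile)
    return context


def can_verify(host, port=443, cafile=None, timeout=5.0):
    """Return (True, "") if the TLS handshake verifies, else (False, reason)."""
    context = make_context(cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True, ""
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return False, str(exc)
```

Calling `can_verify("172.16.60.178", cafile="/path/to/your-cert.pem")` (the path is a placeholder) with and without the `cafile` argument shows whether trusting that file is enough. Since the URL uses an IP address, the hostname check will likely also need the certificate to list that address in its subjectAltName.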
Thanks,
Ankit