Hi all,
I'm basically repeating [1] here as there has been no reaction for some months now. Maybe I used the wrong communication channel, let's see...
We have a test suite that is able to trigger an RCU WARNING inside the Linux kernel. My expectation was that whenever a kernel warning / oops / call stack dump / ... occurs, the LAVA job is marked as "failed".
This assumption seems to be wrong. It took some time to realize that we have a real problem as manual inspection of test logs only happens from time to time.
After scanning the code my understanding is that the output of the connection (serial connection in my case) is only parsed during kernel boot (until the login action takes over). That is not sufficient for detecting problems that happen during test execution.
Is there a way to scan the full log for the same patterns that are used by the boot action? If so, how to configure that? Whenever a kernel problem occurs my test run should be marked as "failed".
Any ideas? Did I overlook something?
Best regards, Florian
Hey Florian!
Yeah, communication isn't that easy, that's what I figured out too.
As far as I understand, your observations are correct. With the standard setup (LAVA attached to the console) it isn't easy to distinguish between script output and kernel log output after the login prompt; maybe that's why it isn't done anymore after login.
But there's a manual workaround you can put into your jobs:
* Run 'dmesg -c' to clear the ringbuffer
* Run that test that might trigger the RCU WARNING
* Run 'dmesg' and parse the output if that warning string appears
So that triggering test might not fail, but parsing dmesg output will fail if that string appears, thus your job will report a failed test.
For parsing the dmesg output you can use an inline test job definition like this, or put something similar somewhere into your scripts:

- test:
    timeout:
      minutes: 1
    definitions:
    - repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: parse-dmesg-output
          description: "Test for RCU WARNING in kernel log"
        run:
          steps:
          - lava-test-case test-RCU-WARNING --shell test $(dmesg | grep "RCU WARNING" | wc -l) -eq 0
      from: inline
      name: env-dut-inline
      path: inline/env-dut.yaml
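The three steps above could also be wrapped in a small shell helper, roughly like this. It is only a sketch: the names `kernel_issue_count` and `run_and_check` and the pattern list are assumptions made for illustration, not something LAVA ships, and `dmesg -c` usually needs root on the DUT.

```shell
#!/bin/sh
# Illustrative list of strings treated as kernel problems (an assumption;
# extend it to whatever your kernel actually prints).
KERNEL_ISSUE_PATTERNS='WARNING:|BUG:|Oops|Call Trace:'

# Count the lines on stdin that match one of the patterns.
# grep -c exits non-zero when the count is 0, hence the "|| true".
kernel_issue_count() {
    grep -E -c "$KERNEL_ISSUE_PATTERNS" || true
}

# Hypothetical wrapper: run a command and succeed only if the kernel
# logged nothing suspicious while it was running.
run_and_check() {
    dmesg -c >/dev/null 2>&1               # step 1: clear the ring buffer
    "$@"                                   # step 2: run the test command
    issues=$(dmesg | kernel_issue_count)   # step 3: inspect new messages
    [ "$issues" -eq 0 ]
}
```

On the DUT something like `run_and_check ./my-test.sh` (with `my-test.sh` standing in for the real test) would then return non-zero whenever a matching line shows up, independent of the test's own exit code.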
Hope this is an idea for you.
Best regards
Stefan
On 5/4/2023 5:44 PM, Florian Bezdeka wrote:
[1] https://git.lavasoftware.org/lava/lava/-/issues/576
Lava-users mailing list -- lava-users@lists.lavasoftware.org To unsubscribe send an email to lava-users-leave@lists.lavasoftware.org %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
Hi Stefan!
On Fri, 2023-05-05 at 14:21 +0200, Stefan wrote:
Well, yes, I could do it manually but I expected it to be one of the main use cases of LAVA to run tests on a Linux system and check if any WARNING, BUG, ..., occurred. Not all kernel bugs trigger a complete system hang or crash.
If there is really no built-in test suite providing that functionality I wonder if all main users of LAVA (especially kernel-ci) had to implement that on their own. I would expect a lot of such lava-test-cases to exist, so a lot of code duplication...
Following the upstream-first principle I should not implement and especially maintain that on my own...
Florian
Hello,
On Fri, 5 May 2023 at 15:05, Bezdeka, Florian florian.bezdeka@siemens.com wrote:
Your understanding is right: LAVA does not parse kernel messages after boot (more precisely, after the shell prompt is matched).
You can also fail the job immediately by calling the "lava-test-raise" helper.
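A minimal sketch of that idea, assuming the script runs inside a LAVA test shell where `lava-test-raise` is on the PATH. The function name `raise_on_kernel_issue`, the pattern list, and the `RAISE_CMD` override (only there so the snippet can be tried outside a LAVA job) are hypothetical.

```shell
#!/bin/sh
# Command used to abort the job; defaults to LAVA's lava-test-raise helper.
# RAISE_CMD is a hypothetical override for trying this outside LAVA.
RAISE_CMD="${RAISE_CMD:-lava-test-raise}"

# Read a kernel log on stdin. If any line looks like a kernel problem,
# call the raise command with the first such line and return non-zero.
raise_on_kernel_issue() {
    first_hit=$(grep -E 'WARNING:|BUG:|Oops' | head -n 1)
    if [ -n "$first_hit" ]; then
        "$RAISE_CMD" "kernel issue detected: $first_hit"
        return 1
    fi
    return 0
}

# Intended use on the DUT after the test has run (not executed here):
#   dmesg | raise_on_kernel_issue
```

Unlike a failing `lava-test-case`, which only records one failed result, this is meant to stop the whole job at the first kernel problem.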
On Tue, 2023-05-16 at 09:20 +0200, Remi Duraffort wrote:
Failing the job is not my main concern. I think that a kernel warning / bug should always fail a test run. No?
I'm searching for a "generic" way of doing so. I think it doesn't make sense that all LAVA users have to implement their own dmesg or job log parser.
On Tue, 16 May 2023 at 11:26, Florian Bezdeka florian.bezdeka@siemens.com wrote:
Right now, LAVA parses kernel messages only when booting the DUT, not after.
-- Rémi Duraffort Principal Tech Lead Automation Software Team Linaro