How to fail a test run on kernel warnings that happen after the boot action?


Florian Bezdeka

4 May 2023

5:44 p.m.

Hi all,

I'm basically repeating [1] here as there has been no reaction for some months now. Maybe I used the wrong communication channel, let's see...

We have a test suite that is able to trigger an RCU WARNING inside the Linux kernel. My expectation was that whenever a kernel warning / oops / call stack dump / ... occurs, the LAVA job is marked as "failed".

This assumption seems to be wrong. It took some time to realize that we have a real problem, as manual inspection of test logs only happens from time to time.

After scanning the code, my understanding is that the output of the connection (a serial connection in my case) is only parsed during kernel boot (until the login action takes over). That is not sufficient for detecting problems that happen during test execution.

Is there a way to scan the full log for the same patterns that are used by the boot action? If so, how do I configure that? Whenever a kernel problem occurs, my test run should be marked as "failed".

Any ideas? Did I overlook something?

Best regards, Florian

[1] https://git.lavasoftware.org/lava/lava/-/issues/576


Stefan

5 May 2023

2:21 p.m.

New subject: [Lava-users] Re: How to fail a test run on kernel warnings that happen after the boot action?

Hey Florian!

Yeah, communication isn't that easy, that's what I figured out too.

As far as I understand, your observations are correct. With the standard setup (LAVA attached to the console), it isn't easy to distinguish between script output and kernel log output after the login prompt; maybe that's why it isn't done after login.

But there's a manual workaround you can put into your jobs:
* Run 'dmesg -c' to clear the ring buffer
* Run the test that might trigger the RCU WARNING
* Run 'dmesg' and check whether the warning string appears in the output

The triggering test itself might not fail, but the dmesg check will fail if that string appears, so your job will report a failed test.

For parsing the dmesg output you can use an inline test job definition like this, or put something similar somewhere into your scripts:

- test:
    timeout:
      minutes: 1
    definitions:
    - repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: parse-dmesg-output
          description: "Test for RCU WARNING in kernel log"
        run:
          steps:
          - lava-test-case test-RCU-WARNING --shell test $(dmesg | grep "RCU WARNING" | wc -l) -eq 0
      from: inline
      name: env-dut-inline
      path: inline/env-dut.yaml
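The check above looks for a single string. A minimal sketch of a broader variant follows, assuming a POSIX shell on the DUT. The function name scan_kernel_log and the pattern list are illustrative assumptions; they are not the patterns LAVA's boot action uses. The function reads kernel log text on standard input, prints every suspicious line, and returns non-zero if it found one.

```shell
#!/bin/sh
# Hypothetical helper: scan kernel log text on stdin for problem markers.
# The pattern list is an assumption for illustration only; it is NOT the
# list used by LAVA's boot action. Extend it to suit your kernel.
KERNEL_ISSUE_PATTERNS='WARNING:|BUG:|Oops|Call Trace:|Kernel panic'

scan_kernel_log() {
    # grep -E prints each matching line; exit status 0 means "found".
    if grep -E "$KERNEL_ISSUE_PATTERNS"; then
        return 1    # at least one suspicious line was printed
    fi
    return 0        # nothing matched, the log looks clean
}

# Possible use in a test script (the test case name is an assumption):
#   if dmesg | scan_kernel_log; then
#       lava-test-case kernel-log-clean --result pass
#   else
#       lava-test-case kernel-log-clean --result fail
#   fi
```

Reading from standard input keeps the function independent of where the log comes from, so the same check can be tried on a saved log file before it is used on a DUT.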

Hope this is an idea for you.

Best regards

Stefan


Lava-users mailing list -- lava-users@lists.lavasoftware.org
To unsubscribe send an email to lava-users-leave@lists.lavasoftware.org

Bezdeka, Florian

3:05 p.m.


Hi Stefan!


Well, yes, I could do it manually, but I expected it to be one of the main use cases of LAVA to run tests on a Linux system and check whether any WARNING, BUG, ... occurred. Not all kernel bugs trigger a complete system hang or crash.

If there is really no built-in test suite providing that functionality, I wonder whether all main users of LAVA (especially kernel-ci) had to implement that on their own. I would expect a lot of such lava-test-cases to exist, so a lot of code duplication...

Following the upstream-first principle, I should not implement and especially maintain that on my own...

Florian


Remi Duraffort

16 May 2023

9:20 a.m.


Hello,


Your understanding is right: LAVA does not parse kernel messages after boot (in fact, after the shell prompt is matched).


Besides the dmesg workaround Stefan described, you can also fail the job immediately by calling the "lava-test-raise" helper.
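A minimal sketch of that approach follows, assuming a POSIX shell on the DUT and that the lava-test-raise helper is on the PATH inside the LAVA test shell. The function name raise_on_kernel_issue, the pattern, and the message text are illustrative assumptions. Where the helper is missing (for example in a local dry run), the sketch only prints a message and returns non-zero.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and ask LAVA to fail
# the job through lava-test-raise when a problem marker is found.
# The pattern below is an assumption, not LAVA's own boot-time list.
raise_on_kernel_issue() {
    if grep -q -E 'WARNING:|BUG:|Oops'; then
        if command -v lava-test-raise >/dev/null 2>&1; then
            # Inside a LAVA test shell: fail the job right away.
            lava-test-raise "kernel-issue-in-dmesg"
        else
            # Outside LAVA: only report the finding.
            echo "kernel issue found (lava-test-raise not available)" >&2
        fi
        return 1
    fi
    return 0
}

# Possible use after a test step:
#   dmesg | raise_on_kernel_issue
```

Compared with reporting a failed test case, this stops the job at the first finding, which may or may not be what you want when later test steps are still worth running.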


-- Rémi Duraffort Principal Tech Lead Automation Software Team Linaro

Florian Bezdeka

11:26 a.m.



Failing the job is not my main concern. I think that a kernel warning / bug should always fail a test run. No?

I'm searching for a "generic" way of doing so. I don't think it makes sense for every LAVA user to implement their own dmesg or job log parser.
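To make the idea of a "generic" check concrete, here is a minimal sketch of a wrapper that could be shared between test scripts. It is not an upstream LAVA feature. It assumes a POSIX shell; the names run_with_kernel_check, KLOG_CMD, and KLOG_PATTERNS and their defaults are illustrative assumptions. The wrapper runs any command and then inspects only the kernel log lines that appeared while the command was running.

```shell
#!/bin/sh
# Hypothetical wrapper: run a test command, then fail if the kernel log
# gained lines matching a problem marker while the command was running.
# KLOG_CMD and KLOG_PATTERNS are illustrative names and defaults.
: "${KLOG_CMD:=dmesg}"
: "${KLOG_PATTERNS:=WARNING:|BUG:|Oops}"

run_with_kernel_check() {
    # Count the kernel log lines present before the test starts.
    before=$($KLOG_CMD | wc -l)
    "$@"
    test_status=$?
    # Look only at lines appended during the test (tail -n +N is 1-based).
    if $KLOG_CMD | tail -n "+$((before + 1))" | grep -E "$KLOG_PATTERNS"; then
        return 1            # new kernel issue lines were printed
    fi
    return "$test_status"   # otherwise report the test's own result
}

# Possible use:
#   run_with_kernel_check ./my-test.sh
```

Counting lines avoids clearing the ring buffer with 'dmesg -c', but it is only reliable as long as the ring buffer does not wrap during the test; on a very chatty kernel the 'dmesg -c' variant from earlier in the thread is the safer choice.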


Remi Duraffort

18 Dec

10:59 a.m.



Right now, LAVA parses kernel messages only when booting the DUT, not after.


-- Rémi Duraffort Principal Tech Lead LAVA Tech Lead Automation Software Team Linaro
