On Tue, 16 May 2023 at 11:26, Florian Bezdeka <florian.bezdeka@siemens.com> wrote:

On Tue, 2023-05-16 at 09:20 +0200, Remi Duraffort wrote:
> Hello,
>
>
> On Fri, 5 May 2023 at 15:05, Bezdeka, Florian
> <florian.bezdeka@siemens.com> wrote:
> > Hi Stefan!
> >
> > On Fri, 2023-05-05 at 14:21 +0200, Stefan wrote:
> > > Hey Florian!
> > >
> > > Yeah, communication isn't that easy, that's what I figured out
> > > too.
> > >
> > > As far as I understand, your observations are correct. With a
> > > standard setup (LAVA attached to the console) it isn't easy to
> > > distinguish between script output and kernel log output after the
> > > login prompt; maybe that's why it isn't done anymore after login.
> >
>
>
> Your understanding is right: LAVA does not parse kernel messages
> after boot (more precisely, after the shell prompt is matched).
>
>
> > >
> > > But there's a manual workaround you can put into your jobs:
> > > * Run 'dmesg -c' to clear the ringbuffer
> > > * Run the test that might trigger the RCU WARNING
> > > * Run 'dmesg' and check whether the warning string appears in the output
> >
>
>
> You can also fail the job immediately by calling the "lava-test-raise"
> helper.
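
For reference, the three-step workaround quoted above, combined with
lava-test-raise, could be sketched as a small shell helper. This is only a
sketch under assumptions: run_and_check_dmesg and the DMESG override are
hypothetical names, not part of LAVA, and I am assuming lava-test-raise
accepts a free-form message. Clearing the ring buffer normally needs root
on the DUT.

```shell
#!/bin/sh
# Sketch only: run_and_check_dmesg and DMESG are hypothetical names.
# DMESG can be overridden, e.g. to try the helper off-target.
DMESG="${DMESG:-dmesg}"

# usage: run_and_check_dmesg PATTERN COMMAND [ARGS...]
run_and_check_dmesg() {
    pattern="$1"
    shift

    # Step 1: clear the kernel ring buffer (usually needs root).
    "$DMESG" -c >/dev/null || return 2

    # Step 2: run the test and remember its exit status.
    test_rc=0
    "$@" || test_rc=$?

    # Step 3: look for the pattern in what the kernel logged meanwhile.
    if "$DMESG" | grep -q -e "$pattern"; then
        # Assumption: lava-test-raise takes a message and fails the job.
        if command -v lava-test-raise >/dev/null 2>&1; then
            lava-test-raise "kernel log matched: $pattern"
        fi
        return 1
    fi
    return "$test_rc"
}
```

A test step could then call it as
run_and_check_dmesg "RCU WARNING" ./my-test.sh, where my-test.sh is a
placeholder for the real test command.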

Failing the job is not my main concern. I think that a kernel warning /
bug should always fail a test run. No?

Right now, LAVA parses kernel messages only when booting the DUT, not after.

I'm searching for a "generic" way of doing so. I think it doesn't make
sense that all LAVA users have to implement their own dmesg or job log
parser.
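
One possible shape for such a generic check, run as the last step of a
test definition, is sketched below. Everything here is an assumption for
illustration: the pattern list is just a handful of common kernel message
prefixes (it is not the list LAVA's boot action uses), and
count_kernel_problems, report_kernel_log and the test case name
kernel-log-clean are hypothetical names. The only LAVA helper used is
lava-test-case with --result.

```shell
#!/bin/sh
# Illustrative patterns only; extend to taste.
KERNEL_PATTERNS='WARNING:|BUG:|Oops|Kernel panic|RCU WARNING'

# Reads a kernel log on stdin, prints the number of matching lines.
count_kernel_problems() {
    # grep -c exits 1 when the count is 0, so mask that status.
    grep -E -c -e "$KERNEL_PATTERNS" || true
}

# Reads a kernel log on stdin and records one pass/fail result.
report_kernel_log() {
    hits=$(count_kernel_problems)
    if [ "$hits" -eq 0 ]; then result=pass; else result=fail; fi

    if command -v lava-test-case >/dev/null 2>&1; then
        lava-test-case kernel-log-clean --result "$result"
    else
        # Outside LAVA, just print the outcome.
        echo "kernel-log-clean: $result ($hits matching lines)"
    fi
    [ "$result" = pass ]
}
```

Usage would be "dmesg | report_kernel_log". The matching is plain
substring matching, so a line containing e.g. "DEBUG:" also counts as a
hit for "BUG:"; a real implementation would want tighter patterns.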

>
>
> > >
> > > So that triggering test might not fail, but parsing dmesg output
> > > will fail if that string appears, thus your job will report a
> > > failed test.
> > >
> > > For parsing the dmesg output you can use an inline test job
> > > definition like this, or put something similar somewhere into
> > > your scripts:
> > > - test:
> > >     timeout:
> > >       minutes: 1
> > >     definitions:
> > >     - repository:
> > >         metadata:
> > >           format: Lava-Test Test Definition 1.0
> > >           name: parse-dmesg-output
> > >           description: "Test for RCU WARNING in kernel log"
> > >         run:
> > >           steps:
> > >           - lava-test-case test-RCU-WARNING --shell test $(dmesg | grep "RCU WARNING" | wc -l) -eq 0
> > >       from: inline
> > >       name: env-dut-inline
> > >       path: inline/env-dut.yaml
> > >
> >
> > Well, yes, I could do it manually, but I expected it to be one of the
> > main use cases of LAVA to run tests on a Linux system and check whether
> > any WARNING, BUG, ... occurred. Not all kernel bugs trigger a complete
> > system hang or crash.
> >
> > If there is really no built-in test suite providing that functionality,
> > I wonder if all main users of LAVA (especially kernel-ci) had to
> > implement that on their own. I would expect a lot of such
> > lava-test-cases to exist, so a lot of code duplication...
> >
> > Following the upstream-first principle, I should not implement and
> > especially maintain that on my own...
> >
> > Florian
> >
> > >
> > > Hope this is an idea for you.
> > >
> > > Best regards
> > >
> > > Stefan
> > >
> > > On 5/4/2023 5:44 PM, Florian Bezdeka wrote:
> > > > Hi all,
> > > >
> > > > I'm basically repeating [1] here as there was no reaction for some
> > > > months now. Maybe I used the wrong communication channel, let's see...
> > > >
> > > > We have a testsuite that is able to trigger an RCU WARNING inside the
> > > > Linux kernel. My expectation was that whenever a kernel warning / oops
> > > > / call stack dump / ... occurs the LAVA job is marked as "failed".
> > > >
> > > > This assumption seems to be wrong. It took some time to realize that we
> > > > have a real problem as manual inspection of test logs only happens from
> > > > time to time.
> > > >
> > > > After scanning the code my understanding is that the output of the
> > > > connection (serial connection in my case) is only parsed during kernel
> > > > boot (until the login action takes over). That is not sufficient for
> > > > detecting problems that happen during test execution.
> > > >
> > > > Is there a way to scan the full log for the same patterns that are used
> > > > by the boot action? If so, how to configure that? Whenever a kernel
> > > > problem occurs my test run should be marked as "failed".
> > > >
> > > > Any ideas? Did I overlook something?
> > > >
> > > > Best regards,
> > > > Florian
> > > >
> > > > [1] https://git.lavasoftware.org/lava/lava/-/issues/576
> > > >
> > > > _______________________________________________
> > > > Lava-users mailing list -- lava-users@lists.lavasoftware.org
> > > > To unsubscribe send an email to
> > > > lava-users-leave@lists.lavasoftware.org
> > > >
> > > >
> > >
> >
>
>
> --
> Rémi Duraffort
> Principal Tech Lead
> Automation Software Team
> Linaro

Rémi Duraffort
Principal Tech Lead
LAVA Tech Lead
Automation Software Team
Linaro