On Thu, 3 Jan 2019 at 23:13, Milosz Wasilewski <milosz.wasilewski@linaro.org> wrote:
On Thu, 3 Jan 2019 at 22:19, Andrei Narkevitch <Andrei.Narkevitch@cypress.com> wrote:
Hello,
What is the rationale for ignoring individual lava-test-case results when running a health check job?
For example, the following job failed one case: https://validation.linaro.org/scheduler/job/1902316
A failing test case can be an indication of device malfunction (e.g. out of disk space, hardware issues). Is it possible to force LAVA to fail a health check, and thus put the device into a "bad" state, if one of the test cases is not successful?
IMHO the idea is that the health check is directed more towards deployment/boot than the actual tests. If you really would like a health check in which every test counts, you should probably rewrite it using lava-test-raise: https://master.lavasoftware.org/static/docs/v2/writing-tests.html#index-8 This would terminate the health check at the first failure.
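A minimal sketch of the pattern (the command itself is illustrative; lava-test-raise ends the job as soon as it is called):

    run:
      steps:
        # illustrative check - any failed command raises and ends the job
        - ping -c 4 10.0.0.1 || lava-test-raise "network unreachable"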
milosz
I'll update the docs at https://master.lavasoftware.org/static/docs/v2/healthchecks.html#using-lava-... - tracked as https://git.lavasoftware.org/lava/lava/issues/196
Early on in LAVA, health checks were commonly only boot tests - if the device deployed and booted, the infrastructure was deemed to be working correctly.
Things have developed since then, and there are now many infrastructure elements which benefit from being tested inside a test action. Treat these as "setup" checks, e.g. for external hardware or peripheral support. You will tend to find that such checks belong in health checks but also in a setup phase of most (or all) test jobs on the DUT. Therefore, create a dedicated test definition which expressly tests the essential peripherals and other infrastructure. Any check which does not cause the DUT to fail to boot upon error needs one of these test definitions. Milosz is correct: as a "setup" test definition, each one should use lava-test-raise: https://master.lavasoftware.org/static/docs/v2/writing-tests.html#call-lava-...
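A sketch of such a setup test definition - the individual checks here are illustrative, so adapt the steps to whatever peripherals your DUT actually needs:

    metadata:
      format: Lava-Test Test Definition 1.0
      name: setup-checks
      description: "Fail fast if essential peripherals are missing."
    run:
      steps:
        # each check raises, ending the job, if the peripheral is absent
        - test -b /dev/sda || lava-test-raise "external USB drive not found"
        - ip link show eth0 || lava-test-raise "ethernet interface missing"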
Add these setup test definitions to every test job as the first definitions in the test action block. It's not practical to run a health check every time, and every deployment has the potential to affect some kind of external support, so check it and fail early, before spending time configuring and running the other test actions.
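In the job definition, that ordering looks something like this (the repository URL and paths are hypothetical):

    - test:
        timeout:
          minutes: 15
        definitions:
          # the setup checks run first and raise on any failure
          - repository: https://git.example.com/my-tests.git
            from: git
            path: testdefs/setup-checks.yaml
            name: setup-checks
          # the real tests only run if the setup checks passed
          - repository: https://git.example.com/my-tests.git
            from: git
            path: testdefs/main-suite.yaml
            name: main-suite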
I would advise against making these checks too intrusive or time-consuming. Also avoid testing things like available disk space unless you *know* how much space the subsequent test actions are going to need. (You might be able to use test action parameters for this, depending on your setup - there's a sketch below.) If space is constrained, look at running the test over NFS with some kind of on-device storage as scratch space, maybe a USB external drive.

Disentangle the test requirements from the device requirements, so that you know what you are testing. Test only one thing at a time, and break up your test structures so that you are always within the limits of the DUT, except for those few times when you are explicitly testing the limits of the DUT (and then test one limit at a time). At all costs, avoid testing everything in one go: a hundred different test jobs are better than a single test job which tries to run 100 tests and fails at test 45, because the failed job gives you no data at all for the majority of the tests. Careful use of setup actions, lava-test-raise and portable test scripts is what gets you to the point where intermittent and cascading errors can be identified and fixed. That's how labs get from a 40% failure rate to a 0.4% failure rate.
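If you do need a space check, here is a hedged sketch using test definition parameters - the defaults are illustrative and can be overridden per-job with 'parameters:' under the definition entry:

    metadata:
      format: Lava-Test Test Definition 1.0
      name: disk-space-check
    params:
      SCRATCH_DIR: /scratch
      MIN_FREE_MB: "512"
    run:
      steps:
        # column 4 of df output is the available space in MB
        - free_mb=$(df -m "$SCRATCH_DIR" | awk 'NR==2 {print $4}')
        - test "$free_mb" -ge "$MIN_FREE_MB" || lava-test-raise "only ${free_mb}MB free in $SCRATCH_DIR, need ${MIN_FREE_MB}MB"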
Feel free to use lava-common to help make your test action scripts portable: https://git.lavasoftware.org/lava/functional-tests/blob/master/testdefs/lava...
The simplest way to use lava-common is something like this: https://git.lavasoftware.org/lava/functional-tests/blob/master/testdefs/disp... - in a custom setup script, if an individual command MUST operate 100% successfully, make it a command; if not, make it a testcase. If a command fails for any reason, lava-test-raise is called and the test job ends with the device in Bad health.
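Expressed directly with the standard helpers, rather than the lava-common wrappers, the distinction looks roughly like this (commands and paths are illustrative):

    # Must succeed: on failure, lava-test-raise ends the job and, in a
    # health check, puts the device into Bad health.
    mount /dev/sda1 /scratch || lava-test-raise "scratch storage unavailable"

    # Allowed to fail: record a pass/fail result and carry on.
    if modprobe optional_driver; then
        lava-test-case optional-driver --result pass
    else
        lava-test-case optional-driver --result fail
    fi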
LAVA also makes other checks automatically and raises infrastructure exceptions if those fail - e.g. if static_info is defined but the specified hardware cannot be found: https://staging.validation.linaro.org/scheduler/device/staging-hi960-hikey-0... - which can result in a health check failing: https://staging.validation.linaro.org/scheduler/job/247199
Here is another example of using lava-test-raise, this time from a python custom script. It's not a health check because, in this case, there is one DUT with external hardware and one without - see the links below, and the wrapper sketch after them.
https://staging.validation.linaro.org/scheduler/job/246700/definition
https://git.lavasoftware.org/lava/functional-tests/blob/master/testdefs/arm-...
https://git.lavasoftware.org/lava/functional-tests/blob/master/testdefs/aep-...
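The linked scripts call lava-test-raise internally; the same effect can be sketched from a test definition step that wraps a custom script (the script name here is hypothetical):

    run:
      steps:
        # raise if the custom script signals failure via its exit code
        - ./automated/probe-setup.py || lava-test-raise "external probe not available"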
Thanks,
Andrei Narkevitch
Cypress Semiconductors