Re: [Lava-users] Deadlock in scheduler

20 Apr 2018


On 20 April 2018 at 07:56, Neil Williams neil.williams@linaro.org wrote:
...
On 19 April 2018 at 20:11, Quentin Schulz quentin.schulz@bootlin.com
wrote:
...
Hi all,
I've encountered a deadlock in my LAVA server with the following scheme.
I have an at91rm9200ek in my lab that got submitted a lot of multi-node
jobs requesting another "board" (a laptop of type dummy-ssh).
All of my other boards in the lab have received the same multi-node jobs
requesting the same and only laptop.
That is the source of the resource starvation - multiple requirements of a
single device. The scheduler needs to be greedy and grab whatever suitable
devices it can as soon as it can to be able to run MultiNode. The primary
ordering of scheduling is the Test Job ID which is determined at submission.
If you have an imbalance between the number of machines which can be
available and then submit MultiNode jobs which all rely on the starved
resource, there is not much LAVA can do currently. We are looking at a way
to reschedule MultiNode test jobs but it is very complex and low priority.
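To make the starvation concrete, here is a toy model of greedy reservation in Test Job ID order. It is only a sketch and not LAVA's scheduler code: the function name, the device name "bbb-01" and the device-type "beaglebone-black" are made up for illustration, and the model never releases a device when a job ends.

```python
# Toy model of greedy, job-ID-ordered reservation. Not LAVA's scheduler code.
def greedy_reserve(jobs, devices):
    """Reserve devices for MultiNode jobs in job-ID order.

    jobs:    list of (job_id, [device_type, ...]) tuples, one entry per role.
    devices: dict of device name -> device type, listing only the devices
             that can currently be reserved.
    Returns (running, waiting): IDs of jobs with every role filled, and IDs
    of jobs stuck with a partial (or empty) reservation.
    """
    free = dict(devices)
    running, waiting = [], []
    for job_id, wanted_types in sorted(jobs):
        complete = True
        for wanted in wanted_types:
            match = None
            for name in sorted(free):
                if free[name] == wanted:
                    match = name
                    break
            if match is None:
                complete = False   # this role has to wait for a device
            else:
                del free[match]    # greedy: the device stays reserved
        (running if complete else waiting).append(job_id)
    return running, waiting


# Hypothetical lab: "bbb-01" / "beaglebone-black" are made-up names.
# The at91rm9200ek is left out because it cannot take jobs.
devices = {"bbb-01": "beaglebone-black", "laptop-01": "dummy-ssh"}
jobs = [
    (1, ["at91rm9200ek", "dummy-ssh"]),
    (2, ["beaglebone-black", "dummy-ssh"]),
]
running, waiting = greedy_reserve(jobs, devices)
print(running, waiting)
```

In this model job 1 holds the only laptop while it waits for a board that never arrives, and job 2 holds its board while it waits for the laptop, so nothing runs.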
What version of lava-server and lava-dispatcher are you running?
What is the structure of your current lab?
MultiNode is complex - not just at the test job synchronization level but
also at the lab structure / administrative level.
...
I had to take the at91rm9200ek out of the lab because it was misbehaving.
...
However, LAVA is still scheduling multi-node jobs on the laptop which
require the at91rm9200ek as the other part of the job, while its status
is clearly Maintenance.
A device in Maintenance is still available for scheduling - only Retired
is excluded - test jobs submitted to a Retired device are rejected.
Once a test job has been submitted, it will be either scheduled or
cancelled.
There is something we can improve here though - the current UI describes
Health Bad and Health Maintenance as "no submissions possible" when what it
should say is "no test jobs can be scheduled". The difference is
important...
https://projects.linaro.org/browse/LAVA-1299
...
Now, until I put the at91rm9200ek back in the lab, all my boards are
reserved and scheduling for a multi-node job and thus, my lab is
basically dead.
The correct fix here is to have enough devices of the device-type of the
starved resource such that one of each other device-type can use that
resource simultaneously and then use device-tags to match up groups of
devices so that submitting lots of jobs for one type all at the same time
does not simply consume all of the available resources.
For example, take four device-types: phone, hikey, qemu and panda. Each multinode job
wants a single QEMU with each of the others, so the QEMU type becomes
starved, depending on how jobs are submitted. If two hikey-qemu jobs are
submitted together, then 1 QEMU gets scheduled, waiting for the hikey to
become free after running the first job. If each QEMU has device-tags, then
the second hikey-qemu job will wait not only for the hikey but will also
wait for the one QEMU which has the hikey device tag. This way, only those
jobs would then wait for a QEMU device. There would be three QEMU devices,
one with a device tag like "phone", one with "hikey" and one with "panda".
If another panda device is added, another QEMU with the "panda" device tag
would be required. The number of QEMU devices required is the sum of the
number of devices of each other device-type which may be required in a
MultiNode test job.
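The arithmetic above can be sketched in a few lines of Python. This is an illustration only; the function name and the dictionary layout are assumptions, and the device counts are the ones from the example.

```python
def helper_tags(device_counts):
    """Return one device tag per helper (QEMU) device that the lab needs.

    device_counts: dict of device-type -> number of devices of that type
    that may take part in a MultiNode job with the helper.
    """
    return [
        device_type
        for device_type, count in sorted(device_counts.items())
        for _ in range(count)
    ]


lab = {"phone": 1, "hikey": 1, "panda": 1}
print(helper_tags(lab))        # ['hikey', 'panda', 'phone']

lab["panda"] += 1              # a second panda is added to the lab
print(len(helper_tags(lab)))   # 4
```

Each entry in the returned list stands for one QEMU device carrying that tag, so the length of the list is the number of QEMU devices required.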
This is a structural problem within your lab.
You would need one "laptop" for each other device-type which can use that
device-type in your lab. Then each "laptop" gets a unique device-tag. Each
test job for at91rm9200ek must specify that the "laptop" device must have
the matching device tag. Each test job for each other device-type uses the
matching device-tag for that device-type. We had this problem in the
Harston lab for a long time when using V1 and had to implement just such a
structure of matched devices and device tags. However, the need for this
disappeared when the Harston lab transitioned all devices and test jobs to
LAVA V2.
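A rough sketch of how the matched tags keep the jobs apart, written as plain Python rather than anything taken from LAVA. It assumes that a device fits a role when the device-type matches and the device carries every tag the role asks for. The device names and the "other-board" tag are hypothetical.

```python
def pick_device(role, devices):
    """Return the name of the first device that fits the role, else None.

    role:    {"type": str, "tags": [str, ...]}
    devices: dict of device name -> {"type": str, "tags": [str, ...]}
    Assumption: a device fits when its type matches and it carries every
    tag the role asks for.
    """
    for name in sorted(devices):
        device = devices[name]
        if device["type"] == role["type"] and set(role["tags"]) <= set(device["tags"]):
            return name
    return None


# Hypothetical names; "other-board" stands in for any second device-type.
laptops = {
    "laptop-01": {"type": "dummy-ssh", "tags": ["at91rm9200ek"]},
    "laptop-02": {"type": "dummy-ssh", "tags": ["other-board"]},
}

at91_role = {"type": "dummy-ssh", "tags": ["at91rm9200ek"]}
other_role = {"type": "dummy-ssh", "tags": ["other-board"]}
print(pick_device(at91_role, laptops), pick_device(other_role, laptops))
```

With this layout a queue of at91rm9200ek jobs can only ever wait on laptop-01, so jobs for the other device-type still find laptop-02 free.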
...
Let me know if I can be of any help debugging this thing or testing a
possible fix. I'd have a look at the scheduler but you, obviously
knowing the code base way better than I do, might have a quick patch on
hand.
Patches would be a bad solution for a structural problem.
As a different approach, why do you need MultiNode with a "laptop" type
device in the first place? Can the test jobs be reconfigured to use LXC
which does not use MultiNode? What is the "laptop" device-type doing that
cannot be done in an LXC? LXC is created on-the-fly, one for each device,
when the test job requests one. This solved the resource starvation problem
with the majority of MultiNode issues because the work previously done in
the generic QEMU / "laptop" role can just as easily be done in an LXC.
What you are describing sounds like a misuse of MultiNode resulting in
resource starvation and the fix is to have enough of the limited resource
to prevent starvation - either by adding hardware and changing the current
test jobs to use device-tags or relocating the work done on the starved
resource into an LXC so that every device can have a dedicated "container"
to do things which cannot be easily done on the device.
...
Best regards,
Quentin

--
Neil Williams
neil.williams@linaro.org
http://www.linux.codehelp.co.uk/