Hi all,
I've run into a deadlock on my LAVA server in the following scenario.
I have an at91rm9200ek in my lab that was submitted many multi-node jobs, each requesting another "board" (a laptop of device type dummy-ssh). All the other boards in my lab received the same multi-node jobs, all requesting that same single laptop.
I had to take the at91rm9200ek out of the lab because it was misbehaving.
However, LAVA still schedules multi-node jobs on the laptop that require the at91rm9200ek as the other half of the job, even though the at91rm9200ek's status is clearly Maintenance.
As a result, until I put the at91rm9200ek back in the lab, all my boards stay reserved, waiting on a multi-node job that cannot start, and my lab is effectively dead.
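To illustrate the behaviour I'd expect: a multi-node group should reserve all of its devices or none of them, so a device in Maintenance never leaves the laptop (or any other board) held by a job that cannot start. Below is a rough Python sketch of that all-or-nothing check. The Device class, the status strings and try_reserve_group() are hypothetical names for illustration only, not LAVA's actual scheduler code.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical model of a device; not LAVA's real Device model.
    hostname: str
    status: str  # e.g. "Idle", "Reserved", "Maintenance"


def try_reserve_group(devices):
    """Reserve every device of a multi-node group, or none of them.

    If any device is not Idle (for instance it is in Maintenance),
    nothing is reserved, so no other board is left waiting.
    """
    if any(d.status != "Idle" for d in devices):
        return False
    for d in devices:
        d.status = "Reserved"
    return True


if __name__ == "__main__":
    laptop = Device("laptop", "Idle")
    board = Device("at91rm9200ek", "Maintenance")
    print(try_reserve_group([laptop, board]))  # False
    print(laptop.status)  # Idle: the laptop stays free for other jobs
```

With the board in Maintenance the group is simply skipped and the laptop remains available, instead of being reserved for a job that can never run.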
Let me know if I can help debug this or test a possible fix. I could look at the scheduler myself, but since you know the code base far better than I do, you might have a quick patch at hand.
Best regards,
Quentin