Hi all,

I am struggling with a situation where one job of a multinode job group gets stuck in the 'scheduling' status forever.
image.png
It occurs about 1 in 10 times.
When it happens, one of the multinode jobs stays stuck in the 'scheduling' status, and the other job times out waiting for the first one.

- lava-scheduler log
image.png

When this issue occurs, the lava-scheduler log indicates that only one of the multinode jobs was scheduled, and the other was not.

- lava-dispatcher log
image.png
When the issue occurs, the lava-dispatcher log also seems to show that only one job was triggered.

I am using lava-server and lava-dispatcher as Docker instances (version 2023.08).
It occurs in 2023.06 too.
The issue seems to be related to lava-scheduler. What should I check to resolve it?
Please advise.

Thank you