Hi all,
I am struggling with a situation where one of the jobs in a multinode job stays in 'Scheduling' status forever. [image: image.png] It happens about 1 in 10 times: one of the multinode jobs gets stuck in 'Scheduling' status, and the other job times out waiting for the first one.
- lava-scheduler log [image: image.png]
  When this issue occurs, the lava-scheduler log indicates that only one of the multinode jobs was scheduled, and the other was not.
- lava-dispatcher log [image: image.png]
  When the issue occurs, the lava-dispatcher log shows that only one job was triggered.
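To spot the symptom described above (one job of a multinode group scheduled, its partner left behind), a small helper can group job states by multinode group and flag groups where some jobs are still in 'Scheduling' while others have moved on. This is only a sketch under stated assumptions: the input is a hypothetical dict of job id to state (for example copied by hand from the web UI), not a call into any LAVA API, and it assumes multinode sub-job ids of the form `1234.0`, `1234.1`. The function name `find_partially_scheduled` is made up for illustration.

```python
from collections import defaultdict


def find_partially_scheduled(sub_job_states):
    """Hypothetical helper: return {group_id: [stuck sub-job ids]} for groups
    where some sub-jobs are still 'Scheduling' while others are in another state."""
    groups = defaultdict(dict)
    for job_id, state in sub_job_states.items():
        # Assumed id format "<group>.<index>", e.g. "1234.0"
        group_id, _, _ = job_id.partition(".")
        groups[group_id][job_id] = state

    stuck = {}
    for group_id, members in groups.items():
        states = set(members.values())
        # Mixed states with at least one 'Scheduling' matches the reported symptom
        if "Scheduling" in states and len(states) > 1:
            stuck[group_id] = sorted(
                job for job, s in members.items() if s == "Scheduling"
            )
    return stuck


if __name__ == "__main__":
    # Hypothetical example data, not taken from a real instance
    example = {
        "1234.0": "Running",
        "1234.1": "Scheduling",
        "1235.0": "Finished",
        "1235.1": "Finished",
    }
    print(find_partially_scheduled(example))
```

Groups where every job is in the same state (all 'Scheduling', or all finished) are deliberately not flagged, since only the mixed case matches what the logs above show.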
I am running lava-server and lava-dispatcher as Docker containers (version 2023.08); the same issue also occurred with 2023.06. It seems to be related to lava-scheduler. What should I check to resolve this issue? Please advise.
Thank you
lava-users@lists.lavasoftware.org