Neil Williams neil.williams@linaro.org writes:
On 16 August 2017 at 09:06, Robert Marshall robert.marshall@codethink.co.uk wrote:
Neil Williams neil.williams@linaro.org writes:
On 15 August 2017 at 11:27, Robert Marshall robert.marshall@codethink.co.uk wrote:
Neil Williams neil.williams@linaro.org writes:
Neil
Thanks for this response, comments below:
Typo in my example.
On 15 August 2017 at 10:04, Robert Marshall robert.marshall@codethink.co.uk wrote:
Hi,
I've got 2 jobs stuck in canceled mode which are preventing any other job from running.
I'm running LAVA (2017-7) in a VM and have tried rebooting the VM to clear the issue, but without success (i.e. the jobs still block the queue).
An extract from /var/log/lava-server/django.log is attached.
That's reporting that there was no start_time or end_time associated with the test job. It sounds like a bug but I'm not clear on how to reproduce it. For now, you can modify the test job to have a start and end time. Try:

    $ sudo lava-server manage shell

    from django.utils import timezone
    from lava_scheduler_app.models import TestJob
    job = TestJob.objects.filter(id=123456)

Use get() rather than filter() here:

    job = TestJob.objects.get(id=123456)

or

    job = TestJob.objects.filter(id=123456)[0]

filter() returns a QuerySet (which behaves like a list), not a single job.

    job.start_time                    # inspect the current value
    job.start_time = timezone.now()
    job.end_time                      # inspect the current value
    job.end_time = timezone.now()
    job.save()
i.e. if there is no start time, modify both start and end time. If there's no end_time, just add an end_time.
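The rule above can be sketched as a small helper. This is only an illustration, not LAVA code: repair_job_times is a hypothetical name, and the stand-in job below is a plain object so the snippet runs outside the LAVA shell. Inside `lava-server manage shell` you would pass a real TestJob and call job.save() afterwards.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def repair_job_times(job, now=None):
    """Fill in missing timestamps on a job-like object.

    Hypothetical helper: no start_time means both fields are set,
    a missing end_time alone means only end_time is set.
    Returns the list of attribute names that were changed.
    """
    if now is None:
        # Timezone-aware timestamp; in a Django shell,
        # django.utils.timezone.now() serves the same purpose.
        now = datetime.now(timezone.utc)
    changed = []
    if job.start_time is None:
        job.start_time = now
        job.end_time = now
        changed = ["start_time", "end_time"]
    elif job.end_time is None:
        job.end_time = now
        changed = ["end_time"]
    return changed


# Stand-in for a TestJob that was cancelled before it started.
fake_job = SimpleNamespace(start_time=None, end_time=None)
changed_fields = repair_job_times(fake_job)
```

The helper never saves anything itself, so it can be tried on a job and the result inspected before committing the change with job.save().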
Thanks - both jobs had no start time but did have an end time (because I cancelled before the job started?). I set start_time = end_time, which has got rid of the '500' error.
But the queue still appears jammed - there's a health check with status 'Submitted' and it doesn't get any further - jobs before it in the list are either Incomplete or Canceled.
There's nothing in django.log.
Robert
OK, this is now a standard scheduling issue - check the scheduling logs. It sounds like one or more devices still have a current job set. This can be fixed up in the django admin interface. Also check the status of each of the daemons. The python traceback in django.log is now fixed but that event probably interrupted one of the cleanup actions within the scheduler.
This is in the admin docs: https://validation.linaro.org/static/docs/v2/simple-admin.html#log-files and https://validation.linaro.org/static/docs/v2/pipeline-debug.html
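As a rough sketch of the "device still has a current job set" check: the function below walks a list of device-like objects and reports those whose current job has already finished. Everything here is an assumption for illustration: stale_devices is a hypothetical name, the attribute names hostname, current_job and status are assumed, and the status strings are taken from the states mentioned in this thread. The stand-in objects let it run outside LAVA.

```python
from types import SimpleNamespace

# Assumed names for finished states; adjust to match your instance.
FINISHED = {"Complete", "Incomplete", "Canceled"}


def stale_devices(devices, finished=FINISHED, status_of=lambda job: job.status):
    """Return hostnames of devices whose current_job has already finished.

    status_of extracts a status string from a job object; the default
    reads a plain .status attribute (an assumption for the stand-ins).
    """
    stale = []
    for device in devices:
        job = device.current_job
        if job is not None and status_of(job) in finished:
            stale.append(device.hostname)
    return stale


# Stand-in devices: one still pointing at a cancelled job, one idle.
devices = [
    SimpleNamespace(hostname="bbb01",
                    current_job=SimpleNamespace(status="Canceled")),
    SimpleNamespace(hostname="bbb02", current_job=None),
]
stuck = stale_devices(devices)
```

If the real models store the status as an integer with choices, passing status_of=lambda job: job.get_status_display() may be closer to what is needed; treat that as an untested suggestion.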
I suspect this was the initial problem:

    $ sudo lava-server manage check --deploy
    ....
    bbb01: Invalid configuration
I attempted to comment out some device dictionary stuff but it doesn't appear to have liked the syntax. I've now put back something better that doesn't give that error.
I've been through the django admin and can't see where the current job is set (if indeed it is).
http://vmhost:port/admin/lava_scheduler_app/device/ has no current job for one of the devices derived from that device type. I've created another device and that shows as submitted but isn't running.
The logs don't flag up anything and I've rebooted the VM. I can run health checks on another device type I've just created.
Robert