Hi,
I've got 2 jobs stuck in canceled mode which are preventing any other job from running.
I'm running LAVA (2017-7) in a VM and have tried rebooting the VM to clear the issue, but without success (i.e. the jobs still block the queue).
An extract from /var/log/lava-server/django.log is attached.
I get this 500 error when viewing the results for the job.
Is there a manual way of clearing this? The health check has a notification associated with it (set to verbose), and every time I reboot I get an email and an IRC message saying that it's finished!
Robert
Neil Williams neil.williams@linaro.org writes:
That's reporting that there was no start_time or end_time associated with the test job. It sounds like a bug but I'm not clear on how to reproduce it. For now, you can modify the test job to have a start and end time. Try:
    $ sudo lava-server manage shell
    >>> from django.utils import timezone
    >>> from lava_scheduler_app.models import TestJob
    >>> job = TestJob.objects.filter(id=123456)
    >>> job.start_time
    >>> job.start_time = timezone.now()
    >>> job.end_time
    >>> job.end_time = timezone.now()
    >>> job.save()
i.e. if there is no start time, modify both start and end time. If there's no end_time, just add an end_time.
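The rule above could be written as a small helper to paste into the shell session. This is only a sketch: `fill_missing_times` is a hypothetical name and not part of LAVA, and it assumes nothing more than a job object with writable `start_time` and `end_time` attributes. It only changes the attributes, so `job.save()` still has to be called afterwards.

```python
from datetime import datetime
from types import SimpleNamespace


def fill_missing_times(job, now):
    """Fill in missing timestamps on a job-like object.

    Hypothetical helper, not part of LAVA. `job` only needs writable
    `start_time` and `end_time` attributes. Returns the names of the
    attributes that were changed.
    """
    if job.start_time is None:
        # No start time: set both start and end time.
        job.start_time = now
        job.end_time = now
        return ["start_time", "end_time"]
    if job.end_time is None:
        # Started but never finished: only add an end time.
        job.end_time = now
        return ["end_time"]
    # Both timestamps are present, nothing to do.
    return []


# Stand-in object for demonstration only; in the LAVA shell this would be
# a single TestJob instance.
demo_job = SimpleNamespace(start_time=None, end_time=None)
demo_changed = fill_missing_times(demo_job, datetime(2017, 8, 15, 10, 30))
```

Inside `lava-server manage shell`, `now` would normally be `django.utils.timezone.now()`, so that the value matches the timezone settings of the Django instance.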
Lava-users mailing list Lava-users@lists.linaro.org https://lists.linaro.org/mailman/listinfo/lava-users
Neil
Thanks for this response, comments below:
Something appears deeply garbled here in the lava structures. I assume the id for the filter is the one in the recent jobs list rather than some internal value:
    >>> job = TestJob.objects.filter(id=24)
    >>> job.start_time
    Traceback (most recent call last):
    AttributeError: 'RestrictedTestJobQuerySet' object has no attribute 'start_time'
    >>> type(job)
    <class 'lava_scheduler_app.managers.RestrictedTestJobQuerySet'>
and there's no save function either
To attempt to remember the sequence of operations:
- I updated the device dictionary - the health check had been logging that it wasn't using soft reboot, which 2016 was doing, so I made a change - prob wrong - to try to add that.
- That job started but showed no output, I left it for 10 mins or so
- I cancelled it
- I then put back the device dictionary as it was and submitted another health check
- that queued so I decided to reboot
Robert
Neil Williams neil.williams@linaro.org writes:
Typo in my example.
job = TestJob.objects.get(id=123456)
or
job = TestJob.objects.filter(id=123456)[0]
filter() returns a QuerySet (a list-like collection of matching jobs), not a single TestJob, so use get() or index the result.
Thanks - both jobs had no start time but did have an end time - because I cancelled before the job started? I set the start_time = end_time which has got rid of the '500' error
But the queue still appears jammed - there's a health check with status 'Submitted' and it doesn't get any further - jobs before it in the list are either Incomplete or Canceled.
There's nothing in django.log
Robert
Neil Williams neil.williams@linaro.org writes:
OK, this is now a standard scheduling issue - check the scheduling logs. It sounds like one or more devices still have a current job set. This can be fixed up in the Django admin interface. Also check the status of each of the daemons. The Python traceback in django.log is now fixed, but that event probably interrupted one of the cleanup actions within the scheduler.
This is in the admin docs: https://validation.linaro.org/static/docs/v2/simple-admin.html#log-files and https://validation.linaro.org/static/docs/v2/pipeline-debug.html
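As an alternative to clicking through the admin pages, the same check could be made from `lava-server manage shell`. The sketch below is hedged: `devices_with_current_job` is a hypothetical helper, and the attribute names `hostname`, `current_job` and `id` are assumptions that should be checked against the installed LAVA version. It is written against plain objects so it can be tried outside LAVA first.

```python
from types import SimpleNamespace


def devices_with_current_job(devices):
    """Return (hostname, job id) pairs for devices that still reference a job.

    Hypothetical helper, not part of LAVA. Assumes each device has
    `hostname` and `current_job` attributes and each job has an `id`.
    """
    held = []
    for device in devices:
        job = getattr(device, "current_job", None)
        if job is not None:
            held.append((device.hostname, job.id))
    return held


# Stand-in data for demonstration only. In the LAVA shell the argument
# would be the device queryset, e.g. Device.objects.all() with Device
# imported from lava_scheduler_app.models.
demo_devices = [
    SimpleNamespace(hostname="bbb01", current_job=SimpleNamespace(id=24)),
    SimpleNamespace(hostname="bbb02", current_job=None),
]
demo_held = devices_with_current_job(demo_devices)
```

Any device listed this way whose job is already Canceled or Incomplete is a candidate for the clean-up described above.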
I suspect this was the initial problem:
    $ sudo lava-server manage check --deploy
    ....
    bbb01: Invalid configuration
I attempted to comment out some device dictionary stuff but it doesn't appear to have liked the syntax. I've now put back something better that doesn't give that error.
I've been through the django admin and can't see where the current job is set (if indeed it is)
http://vmhost:port/admin/lava_scheduler_app/device/ shows no current job for one of the devices derived from that device type. I've created another device and that shows as submitted but isn't running.
The logs don't flag up anything and I've rebooted the VM. I can run health checks on another device type I've just created.
Robert
lava-users@lists.lavasoftware.org