Neil Williams <neil.williams@linaro.org> writes:
On 23 May 2018 at 15:36, Robert Marshall <robert.marshall@codethink.co.uk> wrote:
At some point last week a job got stuck (I think because of network connectivity issues) and I cancelled it. When it was run again it again appeared to hang. I cancelled it again and am now seeing the health check not start (at least, no output appears on the job's web page).
What is the status of the relevant device(s) and any associated test jobs?
The status of the device was Bad. As the problems with the device have now resolved, it may be hard to diagnose further, but I have added below what I can see.
Check the /var/log/lava-server/lava-master.log for the reasons why the device is not being assigned.
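The entries in lava-master.log appear to be tagged with the job id in square brackets (the excerpts below show "ERROR [32]" for job 32). A minimal sketch for pulling out everything logged against one job, assuming that layout holds for every entry and that traceback lines follow their entry without a timestamp of their own; the script name in the comment is hypothetical:

```python
import re
import sys

# Assumed layout, inferred from the excerpts in this thread rather than from
# LAVA documentation: every entry starts with a timestamp such as
# "2018-05-23 14:26:18,620"; entries tied to a job continue with
# "LEVEL [job_id] message"; traceback lines carry no timestamp.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
JOB_ENTRY = re.compile(r"^(\S+ \S+) (\w+) \[(\d+)\] (.*)$")


def entries_for_job(lines, job_id):
    """Yield (timestamp, level, text) for each entry tagged with job_id.

    Untimestamped lines (e.g. tracebacks) are appended to the entry
    they follow, so a traceback stays attached to its error message.
    """
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if TIMESTAMP.match(line):
            # A new entry starts: emit the previous one if it belonged to the job.
            if current is not None:
                yield current
            match = JOB_ENTRY.match(line)
            if match and match.group(3) == str(job_id):
                current = (match.group(1), match.group(2), match.group(4))
            else:
                current = None
        elif current is not None:
            # Continuation line (e.g. a traceback) of the job's current entry.
            current = (current[0], current[1], current[2] + "\n" + line)
    if current is not None:
        yield current


def main(log_path, job_id):
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for stamp, level, text in entries_for_job(handle, job_id):
            print(stamp, level, text)


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python3 job_entries.py /var/log/lava-server/lava-master.log 32
    main(sys.argv[1], sys.argv[2])
```

Keeping traceback lines attached to their entry matters here because, as the excerpts show, the useful detail is often in the traceback rather than in the first line of the entry.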
I think this entry dates from when the job was failing rather than from when I cancelled it.
2018-05-23 14:26:18,620 ERROR [32] Error: b'Traceback (most recent call last):
  File "/usr/bin/lava-run", line 246, in <module>
    sys.exit(main())
  File "/usr/bin/lava-run", line 233, in main
    logger.close() # pylint: disable=no-member
  File "/usr/lib/python3/dist-packages/lava_dispatcher/log.py", line 87, in close
    self.handler.close(linger)
  File "/usr/lib/python3/dist-packages/lava_dispatcher/log.py", line 71, in close
    self.context.destroy(linger=linger)
  File "zmq/backend/cython/context.pyx", line 244, in zmq.backend.cython.context.Context.destroy (zmq/backend/cython/context.c:3067)
  File "zmq/backend/cython/context.pyx", line 136, in zmq.backend.cython.context.Context.term (zmq/backend/cython/context.c:2348)
  File "zmq/backend/cython/checkrc.pxd", line 12, in zmq.backend.cython.checkrc._check_rc (zmq/backend/cython/context.c:3216)
  File "/usr/bin/lava-run", line 151, in cancelling_handler
    raise JobCanceled("The job was canceled")
lava_dispatcher.action.JobCanceled: The job was canceled
'
Though this may be more interesting:
2018-05-23 14:26:18,655 ERROR [32] Unable to dump 'description.yaml'
2018-05-23 14:26:18,655 ERROR [32] Compressed data ended before the end-of-stream marker was reached
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/lava_server/management/commands/lava-master.py", line 333, in _handle_end
    description = lzma.decompress(compressed_description)
  File "/usr/lib/python3.5/lzma.py", line 340, in decompress
    raise LZMAError("Compressed data ended before the "
_lzma.LZMAError: Compressed data ended before the end-of-stream marker was reached
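The LZMAError in this excerpt is raised by Python's own lzma.decompress(), which does so when its input stops before the xz end-of-stream marker, i.e. when the compressed description it was handed is incomplete. A minimal sketch reproducing the same message with a hypothetical payload (not the description LAVA actually sends):

```python
import lzma

# Hypothetical stand-in for the job description that lava-master receives
# compressed; the real content and size will differ.
payload = b"job_name: health-check\ndevice_type: qemu\n" * 20
compressed = lzma.compress(payload)

# A complete stream round-trips cleanly.
assert lzma.decompress(compressed) == payload

# Dropping the tail of the stream (as if the sender stopped mid-transfer)
# makes lzma.decompress() raise the same LZMAError as in the log above.
truncated = compressed[: len(compressed) // 2]
try:
    lzma.decompress(truncated)
    error_message = None
except lzma.LZMAError as exc:
    error_message = str(exc)

print(error_message)
```

If that reading is right, the description that reached lava-master was cut short, which would be consistent with the JobCanceled traceback logged in the same second, though the log alone does not confirm the cause.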
Check the status of all daemons, including lava-logs
WARNING lava-logs is offline: can't schedule jobs
sudo service lava-master status
sudo service lava-logs status
sudo service lava-slave status

Looking at the output.yaml (in /var/lib/lava-server/default/media/job-output/2018/05/23/32), I see ... progress output for downloading https://images.validation.linaro.org/kvm/standard/stretch-2.img.gz
- {"dt": "2018-05-23T07:39:54.728015", "lvl": "debug", "msg": "[common] Preparing overlay tarball in /var/lib/lava/dispatcher/tmp/32/lava-overlay-aye3n2ke"}
- {"dt": "2018-05-23T07:39:54.728root@stretch:/var/lib/lava-server/default/media/job-output/2018/05/23/32
But none of this appears in http://localhost:8080/scheduler/job/32
and at the head of that page I see the message:
Unable to parse invalid logs: This is maybe a bug in LAVA that should be reported.
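That message would be consistent with the truncated last entry in the output.yaml quoted above: the file ends mid-timestamp, with the shell prompt printed straight after it. A minimal sketch for locating the first entry that does not parse, assuming one entry per line in the "- {...}" form seen in the excerpt and that each mapping is also valid JSON. A JSON check is stricter than a YAML parser, so treat a hit as a lead rather than proof:

```python
import json


def first_bad_entry(lines):
    """Return (line_number, text) for the first entry that fails to parse, or None.

    Assumes, from the excerpt above rather than from LAVA documentation,
    that each entry is one line of the form '- {...}' whose mapping is
    also valid JSON.
    """
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if not line.startswith("- "):
            return number, line
        try:
            json.loads(line[2:])
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            return number, line
    return None
```

To check the real file, open output.yaml in the job-output directory for job 32 and pass the file object to first_bad_entry(); a result pointing at the last line would match what the excerpt shows.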
Which other logs are best for checking whether this is an error that should be reported?
(LAVA 2018.4)
Robert

_______________________________________________
Lava-users mailing list
Lava-users@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lava-users