Re: [Lava-users] QEMU health check apparently not running

29 May 2018

      Neil Williams neil.williams@linaro.org writes:
...
On 23 May 2018 at 15:36, Robert Marshall robert.marshall@codethink.co.uk wrote:
At some point last week - I think because of network connectivity issues -
 a job got stuck and I cancelled it. When run again it again appeared to hang. I again
 cancelled it and am now seeing the health check not start (at least no
 output appears on the job's web page).
What is the status of the relevant device(s) and any associated test jobs?
The status of the device was Bad. As the problems with the device have now
been resolved, it may be hard to diagnose further, but I am adding below what I
can see.
...
Check the /var/log/lava-server/lava-master.log for the reasons why the device is not being assigned.
I think this was when it was failing rather than when I was cancelling it.
2018-05-23 14:26:18,620   ERROR [32] Error: b'Traceback (most recent
  call last):
  File "/usr/bin/lava-run", line 246, in <module>
  sys.exit(main())
  File "/usr/bin/lava-run", line 233, in main
  logger.close()  # pylint: disable=no-member
  File "/usr/lib/python3/dist-packages/lava_dispatcher/log.py", line 87,
  in close
  self.handler.close(linger)
  File "/usr/lib/python3/dist-packages/lava_dispatcher/log.py", line 71,
  in close
  self.context.destroy(linger=linger)
  File "zmq/backend/cython/context.pyx", line 244, in
  zmq.backend.cython.context.Context.destroy
  (zmq/backend/cython/context.c:3067)
  File "zmq/backend/cython/context.pyx", line 136, in
  zmq.backend.cython.context.Context.term
  (zmq/backend/cython/context.c:2348)
  File "zmq/backend/cython/checkrc.pxd", line 12, in
  zmq.backend.cython.checkrc._check_rc
  (zmq/backend/cython/context.c:3216)
  File "/usr/bin/lava-run", line 151, in cancelling_handler
  raise JobCanceled("The job was canceled")
  lava_dispatcher.action.JobCanceled: The job was canceled
  '
Though this is maybe more interesting?:
2018-05-23 14:26:18,655   ERROR [32] Unable to dump 'description.yaml'
2018-05-23 14:26:18,655   ERROR [32] Compressed data ended before the end-of-stream marker was reached
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/lava_server/management/commands/lava-master.py", line 333, in _handle_end
    description = lzma.decompress(compressed_description)
  File "/usr/lib/python3.5/lzma.py", line 340, in decompress
    raise LZMAError("Compressed data ended before the "
_lzma.LZMAError: Compressed data ended before the end-of-stream marker was reached
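For reference, the same LZMAError message can be reproduced with the standard lzma module alone by decompressing a payload that has been cut short. This is only a minimal sketch: the payload below is a made-up stand-in for description.yaml, and the idea that the data for job 32 was truncated by the cancel is an assumption, not something the log proves.

```python
import lzma

# Hypothetical stand-in for the content of a job's description.yaml.
original = b"job:\n  device_type: qemu\n" * 50
compressed = lzma.compress(original)

# Simulate a payload that was cut short before the whole stream arrived.
truncated = compressed[: len(compressed) // 2]


def try_decompress(data):
    """Return (payload, None) on success or (None, error message) on failure."""
    try:
        return lzma.decompress(data), None
    except lzma.LZMAError as exc:
        return None, str(exc)


ok_payload, ok_error = try_decompress(compressed)
bad_payload, bad_error = try_decompress(truncated)

print(ok_error)   # None: the complete stream decompresses cleanly
print(bad_error)  # the message raised for the truncated stream
```

If the truncated case prints the same text as the log line above, that would be consistent with lava-master having received an incomplete compressed description.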
...
Check the status of all daemons, including lava-logs
WARNING lava-logs is offline: can't schedule jobs
...
sudo service lava-master status
sudo service lava-logs status
sudo service lava-slave status

Looking at the output.yaml (in /var/lib/lava-server/default/media/job-output/2018/05/23/32) I see
... progress output for downloading https://images.validation.linaro.org/kvm/standard/stretch-2.img.gz

{"dt": "2018-05-23T07:39:54.728015", "lvl": "debug", "msg": "[common] Preparing overlay tarball in /var/lib/lava/dispatcher/tmp/32/lava-overlay-aye3n2ke"}

{"dt": "2018-05-23T07:39:54.728root@stretch:/var/lib/lava-server/default/media/job-output/2018/05/23/32

But none of this appears in http://localhost:8080/scheduler/job/32
and at the head of that page I see the message:
Unable to parse invalid logs: This is maybe a bug in LAVA that should be reported.
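One way to locate the entry that the web page cannot parse is to check each line of output.yaml on its own. The sketch below assumes every log entry is a single JSON object on one line, optionally prefixed with "- " as a YAML list item; that layout and the helper name find_unparseable_lines are assumptions for illustration, not LAVA's own parser. The sample lines are the two entries quoted above, with the shell prompt removed from the truncated one.

```python
import json


def find_unparseable_lines(lines):
    """Return (line_number, text) for each non-empty line that is not valid JSON.

    Assumption: one JSON object per line, optionally prefixed with "- ".
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # ignore blank lines
        if text.startswith("- "):
            text = text[2:]  # drop the YAML list marker, if present
        try:
            json.loads(text)
        except json.JSONDecodeError:
            bad.append((number, raw.rstrip("\n")))
    return bad


sample = [
    '{"dt": "2018-05-23T07:39:54.728015", "lvl": "debug", "msg": "[common] '
    'Preparing overlay tarball in '
    '/var/lib/lava/dispatcher/tmp/32/lava-overlay-aye3n2ke"}\n',
    '{"dt": "2018-05-23T07:39:54.728\n',
]

for number, text in find_unparseable_lines(sample):
    print(number, text)
```

For a real job the same function can be given an open file object, for example the output.yaml in the job-output directory mentioned above.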
Which other logs are best for checking whether this is an error that
 should be fed back?
(LAVA 2018.4)
Robert
-- 
Robert Marshall, Software Developer                       Codethink Ltd
Telephone: +44 7762 840 414       3rd Floor, Dale House, 35 Dale Street
https://www.codethink.co.uk/         MANCHESTER, M1 2HF. United Kingdom
We respect your privacy.   See https://www.codethink.co.uk/privacy.html
