Hi Neil et al,
I'm trying to debug a simple qemu job that goes straight from running to incomplete without creating a log (it used to work OK, but I reinstalled everything on a different machine...)
Looking at /var/log/lava-server/lava-scheduler.log, I see the following:
2015-12-09 15:22:27,838 [INFO] [lava_scheduler_daemon.job.JobRunner.14] starting job {u'timeout': 18000, 'health_check': False, u'job_name': u'qemu-arm-test', u'actions': [{u'command': u'deploy_linaro_kernel', u'parameters': {u'login_prompt': u'login:', u'kernel': u' http://images.validation.linaro.org/functional-test-images/qemu-arm/zImage-q...', u'username': u'root', u'rootfs': u' http://images.validation.linaro.org/functional-test-images/qemu-arm/core-ima..., {u'command': u'boot_linaro_image', u'parameters': {u'test_image_prompt': u'root@qemu-system-arm:~#'}}], u'target': u'qemu0'}
2015-12-09 15:22:27,838 [INFO] [lava_scheduler_daemon.job.MonitorJob] monitoring "setsid lava-server manage schedulermonitor 14 lava-dispatch qemu0 /tmp/tmpPd4nGs -l info -f /var/log/lava-server/lava-scheduler.log"
2015-12-09 15:22:29,171 [INFO] [lava_scheduler_daemon.job.Job.qemu0] executing "lava-dispatch /tmp/tmpFltuQQ --output-dir /var/lib/lava-server/default/media/job-output/job-14"
2015-12-09 15:22:30,388 [INFO] [lava_scheduler_daemon.job.DispatcherProcessProtocol] childConnectionLost for qemu0: 0
2015-12-09 15:22:30,389 [INFO] [lava_scheduler_daemon.job.DispatcherProcessProtocol] childConnectionLost for qemu0: 1
2015-12-09 15:22:30,389 [INFO] [lava_scheduler_daemon.job.DispatcherProcessProtocol] childConnectionLost for qemu0: 2
2015-12-09 15:22:30,389 [INFO] [lava_scheduler_daemon.job.DispatcherProcessProtocol] processExited for qemu0: A process has ended with a probable error condition: process ended with exit code 1.
2015-12-09 15:22:30,389 [INFO] [lava_scheduler_daemon.job.DispatcherProcessProtocol] processEnded for qemu0: A process has ended with a probable error condition: process ended with exit code 1.
2015-12-09 15:22:30,389 [INFO] [lava_scheduler_daemon.job.Job.qemu0] job finished on qemu0
2015-12-09 15:22:30,389 [INFO] [lava_scheduler_daemon.job.Job.qemu0] job incomplete: reported 1 exit code
2015-12-09 15:22:30,422 [INFO] [lava_scheduler_daemon.dbjobsource.DatabaseJobSource] job 14 completed on qemu0
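Every entry above follows one layout (timestamp, [LEVEL], [logger], message), which makes the scheduler log easy to filter. A minimal Python 3 sketch, assuming that layout holds for every line of lava-scheduler.log; the regexes and the function names parse_line and job_outcomes are hypothetical, inferred only from the lines shown here and not part of LAVA:

```python
import re

# Assumed layout, inferred from the log lines above:
# "YYYY-MM-DD HH:MM:SS,mmm [LEVEL] [logger] message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<level>[A-Z]+)\] \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)
# Hypothetical pattern for the message that reports an incomplete job.
INCOMPLETE_RE = re.compile(r"job incomplete: reported (?P<code>-?\d+) exit code")


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    return m.groupdict() if m else None


def job_outcomes(lines):
    """Yield (timestamp, logger, exit_code) for every 'job incomplete' entry."""
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip continuation lines, tracebacks, etc.
        m = INCOMPLETE_RE.search(rec["msg"])
        if m:
            yield rec["ts"], rec["logger"], int(m.group("code"))
```

For example, iterating job_outcomes() over an open lava-scheduler.log should, for the run above, report exit code 1 for lava_scheduler_daemon.job.Job.qemu0.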
I tried to run it manually:
setsid lava-server manage schedulermonitor 14 lava-dispatch qemu0 qemu-arm.json
powerci@lab-baylibre:~/POWERCI/scripts/user$ 2015-12-09 15:23:23,285 [ERROR] [lava_scheduler_daemon.job.Job.qemu0] AttributeError: 'Job' object has no attribute '_protocol'
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1203, in mainLoop
    self.runUntilCurrent()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 798, in runUntilCurrent
    f(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 393, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 501, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/lava_scheduler_daemon/job.py", line 226, in _run
    self.cancel(exc)
  File "/usr/lib/python2.7/dist-packages/lava_scheduler_daemon/job.py", line 157, in cancel
    self._protocol.transport.signalProcess(getattr(signal, signame))
exceptions.AttributeError: 'Job' object has no attribute '_protocol'
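Read bottom-up, the traceback says _run caught some exception and called cancel(exc), and cancel then touched self._protocol, which had evidently not been assigned yet. The AttributeError therefore hides whatever failed first. The sketch below is a hypothetical, heavily simplified model of that pattern (the class, the guarded switch, failing_spawn and the SIGTERM default are all invented for illustration; it is not the real lava_scheduler_daemon.job.Job):

```python
import signal


class Job:
    """Hypothetical, simplified model; NOT lava_scheduler_daemon.job.Job."""

    def __init__(self, spawn, guarded=True):
        self._spawn = spawn      # callable that starts the child process
        self._guarded = guarded  # switch between the two behaviours

    def _run(self):
        try:
            # In this model, _protocol only exists once the child has started.
            self._protocol = self._spawn()
        except Exception as exc:
            self.cancel(exc)
            raise  # re-raise the original error for the caller to log

    def cancel(self, reason, signame="SIGTERM"):
        # `reason` is accepted only to mirror the cancel(exc) call in the traceback.
        if self._guarded and not hasattr(self, "_protocol"):
            # No child was started, so there is nothing to signal.
            return
        # Unguarded, this line raises AttributeError when the child never
        # started, which is what masks the real error in the traceback.
        self._protocol.transport.signalProcess(getattr(signal, signame))


def failing_spawn():
    # Hypothetical failure standing in for whatever went wrong first.
    raise OSError("could not start the dispatcher")
```

With guarded=False, Job(failing_spawn, guarded=False)._run() ends in the same "'Job' object has no attribute '_protocol'" error as above; with the guard, the OSError comes through instead. Either way, the AttributeError is a symptom, and the first exception raised inside _run is the one worth finding.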
Note that I get the same issue with other jobs (boards, kvm): submission is OK, but the jobs end up incomplete and no log is created.
Any help would be much appreciated!
Many thanks, Marc.
On 9 December 2015 at 15:30, Marc Titinger mtitinger@baylibre.com wrote:
Hi Neil et al,
I'm trying to debug a simple qemu job that goes straight from running to incomplete without creating a log (it used to work OK, but I reinstalled everything on a different machine...)
That is the most likely problem. For completeness, I copied out your test and it ran fine here: https://staging.validation.linaro.org/scheduler/job/138097
I tried to run it manually:
setsid lava-server manage schedulermonitor 14 lava-dispatch qemu0
  File "/usr/lib/python2.7/dist-packages/lava_scheduler_daemon/job.py", line 226, in _run
    self.cancel(exc)
  File "/usr/lib/python2.7/dist-packages/lava_scheduler_daemon/job.py", line 157, in cancel
    self._protocol.transport.signalProcess(getattr(signal, signame))
exceptions.AttributeError: 'Job' object has no attribute '_protocol'
Note that I get the same issue with other jobs (boards, kvm): submission is OK, but the jobs end up incomplete and no log is created.
In that case, it is going to be a configuration error or a mismatch in what is actually installed.
It sounds initially like you don't have the version of lava-dispatcher installed that you expect. Also check that you haven't done any pip installs and that /usr/local/ has no Python installs on that machine.
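To follow up on the /usr/local check, Python can report which file each module would actually be imported from. A minimal sketch, assuming Python 3 (the host in this thread runs Python 2.7, where imp.find_module plays a similar role); the names in SUSPECTS are examples, daemon being the import name used by python-daemon, and the helper names are hypothetical:

```python
import importlib.util

# Example module names worth checking on a LAVA host; adjust as needed.
SUSPECTS = ["daemon", "lava_dispatcher", "lava_scheduler_daemon"]


def module_origins(names):
    """Map each module name to the file it would be imported from (or None)."""
    origins = {}
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None  # parent package missing or an unusual module entry
        origins[name] = spec.origin if spec is not None else None
    return origins


def outside_system_packages(origins, prefixes=("/usr/local/",)):
    """Return the names whose origin lies under a suspicious prefix."""
    return sorted(
        name for name, origin in origins.items()
        if origin is not None and origin.startswith(prefixes)
    )
```

Run it under the same interpreter the scheduler uses; any name returned by outside_system_packages(module_origins(SUSPECTS)) is a candidate for the kind of mismatch described above.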
On 09/12/2015 16:57, Neil Williams wrote:
It sounds initially like you don't have the version of lava-dispatcher installed that you expect. Also check that you haven't done any pip installs and that /usr/local/ has no Python installs on that machine.
As a matter of fact, I did some pip installs as a consequence of hacking on lavapdu. But there is nothing suspect under /usr/local/{bin,lib}, only the lavapdu stuff. I did a pip install of daemon-1.1, though.
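If the pip-installed daemon 1.1 package provides a top-level module also named daemon (an assumption worth verifying on the machine), it would collide with the daemon module from python-daemon. One hedged way to look for such collisions in Python 3.8+ is to read each installed distribution's top_level.txt through importlib.metadata. Not every package ships that file, so an empty result is not proof that nothing collides; the function names are hypothetical:

```python
from collections import defaultdict
from importlib import metadata


def providers_by_module():
    """Map each top-level module name to the installed distributions declaring it."""
    providers = defaultdict(list)
    for dist in metadata.distributions():
        # top_level.txt lists import names; it is missing for some packages.
        top_level = dist.read_text("top_level.txt") or ""
        for module in top_level.split():
            providers[module].append(
                (dist.metadata["Name"], dist.version, str(dist.locate_file("")))
            )
    return dict(providers)


def shadowed(providers):
    """Keep only module names claimed by more than one distribution."""
    return {name: dists for name, dists in providers.items() if len(dists) > 1}
```

shadowed(providers_by_module()) returns only the module names claimed by two or more distributions, with each claimant's name, version and install directory, which shows at a glance whether one copy lives under /usr/local.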
It's not easy to debug by stepping through the code. Is there any approach you would recommend for attaching pdb to the dispatcher, or running it directly under pdb?
Many thanks, M.
Do not use pip with LAVA. The process which starts LAVA jobs uses python-daemon and will fail if this is misconfigured. Remove everything that pip has installed on that system.
Ensure the scheduler daemon (/etc/init.d/lava_server) is using the debug logging level, and watch the logs to make sure it is actually running. This may have nothing to do with lava-dispatch: lava-dispatch isn't even being called. The pip install is causing the scheduler to pick up the wrong supporting modules.
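For the "watch the logs" step, tail -f on /var/log/lava-server/lava-scheduler.log is the usual tool. As a Python sketch of the same idea (read_new_lines is a hypothetical helper; how to switch the daemon to debug level depends on the lava-server version and is not shown), the function below returns only the [DEBUG], [WARNING] and [ERROR] lines appended since the previous call:

```python
import os

# Level tags as they appear in lava-scheduler.log, e.g. "[INFO]" or "[ERROR]".
WANTED = ("[DEBUG]", "[WARNING]", "[ERROR]")


def read_new_lines(path, offset, wanted=WANTED):
    """Return (matching_lines, new_offset) for complete lines added after `offset`."""
    if os.path.getsize(path) < offset:
        offset = 0  # file shrank (e.g. rotated): start again from the top
    with open(path, "rb") as log:
        log.seek(offset)
        chunk = log.read()
    # Only consume complete lines; a half-written last line is left for later.
    end = chunk.rfind(b"\n") + 1
    text = chunk[:end].decode("utf-8", errors="replace")
    lines = [line for line in text.splitlines()
             if any(tag in line for tag in wanted)]
    return lines, offset + end
```

A caller keeps the returned offset and polls every second or so; filtering on the level tag keeps the [INFO] chatter out of the way while hunting for the first real error.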
lava-users@lists.lavasoftware.org