Re: [Lava-users] LAVA V2 hangs over non-existent job!

22 Feb 2018


On 22 February 2018 at 08:57, Zoran S zoran.stojsavljevic.de@gmail.com wrote:
...
Hello Neil,
I found the reason why LAVA as a whole was behaving so erratically: the
VirtualBox VM disk stretch.vmdk reached its limit (device full).
Since stretch.vmdk is a fixed-size file, I cloned it to create
stretch.vdi (a dynamic virtual disk), which is 4x the size of
stretch.vmdk. So far the new (cloned) VM, up to LAVA (and there is a lot
of software running before LAVA), behaves correctly.
/dev/sda1       38240460 8160436  28118092  23% /
I have tried again to run the scheduler, with qemu01 which perfectly
worked before. It behaves the same. The job (qemu) stays indefinitely
in submitted state.
Then this is a local issue in the database in that VM resulting from the
local problem of ENOSPACE. Sorry, but this is not something we can solve,
it has to be resolved locally.
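
A minimal sketch of a free-space check that could be run inside the VM before investigating anything else. It only uses the Python standard library. The 10% threshold and the function name are assumptions for illustration, not LAVA settings. The error discussed above is normally reported by the operating system as ENOSPC ("No space left on device").

```python
import shutil


def free_space_report(path="/", min_free_fraction=0.10):
    """Return (free_fraction, ok) for the filesystem that contains `path`.

    `min_free_fraction` is an arbitrary illustrative threshold.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total
    return free_fraction, free_fraction >= min_free_fraction


if __name__ == "__main__":
    fraction, ok = free_space_report("/")
    status = "ok" if ok else "LOW: database writes may start to fail"
    print(f"free space on /: {fraction:.1%} ({status})")
```

A check like this only shows whether space is available now. It does not repair database rows that were written, or left half-written, while the disk was full.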
...
This job (qemu) should NOT be dependent on anything else, correct?
It is a test job. It is dependent on device configuration like any other.
It is dependent on database configuration like any other. It looks like
there is a problem in the database state on your local VM. You will need to
use the django admin interface and the available log files to resolve that,
inside that VM.
...
If it is (dependent on other, independent jobs), then the whole LAVA
project has the wrong architecture?!
Sorry, that makes no sense to me. QEMU can support multiple architectures.
...
It is very clear to me that Lava does not behave correctly. It is
Lava's fault, I am sure.
Sorry, this is a problem inside the database in your local VM resulting
from the problems arising from your local VM running out of space and
possibly other issues inside the VM, particularly some of the current
values in the database. That can only be resolved by running commands on
that database.
...
Please provide a way to reset the whole of LAVA in a synchronised way.
Commands such as service lava-server restart do not work.
...
Please understand that a service restart just restarts a single process -
the problem is in the database which that process then uses. That needs
investigation, not a reset.
Depending on how much data you have in that VM, it might be best to throw
it away and start again with a fresh installation without space issues and
taking your time to go through the documentation thoroughly and
*carefully*. Things get a lot easier if you have a dedicated machine to run
Debian Stretch instead of relying on virtual machines, however, we
regularly test LAVA installations inside QEMU VMs too. If you have a backup
of the database, then restore the backup.
https://staging.validation.linaro.org/static/docs/v2/admin-backups.html
https://staging.validation.linaro.org/results/query/~neil.williams/staging-f...
We also regularly test running jobs in a fresh virtual machine installation.
This particular VM needs work to investigate the database issues before it
can be upgraded. Some of the values for device state and test job state
have been corrupted and you'll find more information in the log files.
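
The kind of mismatch described in this thread (a device marked as running a job that no longer exists) can be illustrated with a small, self-contained check. This is only a sketch over plain dictionaries: the record layout, the field names `status` and `current_job`, and the function name are hypothetical and are not LAVA's database schema or API. The actual investigation still has to happen through the django admin interface and the log files.

```python
# State labels as they appear in this thread; treated here as "job still live".
ACTIVE_JOB_STATES = {"Submitted", "Running", "Cancelling"}


def find_inconsistencies(devices, jobs):
    """Compare hypothetical device records against hypothetical job records.

    devices: {hostname: {"status": str, "current_job": int or None}}
    jobs:    {job_id: {"status": str}}
    Returns a list of human-readable problem descriptions.
    """
    problems = []
    for hostname, device in sorted(devices.items()):
        job_id = device.get("current_job")
        if job_id is None:
            if device["status"] in ("Reserved", "Running"):
                problems.append(f"{hostname}: {device['status']} but no job attached")
            continue
        job = jobs.get(job_id)
        if job is None:
            problems.append(f"{hostname}: refers to job {job_id}, which does not exist")
        elif job["status"] not in ACTIVE_JOB_STATES:
            problems.append(
                f"{hostname}: {device['status']} for job {job_id}, but job is {job['status']}"
            )
    return problems


if __name__ == "__main__":
    # Values loosely modelled on the situation reported in this thread.
    devices = {
        "bbb01": {"status": "Running", "current_job": 92},
        "qemu01": {"status": "Idle", "current_job": None},
    }
    jobs = {102: {"status": "Submitted"}, 76: {"status": "Incomplete"}}
    for problem in find_inconsistencies(devices, jobs):
        print(problem)
```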
...
Thank you,
Zoran
_______
On Wed, Feb 21, 2018 at 3:05 PM, Neil Williams neil.williams@linaro.org
wrote:
...
On 21 February 2018 at 13:58, Zoran S zoran.stojsavljevic.de@gmail.com
wrote:
...
Hello Neil,
This does not help me at all. I ran the qemu01 device, which was perfectly
OK. Now the job is running, and stuck in running state. I cancelled this
job, but it is stuck in cancelling state!?
...
I need a method to reset the whole of LAVA.
Not necessarily. You can do that but you are better off learning about
the problems and the causes. LAVA is not a small utility that you reset at
the first sign of problems. Everyone installing LAVA needs some level of
administrative skills and that can involve a learning curve. apt-get
install is only the very beginning of the work.
...
What version of LAVA are you running? There were important changes in
scheduling after the removal of V1 in 2018.1
root@stretch:/etc/lava-server/dispatcher-config/devices# dpkg -l lava-server lava-dispatcher
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name             Version          Architecture  Description
+++-================-================-=============-===================================
ii  lava-dispatcher  2017.7-1~bpo9+1  amd64         Linaro Automated Validation Architecture dispatcher
ii  lava-server      2017.7-1~bpo9+1  all           Linaro Automated Validation Architecture server
root@stretch:/etc/lava-server/dispatcher-config/devices#
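
As a rough illustration of the version question, the following sketch extracts the release from a package version string like the ones in the dpkg output and compares it with 2018.1, the release mentioned above. The function names are hypothetical, and the parsing assumes the version starts with a year.month pair as it does here. It is not a general Debian version comparison (for that, `dpkg --compare-versions` exists).

```python
import re


def lava_release(version):
    """Extract (year, month) from a version string such as '2017.7-1~bpo9+1'."""
    match = re.match(r"(\d{4})\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def predates(version, release=(2018, 1)):
    """True if `version` is older than `release` (default: 2018.1)."""
    return lava_release(version) < release


if __name__ == "__main__":
    installed = "2017.7-1~bpo9+1"  # value taken from the dpkg output in this thread
    print(installed, "predates 2018.1:", predates(installed))
```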
How can I upgrade to the latest LAVA? I know that apt-get upgrade Lava
does do the magic.
The documentation covers upgrades -
https://validation.linaro.org/static/docs/v2/installing_on_debian.html#lava-repositories
...
Check the lava-master and lava-slave logs to find the misconfiguration.
It's likely to be invalid state of the device and/or test job or
misconfigured device configuration.
No idea where the logs are? /var/log/apache2/lava-server.log ???
This is also in the documentation.
https://validation.linaro.org/static/docs/v2/simple-admin.html#where-to-find-debug-information
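
A minimal sketch of how the scheduler logs could be skimmed for obvious problems. The two file paths below are assumptions about a default Debian installation and should be checked against the documentation linked above. The marker strings and function names are also illustrative only.

```python
from collections import deque

# Assumed locations; verify against the LAVA documentation for your version.
CANDIDATE_LOGS = (
    "/var/log/lava-server/lava-master.log",
    "/var/log/lava-dispatcher/lava-slave.log",
)
MARKERS = ("ERROR", "CRITICAL", "Traceback")


def suspicious_lines(lines, markers=MARKERS):
    """Return the lines that contain any of the marker strings."""
    return [line.rstrip("\n") for line in lines if any(m in line for m in markers)]


def scan_log(path, tail=2000):
    """Scan only the last `tail` lines of the file at `path`."""
    with open(path, errors="replace") as handle:
        recent = deque(handle, maxlen=tail)
    return suspicious_lines(recent)


if __name__ == "__main__":
    for log_path in CANDIDATE_LOGS:
        try:
            hits = scan_log(log_path)
        except OSError as exc:  # missing file, permissions, ...
            print(f"{log_path}: cannot read ({exc})")
            continue
        print(f"{log_path}: {len(hits)} suspicious line(s)")
        for line in hits[-20:]:
            print("   ", line)
```

This does not touch the apache log: as noted below, apache serves the pages and is not involved in scheduling.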
...
No idea what the output means.
There is apache documentation to help you with this but apache is doing
the serving of the pages, not the scheduling.
...
Thank you,
Zoran
_______
On Wed, Feb 21, 2018 at 2:45 PM, Neil Williams <neil.williams@linaro.org> wrote:
...
On 21 February 2018 at 13:33, Zoran S <zoran.stojsavljevic.de@gmail.com> wrote:
...
Hello to LAVA admins,
You have a fatal error in LAVA V2. I submitted a job, which hangs in
submitted state indefinitely.
This will be down to local misconfiguration. Submitted jobs will stay
in submitted until everything is ready for the job to start.
What version of LAVA are you running? There were important changes in
scheduling after the removal of V1 in 2018.1
...
I also see the non-existent job 92 (which is deleted) running there?!
Test jobs should not typically be deleted - except possibly as part of
archival.
...
http://localhost:8080/scheduler/device/bbb01
254 2018-02-21 12:05 Reserved → Running —
Job 92 running
We don't use this format for device transitions anymore. There were
issues with the old transitions but those could not be addressed until
V1 was removed.
https://validation.linaro.org/scheduler/device/beaglebone-black02
...
253 2018-02-21 12:04 Idle → Reserved —
Reserved for job 92
252 2018-02-21 12:03 Running → Idle —
Job 91 cancelled
251 2018-02-21 12:03 Reserved → Running —
Job 91 running
250 2018-02-21 12:03 Idle → Reserved —
Reserved for job 91
249 2018-02-21 11:33 Running → Idle —
Job 89 has ended. Setting job status Incomplete
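
To read a transition history like the one above more easily, the entries can be parsed and the newest one inspected. The sketch below assumes the two-line layout shown in this message (an entry line, then a message line). The regular expression, field names, and function names are hypothetical and are not part of LAVA. As noted above, newer LAVA releases no longer use this format.

```python
import re

# "<id> <date> <time> <from> → <to>", followed by anything (the trailing dash).
ENTRY = re.compile(r"^(\d+)\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2})\s+(\w+)\s+→\s+(\w+)")
JOB_REF = re.compile(r"\bjob (\d+)", re.IGNORECASE)


def parse_transitions(text):
    """Parse entry/message line pairs into a list of dicts."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    entries = []
    i = 0
    while i < len(lines):
        match = ENTRY.match(lines[i])
        if match is None:
            i += 1  # not an entry line; skip it
            continue
        message = ""
        if i + 1 < len(lines) and ENTRY.match(lines[i + 1]) is None:
            message = lines[i + 1]
        job = JOB_REF.search(message)
        entries.append({
            "id": int(match.group(1)),
            "when": match.group(2),
            "from": match.group(3),
            "to": match.group(4),
            "message": message,
            "job": int(job.group(1)) if job else None,
        })
        i += 2 if message else 1
    return entries


def latest(entries):
    """Newest entry by id (several entries can share the same minute)."""
    return max(entries, key=lambda entry: entry["id"])


if __name__ == "__main__":
    # Two entries copied verbatim from the history quoted in this thread.
    sample = """\
254 2018-02-21 12:05 Reserved → Running —
Job 92 running
253 2018-02-21 12:04 Idle → Reserved —
Reserved for job 92
"""
    newest = latest(parse_transitions(sample))
    print(f"device is {newest['to']} (job {newest['job']}) since {newest['when']}")
```

Entries are ordered by id rather than by timestamp because the timestamps only have minute resolution.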
But when I look at this local URL:
http://localhost:8080/admin/lava_scheduler_app/testjob/
102 | Submitted  | nobodyless | beaglebone-black | -                               | Feb. 21, 2018, 1:07 p.m.  | -                         | -
76  | Incomplete | nobodyless | beaglebone-black | bbb01 (Running, health Unknown) | Feb. 20, 2018, 9:50 a.m.  | Feb. 20, 2018, 9:50 a.m.  | Feb. 20, 2018, 9:55 a.m.
49  | Complete   | nobodyless | qemu             | qemu01 (Idle, health Unknown)   | Feb. 16, 2018, 10:44 a.m. | Feb. 16, 2018, 10:44 a.m. | Feb. 16, 2018, 10:50 a.m.
43  | Complete   | nobodyless | qemu             | qemu01 (Idle, health Unknown)   | Feb. 16, 2018, 8:42 a.m.  | Feb. 16, 2018, 8:43 a.m.  | Feb. 16, 2018, 8:49 a.m.
How do I force a submitted job to become an active job (from both CLI and GUI)?
Check the lava-master and lava-slave logs to find the misconfiguration.
It's likely to be invalid state of the device and/or test job or
misconfigured device configuration.
...
How do I delete non-existent jobs (from both CLI and GUI)?
Not recommended. lava-server manage has helpers to delete objects but
the first thing to do is debug what is happening on your localhost.
...
Thank you,
Zoran
_______________________________________________
Lava-users mailing list
Lava-users@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lava-users
--
Neil Williams
neil.williams@linaro.org
http://www.linux.codehelp.co.uk/