That's really strange, and it leaves few ways to debug this remotely.

A merge request that will make it possible to debug lava-run itself remotely is about to be merged. See https://git.lavasoftware.org/lava/lava/merge_requests/943

In the meantime I don't see how I can reproduce this issue :(
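
For anyone trying to reproduce this on a similar setup: with the reported rate of roughly one crash per 30 submissions, the number of jobs to submit before expecting at least one crash can be estimated as sketched below. This assumes each submission crashes independently with the same probability, which this thread does not establish, and submissions_needed is only an illustrative name.

```python
import math


def submissions_needed(crash_rate: float, confidence: float) -> int:
    """Smallest n such that P(at least one crash in n submissions) >= confidence.

    Assumes independent submissions with a constant crash probability.
    """
    if not 0.0 < crash_rate < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("crash_rate and confidence must be strictly between 0 and 1")
    # P(no crash in n runs) = (1 - crash_rate) ** n, so solve that for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - crash_rate))


n_95 = submissions_needed(1 / 30, 0.95)
print(n_95)  # 89
```

So a run of about 90 submissions without a single crash would suggest that the setup under test does not show the problem at the reported rate.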

On Thu, 23 Jan 2020 at 17:11, Alexander Moore <Alexander.Moore@cypress.com> wrote:

Hi Remi,

 

The docker-compose logs are the only logs I’m aware of. Is the job log what appears in the web interface for the job, i.e. in this case it would be http://lava-server/scheduler/job/4284 ? There are no logs there in this case, the crash happens before any logs are written to the job log. If the job log is something different, please let me know where I can find it and I will send it to you.

 

From what I can see, it looks like the description.yaml is sent to the master successfully, but on rare occasions a problem occurs with the master sending it to the slave.

 

Thanks,

Alex

 

From: Remi Duraffort <remi.duraffort@linaro.org>
Sent: Thursday, January 23, 2020 2:25 AM
To: Alexander Moore <Alexander.Moore@cypress.com>
Cc: lava-users@lists.lavasoftware.org
Subject: Re: [Lava-users] lava-run crash after submitting a job

 

Hello,

 

The relevant logs are:

 

lava-dispatcher    | 2020-01-20 20:03:50,031    INFO [4284] Job END
lava-dispatcher    | 2020-01-20 20:03:50,031   ERROR [4284] Unable to read 'description.yaml'
lava-dispatcher    | 2020-01-20 20:03:50,032   ERROR [Errno 2] No such file or directory: '/var/lib/lava/dispatcher/slave/tmp/4284/description.yaml'
lava-dispatcher    | Traceback (most recent call last):
lava-dispatcher    |   File "/usr/bin/lava-slave", line 239, in description
lava-dispatcher    |     data = open(filename, 'r').read()
lava-dispatcher    | FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/lava/dispatcher/slave/tmp/4284/description.yaml'
lava-dispatcher    | 2020-01-20 20:03:50,034   ERROR [4284] lava-run crashed

 

The main problem is that this log does not show why lava-run crashed.

 

Could you provide the job logs?
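
The traceback above shows lava-slave failing to open description.yaml under /var/lib/lava/dispatcher/slave/tmp/4284/. One way to check on the dispatcher whether the whole job directory is missing, or only the file, is sketched below. describe_job_dir is a hypothetical helper and not part of LAVA; it assumes the <base>/<job id>/description.yaml layout seen in the log, and the demo runs against a throwaway directory.

```python
import tempfile
from pathlib import Path


def describe_job_dir(base: Path, job_id: int) -> dict:
    """Report what exists for one job under the dispatcher's tmp directory."""
    job_dir = base / str(job_id)
    desc = job_dir / "description.yaml"
    return {
        "job_dir_exists": job_dir.is_dir(),
        "description_exists": desc.is_file(),
        # Size in bytes, or None when the file is absent.
        "description_size": desc.stat().st_size if desc.is_file() else None,
    }


# Demo: job 1 has a description, job 2 has only a directory, job 3 has nothing.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "1").mkdir()
    (base / "1" / "description.yaml").write_bytes(b"job_name: demo\n")
    (base / "2").mkdir()
    present = describe_job_dir(base, 1)
    missing_file = describe_job_dir(base, 2)
    missing_dir = describe_job_dir(base, 3)

print(present, missing_file, missing_dir)
```

On a real dispatcher the call would be describe_job_dir(Path("/var/lib/lava/dispatcher/slave/tmp"), 4284), provided the job directory has not been cleaned up in the meantime.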

 

On Wed, 22 Jan 2020 at 20:14, Alexander Moore <Alexander.Moore@cypress.com> wrote:

Hi Remi,

 

For more information, I have followed https://collaborate.linaro.org/pages/viewpage.action?pageId=118293253 which uses a docker-compose setup from https://github.com/Linaro/lite-lava-docker-compose

 

The pastebin below has all the docker-compose logs for a recent lava-run crash (job 4284) using the 2019.12 images.

https://pastebin.com/hEXZ7Nbm

 

Thanks,

Alex

 

From: Remi Duraffort <remi.duraffort@linaro.org>
Sent: Wednesday, January 22, 2020 6:01 AM
To: Alexander Moore <Alexander.Moore@cypress.com>
Cc: lava-users@lists.lavasoftware.org
Subject: Re: [Lava-users] lava-run crash after submitting a job

 

If you use the lava-dispatcher docker container, then the lava-slave logs are printed on stdout. So "docker logs" would give you the logs.
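
Since everything ends up on stdout, the output of "docker logs" (or "docker-compose logs") can be saved and narrowed down to one job. A minimal sketch, assuming the "[<job id>]" tag format seen in this thread; job_window is a hypothetical helper, not a LAVA tool. It keeps everything between the first and last line tagged with the job id, so untagged lines in between (such as tracebacks) are preserved. Lines from other jobs running at the same time would be kept as well.

```python
from typing import List

# Sample lines copied from the logs quoted in this thread.
SAMPLE = """\
lava-master | 2019-10-22 15:37:17,969 INFO [523] lava-dispatcher => START_OK
lava-master | 2019-10-22 15:37:22,981 INFO [523] lava-dispatcher => END (lava-run crashed, mark job as INCOMPLETE)
lava-dispatcher    | 2020-01-20 20:03:50,031    INFO [4284] Job END
lava-dispatcher    | 2020-01-20 20:03:50,031   ERROR [4284] Unable to read 'description.yaml'
lava-dispatcher    | 2020-01-20 20:03:50,032   ERROR [Errno 2] No such file or directory: '/var/lib/lava/dispatcher/slave/tmp/4284/description.yaml'
lava-dispatcher    | 2020-01-20 20:03:50,034   ERROR [4284] lava-run crashed
"""


def job_window(log_text: str, job_id: int) -> List[str]:
    """Return the lines from the first to the last one tagged with [job_id]."""
    tag = "[{}]".format(job_id)
    lines = log_text.splitlines()
    hits = [i for i, line in enumerate(lines) if tag in line]
    if not hits:
        return []
    return lines[hits[0]:hits[-1] + 1]


window = job_window(SAMPLE, 4284)
for line in window:
    print(line)
```

Note that the "[Errno 2]" line carries no job tag, which is why a plain per-line filter on the tag would lose it.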

 

 

Rgds

 

On Tue, 21 Jan 2020 at 22:59, Alexander Moore <Alexander.Moore@cypress.com> wrote:

In my lava-dispatcher docker container, /var/log/lava-dispatcher/ exists but is empty.

 

Thanks,

Alex

 

From: Remi Duraffort <remi.duraffort@linaro.org>
Sent: Tuesday, January 21, 2020 8:37 AM
To: Alexander Moore <Alexander.Moore@cypress.com>
Cc: lava-users@lists.lavasoftware.org
Subject: Re: [Lava-users] lava-run crash after submitting a job

 

On the dispatcher, look for /var/log/lava-dispatcher/lava-slave.log

 

On Tue, 21 Jan 2020 at 16:58, Alexander Moore <Alexander.Moore@cypress.com> wrote:

Hi Remi,

 

The log states “lava-run crashed”. I don’t think there is any lava-master crash, because lava-master continues working fine for subsequent jobs. How can I get the lava-slave logs to check for a crash there?

 

Thanks,

Alex

 

From: Remi Duraffort <remi.duraffort@linaro.org>
Sent: Tuesday, January 21, 2020 12:52 AM
To: Alexander Moore <Alexander.Moore@cypress.com>
Cc: lava-users@lists.lavasoftware.org
Subject: Re: [Lava-users] lava-run crash after submitting a job

 

Hello,

 

This is lava-master crashing, not lava-run, right?

If lava-run is crashing, please provide the corresponding lava-slave logs.

 

 

Rgds

 

On Mon, 20 Jan 2020 at 23:34, Alexander Moore <Alexander.Moore@cypress.com> wrote:

Hello,

 

Our internal LAVA setup has been hitting this crash intermittently (it reproduces on about one out of every 30 job submissions). The log snippet below is from the 2019.07 LAVA docker images, but we updated to the 2019.12 images and the crash still occurs with the same error signature.

 

lava-master | 2019-10-22 15:37:17,539 DEBUG |--> [523] scheduling
lava-master | 2019-10-22 15:37:17,854 INFO [523] START => lava-dispatcher (CY8CKIT_062-01)
lava-master | 2019-10-22 15:37:17,969 INFO [523] lava-dispatcher => START_OK
lava-master | 2019-10-22 15:37:22,981 INFO [523] lava-dispatcher => END (lava-run crashed, mark job as INCOMPLETE)
lava-master | 2019-10-22 15:37:23,038 ERROR [523] Unable to dump 'description.yaml'
lava-master | 2019-10-22 15:37:23,038 ERROR [523] Compressed data ended before the end-of-stream marker was reached
lava-master | Traceback (most recent call last):
lava-master |   File "/usr/lib/python3/dist-packages/lava_server/management/commands/lava-master.py", line 337, in _handle_end
lava-master |     description = lzma.decompress(compressed_description)
lava-master |   File "/usr/lib/python3.5/lzma.py", line 340, in decompress
lava-master |     raise LZMAError("Compressed data ended before the "
lava-master | _lzma.LZMAError: Compressed data ended before the end-of-stream marker was reached
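
The error message itself comes from Python's lzma module, which raises it when the input stops before the stream's end marker. The sketch below reproduces it with a truncated payload and with an empty one. The sample data is made up and try_decompress is only an illustrative helper; whether either case is what lava-master actually receives here is not established.

```python
import lzma


def try_decompress(blob):
    """Return (data, None) on success or (None, error message) on LZMAError."""
    try:
        return lzma.decompress(blob), None
    except lzma.LZMAError as exc:
        return None, str(exc)


# Made-up stand-in for a description.yaml payload.
description = b"job_name: demo\n" * 50
compressed = lzma.compress(description)

full_data, full_err = try_decompress(compressed)
cut_data, cut_err = try_decompress(compressed[: len(compressed) // 2])
empty_data, empty_err = try_decompress(b"")

print(full_err)   # None: the complete stream decompresses fine
print(cut_err)    # truncated stream: end-of-stream marker never reached
print(empty_err)  # empty input fails with the same message
```

Both failing cases report the same message as the lava-master traceback, so the message alone does not distinguish an empty payload from a partially transferred one.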

 

Please let me know if I can provide any other info to help debug.

 

Thanks,

Alex


This message and any attachments may contain confidential information from Cypress or its subsidiaries. If it has been received in error, please advise the sender and immediately delete this message.

_______________________________________________
Lava-users mailing list
Lava-users@lists.lavasoftware.org
https://lists.lavasoftware.org/mailman/listinfo/lava-users


 

--

Rémi Duraffort

LAVA Architect

Linaro

