lava-users January 2018

lava-users@lists.lavasoftware.org

20 participants
19 discussions

by Guillaume Tucker

A change was sent a while ago to add support for the Coreboot / Depthcharge bootloader, which is used on Chromebook devices. This is useful in particular to avoid having to install U-Boot on Chromebook devices. See this Gerrit review for previous history:

https://review.linaro.org/#/c/15203/

I'm now opening this case again to try and get this resolved. There seem to be several issues with the original patch that would need to be clarified. Also, some things might have changed since then in LAVA or Coreboot which could potentially lead to a different approach; any feedback on this would be welcome.

To start with, I understand that running mkimage on the dispatcher is not a valid thing to do; it should receive a FIT (flattened image tree) kernel image ready to be booted. This complicates things a bit for projects like kernelci.org where only a plain kernel image is built and ramdisks are served separately, but it's fair enough to say that LAVA is not meant to be packaging kernel images on the fly.

Then I believe creating the command line file in LAVA should be fine, although it probably makes more sense to have both the FIT image and cmdline file generated by the same build system. In any case, both files would need to be served from the dispatcher TFTP server to the target device running Coreboot / Depthcharge.

So the idea was basically to have an option in Coreboot / Depthcharge to interactively tell it where to find these files for the current job to run, say:

  <JOB_NUMBER>/tftp-deploy-<RANDOM>/kernel/vmlinuz
  <JOB_NUMBER>/tftp-deploy-<RANDOM>/kernel/cmdline

It looks like the current patch in Gerrit relies on this location being hard-coded in the bootloader, which works fine for a private development set-up but not for LAVA.

To recap, my understanding is that the "depthcharge" boot support code in LAVA would need to:

* maybe create the cmdline file with basically the kernel command line split up with one argument per line
* or just download the cmdline file along with the vmlinuz FIT
* place both the cmdline and vmlinuz FIT files in the job's TFTP directory on the dispatcher
* turn on the device and open the serial console...
* interactively pass at least the path to the job TFTP directory on the serial console (and if possible the server IP address as well, and maybe even the individual file names rather than hard-coded vmlinuz and cmdline)
* look for a bootloader message to know when the kernel starts to load and hand over to the next action (login...)

Please let me know if this sounds reasonable or if we should be doing anything differently. I think it would be good to have some agreement and a clear understanding of how this is going to be implemented before starting to work on the code again.

Best wishes,
Guillaume
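The first three steps of the recap (create the cmdline file, then place it next to the FIT image in the job's TFTP directory) could be sketched roughly as below. This is only an illustration and not LAVA code: the function names `cmdline_to_lines` and `stage_boot_files`, the job number, the directory suffix and the command line arguments are all made up, and plain whitespace splitting is assumed to be good enough for the cmdline file.

```python
import tempfile
from pathlib import Path


def cmdline_to_lines(cmdline: str) -> str:
    """Return the kernel command line with one argument per line."""
    # Plain whitespace split: quoted arguments containing spaces are not
    # handled in this sketch.
    return "\n".join(cmdline.split()) + "\n"


def stage_boot_files(tftp_job_dir: Path, fit_image: bytes, cmdline: str) -> Path:
    """Write vmlinuz (the FIT image) and cmdline under <tftp_job_dir>/kernel/."""
    kernel_dir = tftp_job_dir / "kernel"
    kernel_dir.mkdir(parents=True, exist_ok=True)
    (kernel_dir / "vmlinuz").write_bytes(fit_image)
    (kernel_dir / "cmdline").write_text(cmdline_to_lines(cmdline))
    return kernel_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical job number and tftp-deploy suffix, for illustration only.
        job_dir = Path(tmp) / "1234" / "tftp-deploy-abcd"
        kernel_dir = stage_boot_files(
            job_dir,
            b"placeholder FIT data",
            "console=ttyS0,115200 root=/dev/ram0 ip=dhcp",
        )
        print((kernel_dir / "cmdline").read_text(), end="")
```

The path relative to the TFTP root is then what would have to be passed to Depthcharge on the serial console, which is the part that still needs bootloader support.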

7 years, 5 months

LAB Notice: LAVA Instance downtime

by Dave Pigott

[bcc’d to everyone]

Hi all,

Next week, starting on Monday, we will be upgrading all the LAVA servers to Debian Stretch and then upgrading to LAVA 2018.01. This will involve minimal downtime for each instance. Devices will be off-lined ahead of each upgrade.

The order of upgrade will be:

Monday:
  LNG: lng.validation.linaro.org
  PMWG: pmwg.validation.linaro.org

Tuesday:
  Production: validation.linaro.org
  LKFT: lkft.validation.linaro.org

All being well, the downtime, i.e. the time you will not be able to submit jobs or see the web site, will be of the order of one hour per instance. If that is going to change I will send out a new notification.

Thanks for your patience,

Dave

----------------
Dave Pigott
LAVA Lab Lead
Linaro Ltd
t: (+44) (0) 1223 400063

7 years, 5 months

Re: [Lava-users] No result coming when job submitted in lava

by Robert Marshall

Smita,

Yes I looked and confirm that analysis, you've not commented on what the U-Boot prompt is?

I'm putting lava-users back on the cc list in case others wish to comment, and making my comment below clearer

Robert

"Gumansingh, Smita" <Smita_Gumansingh(a)mentor.com> writes:

> Hi Robert,
>
> Have you got any chance to see my bbb health check log
>
> bbb health check fails at this point
>
> ------------ ----------------------
> send-reboot-commands timed out after 179 seconds
> end: 2.4.1.1 send-reboot-commands (duration 00:02:59)
> ----------------------------
>
> Please have a look on this log https://pastebin.com/gPmMG84J
>
> Thanks & Regards,
> Smita Gumansingh
>
> ________________________________________
> From: Robert Marshall <robert.marshall(a)codethink.co.uk>
> Sent: Wednesday, January 10, 2018 6:21 PM
> To: Gumansingh, Smita
> Subject: Re: [Lava-users] No result coming when job submitted in lava
>
> Smita
>
> Though you only to replace that line *if* the U-Boot prompt doesn't
                 ^ need
> consist of "U-Boot=>" - interrupt the BBB boot to see what it actually is,
> and if it is necessary, you also need to remove the other line -
> apologies for the instructions not being clear here.
>
> Thanks for the fuller output in the other email!
>
> Robert
>
> "Gumansingh, Smita" <Smita_Gumansingh(a)mentor.com> writes:
>
>> Thanks Robert for the quick response
>>
>> Currently u-boot is showing in /etc/lava-server/dispatcher-config/device-types/beaglebone-black.jinja2 is:
>>
>> {% set bootloader_prompt = bootloader_prompt|default('U-Boot') %}
>>
>> As you suggested I added the line: {% set bootloader_prompt = bootloader_prompt|default('=>') %} and submitted a job
>>
>> No output from the job when submitted...
>>
>> Thanks & Regards,
>> Smita Gumansingh
>>
>> ________________________________________
>> From: Robert Marshall <robert.marshall(a)codethink.co.uk>
>> Sent: Wednesday, January 10, 2018 5:08 PM
>> To: Gumansingh, Smita
>> Cc: lava-users(a)lists.linaro.org
>> Subject: Re: [Lava-users] No result coming when job submitted in lava
>>
>> Hi, some comments below!
>>
>> "Gumansingh, Smita" <Smita_Gumansingh(a)mentor.com> writes:
>>
>>> Hi,
>>> I am new to lava and trying to submit a job on lava scheduler ,job submitted nut no result is coming. I am trying to
>>> test the CIP Kernel on the Beaglebone Black(board is physically connected to my linux machine). Health checkup is
>>> working somehow .
>>
>> By 'somehow' do you mean the health check is completing correctly and the
>> device is shown as online?
>>
>>> I am following the steps from here
>>> https://wiki.linuxfoundation.org/civilinfrastructureplatform/ciptestingrefe…
>>>
>>> Pre-requise:
>>> 1. I have built(cip_v4.4.92) the CIP Kernel with Kernel CI as the steps mentioned in
>>> https://wiki.linuxfoundation.org/civilinfrastructureplatform/cipsystembuild…
>>> 2.Target is booted and up and flashed with debian 4.9 rootfs
>>>
>>
>> What is the U-Boot prompt with this version, if it is => rather than U-Boot=>
>>
>> on the vagrant machine you need to
>>
>> sudo vi /etc/lava-server/dispatcher-config/device-types/beaglebone-black.jinja2
>> And add the line: {% set bootloader_prompt = bootloader_prompt|default('=>') %}
>>
>>
>>> Job Definination is pasted here
>>>
>>> https://pastebin.com/YwnPXidK
>>>
>>> Need help for going further !!!!
>>
>> Do you get any output from the job?
>>
>>> Thanks & Regards,
>>> Smita Gumansingh
>>>
>>
>> Robert
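To illustrate why the configured bootloader_prompt has to agree with what the board actually prints, here is a small sketch. It assumes the prompt is looked up as a literal substring of the serial output, which may not be exactly how the dispatcher matches it; `prompt_matches` and the sample console strings are made up for the example.

```python
import re


def prompt_matches(bootloader_prompt: str, console_output: str) -> bool:
    """Return True if the configured prompt appears in the console output."""
    # Assumption for this sketch: the prompt is treated as a literal string.
    return re.search(re.escape(bootloader_prompt), console_output) is not None


# Made-up console samples: one U-Boot build that prints "U-Boot# " and
# one that prints a bare "=> " prompt.
named_prompt_output = "Hit any key to stop autoboot:  0 \nU-Boot# "
bare_prompt_output = "Hit any key to stop autoboot:  0 \n=> "

if __name__ == "__main__":
    for prompt in ("U-Boot", "=>"):
        print(prompt, prompt_matches(prompt, named_prompt_output),
              prompt_matches(prompt, bare_prompt_output))
```

With a setting that never matches, the dispatcher would simply keep waiting for the prompt until the action times out, so interrupting the boot by hand to read the real prompt is the quickest check.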

7 years, 5 months

No result coming when job submitted in lava

by Gumansingh, Smita

Hi,

I am new to LAVA and am trying to submit a job on the LAVA scheduler. The job is submitted but no result is coming. I am trying to test the CIP Kernel on the Beaglebone Black (the board is physically connected to my Linux machine). The health check is working somehow.

I am following the steps from here:
https://wiki.linuxfoundation.org/civilinfrastructureplatform/ciptestingrefe…

Prerequisites:
1. I have built the CIP Kernel (cip_v4.4.92) with Kernel CI, following the steps mentioned in https://wiki.linuxfoundation.org/civilinfrastructureplatform/cipsystembuild…
2. The target is booted and up, and flashed with a Debian 4.9 rootfs.

The job definition is pasted here:
https://pastebin.com/YwnPXidK

Need help for going further!

Thanks & Regards,
Smita Gumansingh

7 years, 5 months

Better bootloader failure detection

by Matt Hart

Currently it is difficult to tell the difference between an infrastructure problem in a device bootloader and a kernel failure. If a kernel silently fails to boot, LAVA throws a bootloader-commands timeout because it hasn’t matched ‘Linux version’ to know the kernel has started. However, this timeout could also be caused by a real problem in the bootloader, such as a DHCP failure or a TFTP timeout.

KernelCI would like to catch actual infrastructure problems in the bootloader, but can’t tell if the kernel just didn’t boot, or the commands actually timed out in the bootloader.

To fix this, we're going to:

- change the bootloader-commands action to finish when it has sent the last command
- have auto-login-action take over monitoring the kernel boot process
- extend bootloader-commands to match more infrastructure problems
- update uboot commands to execute the commands in order (like the other bootloader implementations), rather than building a script and then calling that as the last command

This work is scoped for the January 2018.1 LAVA release.
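The distinction being aimed at can be sketched as a small classifier over the serial log. This is an illustration only, not the planned LAVA implementation: the function name `classify_boot_log` and the result labels are made up, and the two infrastructure patterns are example U-Boot messages picked for the sketch, not a list taken from LAVA.

```python
import re

# 'Linux version' is the string LAVA matches to know the kernel has started.
KERNEL_STARTED = re.compile(r"Linux version")

# Illustrative bootloader infrastructure failures (assumed patterns).
INFRA_PATTERNS = {
    "tftp": re.compile(r"TFTP error"),
    "network": re.compile(r"Retry count exceeded"),
}


def classify_boot_log(log: str) -> str:
    """Classify a serial log as kernel start, infrastructure error or unknown."""
    if KERNEL_STARTED.search(log):
        return "kernel-started"
    for name, pattern in INFRA_PATTERNS.items():
        if pattern.search(log):
            return "infrastructure-error:" + name
    # Nothing matched: the kernel may have failed silently.
    return "no-kernel-output"


if __name__ == "__main__":
    print(classify_boot_log("Starting kernel ...\nLinux version 4.4.92"))
    print(classify_boot_log("TFTP error: 'File not found' (1)"))
    print(classify_boot_log("Starting kernel ...\n"))
```

Splitting the work between bootloader-commands and auto-login-action means the third case is reported by the kernel boot stage rather than as a bootloader timeout.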

7 years, 5 months

Job stuck

by Thomas Petazzoni

Hello,

We have an installation where we use LAVA 2017.12. We are regularly seeing jobs that remain stuck for several days. For example, I have a job right now on the Armada XP GP that has been stuck for 1 day and 11 hours.

The log visible in the LAVA Web interface looks like this:
http://code.bulix.org/7pvru8-255308?raw

This is job #855671 in our setup.

The logs on the lava-slave look like this:
http://code.bulix.org/c5tejy-255312?raw

So, from the lava-slave point of view, the job is finished. However, the "Job END" message had to be resent several times to the master. Interestingly, this sequence led to a very nice:

  ERROR [855671] lava-run crashed

On the lava-master side (which runs on another machine), the logs look like this:
http://code.bulix.org/b61keb-255316?raw

And this happens for lots of jobs. Pretty much every day or two, we have ten boards stuck in this situation.

I have the lava-master logs with DEBUGs if this can be helpful. However, most DEBUG logs don't have the job number in them, so it is difficult to associate the DEBUG messages with the problematic job (since numerous other jobs are running).

Does anyone have an idea what could be causing this? Or how to debug this further?

Best regards,

Thomas Petazzoni
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
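As a rough aid for sifting such logs, the lines that do carry a job number in the bracketed form seen in the "ERROR [855671] lava-run crashed" message can be pulled out as below. This is only a sketch: `lines_for_job` is a made-up helper, the sample lines other than that ERROR line are invented, and it cannot recover DEBUG lines that carry no job number at all.

```python
from typing import Iterable, List


def lines_for_job(log_lines: Iterable[str], job_id: int) -> List[str]:
    """Return the log lines tagged with [<job_id>]."""
    tag = "[{}]".format(job_id)
    return [line for line in log_lines if tag in line]


if __name__ == "__main__":
    # Invented sample, apart from the ERROR line quoted in the message above.
    sample = [
        "DEBUG some message without a job number",
        "INFO [855670] another job",
        "ERROR [855671] lava-run crashed",
    ]
    for line in lines_for_job(sample, 855671):
        print(line)
```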

7 years, 5 months

LXC device not working in multinode jobs

by Andrei Gheorghiu

Hi,

I am having issues using an LXC device within the multinode LAVA protocol/API. The job yaml gets validated, but it fails to run with the following error:

  Missing protocol 'lava-lxc' in ['lava-multinode']

Full yaml used can be found here: https://pastebin.com/BUsX0G0C

While searching for a fix I also found this email from a year ago, which wasn't answered (seems related):
http://linaro-validation.linaro.narkive.com/mXhxhHqy/issues-with-lava-multi…

Thanks,
Andrei

7 years, 5 months

Help: using custom script - run.sh error

by Mylene JOSSERAND

Hello everyone,

I am trying to use LAVA with a custom script [1]. Previously, my test was used as a shell script (using exit) and I updated it to be a custom script to be able to use the "skip" result.

I ran a job and I got this error:

  /lava-854766/0/tests/1_custom-tests/run.sh: line 1: ####: not found

It seems to be linked with my echo of "#### Starting NAND test ####" [2].

Here is the yaml used:
https://github.com/free-electrons/test_suite/blob/master/tests/custom.yaml
the source of the script:
https://github.com/free-electrons/test_suite/blob/master/scripts/nand.sh
the jinja file:
https://github.com/free-electrons/custom_tests_tool/blob/master/src/jobs_te…
and some part of the output when the test ran and failed:
http://code.bulix.org/ca3mc7-254957?raw

As I am new to LAVA, I have probably missed something. In the current test example [3], it seems that there is no output using "echo". Is it possible that my "echo" leads to this error? It is possible to print messages in a custom script, right?

[1]: https://validation.linaro.org/static/docs/v2/writing-tests.html#writing-cus…
[2]: https://github.com/free-electrons/test_suite/blob/master/scripts/nand.sh#L9
[3]: https://git.linaro.org/lava-team/refactoring.git/tree/functional/unittests.…

Thank you in advance for your help,

Best regards,

--
Mylène Josserand, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

7 years, 5 months

Need Help With Simplistic Testing Over SSH

by Tan, Jian Chern

Hi,

I'm trying to get started with LAVA by first attempting some simplistic testing over SSH, running some tests on a device that doesn't have a default template. I'm getting a few connection errors and other output such as:

  ['Permission denied (publickey,password).\r', 'lost connection', '']

It's probably because I've misconfigured the jinja files, having little experience with these LAVA jinja templates.

Attached are the job logs, jinja2 template files and the test yaml file. Could anyone point me in the right direction, either by providing sample ssh jinja files and job files or by pointing out the errors in my config?

Thanks!
Jian Chern

7 years, 6 months
