Hi Neil! Happy to see you back online.
On Mon, Jan 21, 2019 at 11:08:07AM +0000, Neil Williams wrote:
On Fri, 11 Jan 2019 at 20:28, Dan Rue <dan.rue@linaro.org> wrote:
I'm sorry, as surely this is an FAQ, but I've spent quite a bit of time troubleshooting and reading. This is very similar to Kevin's thread from May[1] with the subject 'u-boot devices broken after 2018.4 upgrade, strange u-boot interaction'. In that thread's case, the issue was that interrupt_char was being set to "\n". My symptoms are the same, but interrupt_char is set to " " or "d".
Space could well be problematic. This is down to how patterns get matched in the stream coming from the serial port and it's not LAVA as such which matters here, it's pexpect.
Yeah, but "d" has the same symptom (it appears an extra \n is getting sent).
I'm running LAVA from the latest released containers (2018.11), and trying to use a beaglebone-black with a more recent u-boot than exists in validation.l.o. QEMU works fine.
This will need investigation with that specific build of U-Boot on a suitable device and it's probably better to take this out of the container based instance to reduce the possible permutations.
This is the hill I will die on :) The whole point of containers is to reduce permutations. You know exactly what I'm running, bit for bit, without ambiguity, cruft, or any other artifacts from past versions or ancillary packages that may be lying around the filesystem. Anyway, who's to say I'm even running Debian?
Right now, we have other issues which are being tested on the beaglebone-black devices in staging.validation.linaro.org and I do not want to complicate those by adding this testing to those boards.
I do have beaglebone-black devices available via lkft-staging.validation.linaro.org, so this is probably best handled as an issue in GitLab where the U-Boot files can be attached (e.g. as a tarball I can unpack onto my own microSD card for those devices).
It's OK - thank you for the offer, but I'm not asking for someone else to investigate and solve the problem. I'm actually trying to learn, and happy to do some of the legwork myself. It does seem curious to me that this looks like a trivial issue and, I imagine, a fairly common one. Isn't there just an option to eat the first =>, or to not send the extra \n? I'm missing something in my understanding.
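On the "eat the first =>" question, one way to picture it is a drain step run once after the interrupt: keep reading until at least one prompt has arrived and the line has then been quiet for a while, and only then start typing. The Python below is a minimal sketch of that idea over a recorded list of timestamped serial chunks; `settle_after_interrupt`, `PROMPT` and the 500 ms quiet period are hypothetical illustrations, not an existing LAVA or pexpect option.

```python
import re

# U-Boot's default prompt as seen in the serial log.
PROMPT = re.compile(r"=> ")


def settle_after_interrupt(chunks, quiet_ms):
    """Hypothetical helper: consume every prompt printed after the interrupt.

    chunks: (timestamp_ms, text) pairs read from the serial line after the
    interrupt character was sent. Reading stops once at least one prompt has
    been seen and the next chunk arrives more than quiet_ms later.
    Returns (prompts_consumed, time_ms_when_safe_to_send_first_command).
    """
    buffer = ""
    last_time = None
    for timestamp, text in sorted(chunks, key=lambda chunk: chunk[0]):
        if PROMPT.search(buffer) and timestamp - last_time > quiet_ms:
            break  # the line went quiet after a prompt: stop draining
        buffer += text
        last_time = timestamp
    prompts = len(PROMPT.findall(buffer))
    if prompts == 0:
        raise TimeoutError("no U-Boot prompt seen after the interrupt")
    # Assumes nothing else arrives within quiet_ms of the last consumed chunk.
    return prompts, last_time + quiet_ms


chunks = [
    (0, "Press SPACE to abort autoboot in 10 seconds\n"),
    (50, "=> "),
    (60, "\n=> "),  # the stray second prompt
]
print(settle_after_interrupt(chunks, quiet_ms=500))  # (2, 560)
```

The trade-off is a single fixed pause after the interrupt instead of a delay on every typed character; it only helps if U-Boot really stops printing once it is sitting at the prompt.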
The problem seems to be that LAVA thinks there's a prompt when there isn't, and so it sends commands too quickly. Here's example output from the serial console (job link[2]):
U-Boot 2017.07 (Aug 31 2017 - 15:35:58 +0000)
CPU : AM335X-GP rev 2.1
I2C: ready
DRAM: 512 MiB
No match for driver 'omap_hsmmc'
No match for driver 'omap_hsmmc'
Some drivers were not found
MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
Net: cpsw, usb_ether
Press SPACE to abort autoboot in 10 seconds
=>
=> setenv autoload no
=> setenv initrd_high 0xffffffff
=> setenv fdt_high 0xffffffff
=> dhcp
link up on port 0, speed 100, full duplex
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 10.100.0.55 (1006 ms)
=> 172.28.0.4
Unknown command '172.28.0.4' - try 'help'
=> tftp 0x82000000 57/tftp-deploy-t7xus3ey/kernel/vmlinuz
link up on port 0, speed 100, full duplex
*** ERROR: `serverip' not set
...
When I run u-boot manually, after I hit SPACE (or 'd', both work), u-boot *deletes* the character and then prints '=> ' (is that delete the root cause?). When LAVA runs, it shows an extra => and starts typing, as seen above. dhcp takes a second or two, so the subsequent command starts to get lost (in the above log we see a bare IP address, because 'setenv serverip' got lost).
The extra => is clearly a problem because pexpect is watching for every instance of that string and exiting the wait each time. That is what causes LAVA to proceed to sending more characters.
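To make that failure mode concrete, here is a toy timeline model in Python. It is not LAVA or pexpect code: `simulate` and the zero-millisecond setenv durations are hypothetical, the 1006 ms DHCP time comes from the log above, and the 'setenv serverip 172.28.0.4' command is inferred from the bare IP in the log. It assumes each wait is satisfied by the oldest unconsumed prompt, typing is instantaneous, and U-Boot prints one prompt after finishing each command. With one stray prompt already on the wire, every wait returns one prompt early, so the command after dhcp is sent while dhcp is still running.

```python
def simulate(steps, stray_prompts=0):
    """Toy model of a 'wait for prompt, then send' loop talking to U-Boot.

    steps: list of (command, duration_ms) pairs, in send order.
    stray_prompts: extra '=> ' prompts already on the wire after the interrupt.
    Returns (command, send_time_ms, sent_while_busy) for each step.
    """
    # Prompts seen on the serial line, oldest first: the real one after the
    # interrupt plus any stray ones, all at t=0.
    prompt_times = [0] * (1 + stray_prompts)
    uboot_free_at = 0  # time at which U-Boot finishes its current command
    log = []
    for i, (command, duration_ms) in enumerate(steps):
        # Each wait is satisfied by the next unconsumed prompt in the stream.
        send_at = prompt_times[i]
        sent_while_busy = send_at < uboot_free_at
        # In this model U-Boot queues the command until the previous one ends;
        # on the real board, characters typed during dhcp were lost instead.
        start = max(send_at, uboot_free_at)
        uboot_free_at = start + duration_ms
        # U-Boot prints a fresh prompt when the command is done.
        prompt_times.append(uboot_free_at)
        log.append((command, send_at, sent_while_busy))
    return log


STEPS = [
    ("setenv autoload no", 0),
    ("dhcp", 1006),  # DHCP time taken from the serial log
    ("setenv serverip 172.28.0.4", 0),  # inferred from the log
    ("tftp 0x82000000 57/tftp-deploy-t7xus3ey/kernel/vmlinuz", 0),
]

for command, send_at, busy in simulate(STEPS, stray_prompts=1):
    print(f"{send_at:5d} ms  {'BUSY' if busy else 'ok  '}  {command}")
```

Rerunning with `stray_prompts=0` flags no command as sent while busy, which suggests (in this model at least) that getting rid of the extra prompt addresses the cause, while slowing down every character only hides it.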
If I set boot_character_delay to something like 1000, it works because it gives dhcp enough time to finish before the next character is typed, but it obviously makes the job very slow, and it is still not reliable.
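A quick back-of-the-envelope check of why this is so slow: if boot_character_delay is a pause in milliseconds applied to every typed character (an assumption here; the trailing newline is ignored), typing time grows with the total length of every command, even though only the command after dhcp actually needs to wait. `typing_seconds` is a hypothetical helper, not part of LAVA, and the command list covers only what is visible in the log excerpt above.

```python
# Commands visible in the serial log excerpt. The serverip line is inferred
# from the log (only "172.28.0.4" survived); the real job sends more after tftp.
COMMANDS = [
    "setenv autoload no",
    "setenv initrd_high 0xffffffff",
    "setenv fdt_high 0xffffffff",
    "dhcp",
    "setenv serverip 172.28.0.4",
    "tftp 0x82000000 57/tftp-deploy-t7xus3ey/kernel/vmlinuz",
]


def typing_seconds(commands, boot_character_delay_ms):
    """Time spent just typing, assuming one delay per character of each
    command (the trailing newline is not counted)."""
    characters = sum(len(command) for command in commands)
    return characters * boot_character_delay_ms / 1000


for delay_ms in (10, 1000):
    print(f"{delay_ms:4d} ms/char -> {typing_seconds(COMMANDS, delay_ms):.2f} s")
```

These six commands are 157 characters long, so at 1000 ms per character they alone take about 157 seconds to type, before any of them has done real work.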
I'm out of ideas... help?
P.S. Two interesting things I've learned recently:
1) boot_character_delay must be specified in the device_types file. It's ignored when specified in the device file (surprising, as I see it listed in some people's device files[3]).
2) If you install ser2net from sid, you can set max-connections and do some _very handy_ voyeurism on the serial console while LAVA does its thing (hat tip Kevin Hilman for that one).
Thanks, Dan
[1] https://lists.lavasoftware.org/pipermail/lava-users/2018-May/001064.html
[2] https://lava.therub.org/scheduler/job/57
[3] https://git.linaro.org/lava/lava-lab.git/tree/lkft.validation.linaro.org/mas...
-- Linaro - Kernel Validation
Lava-users mailing list Lava-users@lists.lavasoftware.org https://lists.lavasoftware.org/mailman/listinfo/lava-users
--
Neil Williams
neil.williams@linaro.org http://www.linux.codehelp.co.uk/