On Aug 14, 2020, at 12:09 PM, Stephen Lawrence stephen.lawrence@renesas.com wrote:
-----Original Message-----
From: Stephen Lawrence
Sent: 12 August 2020 18:03
To: Antonio Terceiro antonio.terceiro@linaro.org
Cc: lava-users@lists.lavasoftware.org
Subject: RE: fastboot reboot-bootloader from docker test shell (was "fastboot in docker support")
-----Original Message-----
From: Antonio Terceiro antonio.terceiro@linaro.org
Sent: 12 August 2020 14:59
To: Stephen Lawrence stephen.lawrence@renesas.com
Cc: lava-users@lists.lavasoftware.org
Subject: Re: fastboot reboot-bootloader from docker test shell (was "fastboot in docker support")
[snip]
It should be enough. See if there are any errors in the udev logs, e.g. if lava-dispatcher-host fails for some reason there will be a corresponding line there (but no output, unfortunately, as udev does not capture it, only the failure exit code).
Thanks for the hints. Nothing in the host logs from lava-dispatcher-host via journalctl. In the worker container there is no systemd and in the lava-slave /var/log I don't see anything.
At the moment it feels more like a fundamental LAVA-worker-in-docker issue, given the lack of udev events being reported by udevadm monitor in the worker container. The udev discovery code in lava-dispatcher-host has no chance if there are no events, unless something about the container environment means I cannot take the udevadm reporting at face value.
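For anyone wanting to compare what udevadm monitor reports on the host and in the worker container, here is a minimal sketch that counts events in captured monitor output. It assumes the usual "KERNEL[timestamp] action devpath (subsystem)" and "UDEV  [timestamp] ..." line layout; the function name count_events and the sample device path are made up for illustration and are not part of LAVA.

```python
import re
from collections import Counter

# Matches event lines such as:
#   UDEV  [1234.600000] add      /devices/.../usb1/1-1 (usb)
# The banner lines that udevadm monitor prints at startup do not match.
EVENT_RE = re.compile(
    r"^(KERNEL|UDEV)\s*\[\s*(\d+\.\d+)\]\s+(\S+)\s+(\S+)\s+\(([^)]+)\)\s*$"
)


def count_events(monitor_output):
    """Count events per (source, action, subsystem) in captured text.

    `monitor_output` is the text captured from `udevadm monitor`
    (hypothetical helper, for illustration only).
    """
    counts = Counter()
    for line in monitor_output.splitlines():
        match = EVENT_RE.match(line)
        if match:
            source, _timestamp, action, _devpath, subsystem = match.groups()
            counts[(source, action, subsystem)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "monitor will print the received events for:\n"
        "KERNEL[1234.567890] add      /devices/example/usb1/1-1 (usb)\n"
        "UDEV  [1234.600000] add      /devices/example/usb1/1-1 (usb)\n"
    )
    for key, value in sorted(count_events(sample).items()):
        print(key, value)
```

Running the same capture on the host and in the container during a DUT reboot and comparing the two counters shows quickly whether any events reach the container at all.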
btw just a couple of additional comments about my post earlier today. I'm really trying not to be that new guy to a list who just spams with their issues and 'fix this next one' one after another. I did not mean that the developers should be maintaining some nice document set for running LAVA in docker.
I just meant two main points. Firstly, how the devs thought udev should work in such a lava environment, if it's been considered. Secondly, that the wider community can collaborate on some patterns as to what needs to be shared into a container for certain LAVA services to work. Of course that can be codified where sensible in the docker base image or downstream projects like lava-docker. Web searches about enabling udev in docker containers return very little outside some hacks.
It looks like I have finally tracked this down to its cause. Looking back at my notes when I was debugging lava_dispatcher/utils/udev.py it was looking like the lack of udev events in the worker container might be a red herring. So I went back to the lava-dispatcher-host source and the udev rules it appears to rely on.
lava-dispatcher-host was installed, but in the worker container with the rest of the worker. Since no udev events are received in the worker container, the rule there never executes. At the same time the host does receive the udev events on DUT reboot, but there is nothing to communicate them to the worker container.
This then goes back to the first question I raised yesterday about how the lava devs see udev discovery commonly working when LAVA is containerised. For example, whether it happens on the host and is somehow communicated into the worker container, or whether 'full' udev is required in the worker container or whether in some cases a different setup is possible. I'll report on the last choice below.
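To make the first option concrete (events seen on the host and communicated into a container), here is a minimal sketch of one way such forwarding could look: each event is serialised as a line of JSON and written to a socket that the container side reads. This is only an illustration under my own assumptions. It is not how lava-dispatcher-host or any existing LAVA tool works, and the names encode_event, decode_events and forward_event, as well as the message fields, are hypothetical.

```python
import json
import socket


def encode_event(action, devnode, properties):
    """Serialise one udev event as a newline-terminated JSON message."""
    message = {"action": action, "devnode": devnode, "properties": properties}
    return (json.dumps(message, sort_keys=True) + "\n").encode("utf-8")


def decode_events(buffer):
    """Split received bytes into complete events plus any leftover bytes.

    The leftover is a partial message that should be prepended to the
    next chunk read from the socket.
    """
    *lines, rest = buffer.split(b"\n")
    return [json.loads(line) for line in lines if line], rest


def forward_event(sock, action, devnode, properties):
    """Host side: send one event over an already connected socket."""
    sock.sendall(encode_event(action, devnode, properties))


if __name__ == "__main__":
    # socketpair stands in for a real host-to-container connection.
    host_end, container_end = socket.socketpair()
    forward_event(host_end, "add", "/dev/bus/usb/001/004",
                  {"ID_SERIAL_SHORT": "abc123"})
    host_end.close()
    events, _rest = decode_events(container_end.recv(4096))
    container_end.close()
    print(events)
```

In a real setup the host end would be fed from a udev rule or a monitor process, and the container end would decide what to do with each event. The sketch deliberately leaves both of those out.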
Web discussion suggests there are differences of opinion between the systemd and docker devs which may mean that 'full' udev will not appear in docker anytime soon: https://stackoverflow.com/a/62227562 There seem to be some ways of getting some support, but they appear to require hacks like running the container in privileged mode or bridging the container directly to the host network. That leaves sharing into the worker container in some way, or an alternative.
For my specific use case/issue (udev discovery in 'test with docker' and 'deploy to downloads' docker) it appeared to be how/where to run lava-dispatcher-host and what communication requirements it had to the wider system. Having udev rules on the host that tried to execute lava-dispatcher-host in the worker container might be best if it needs to communicate with the other lava processes. For 'test with docker' and 'deploy to downloads' however the device is being shared into a sibling of the worker container, not a child. So execution in the worker container may present its own complications of sharing/communication with a sibling.
A high-level pass over the code suggested it might be self-contained, so as a skunkworks test I installed lava-dispatcher-host and its udev rules on the host. This works! At least for my use case.
Simple fastboot reboot: https://lava.genivi.org/scheduler/job/1221
More advanced flashing case with extended fastboot commands and reboots: https://lava.genivi.org/scheduler/job/1226
One side effect I noticed is that (not a big surprise) logging for the device rediscovery is lost.
So great to finally have this working on some level but there are some open questions about how best to do this longer term. I don't have a good grasp of the lava code base and its wider requirements. Running lava-dispatcher-host on the host works for my use case, but you perhaps can immediately think of cases where it wouldn't. What do the devs think?
Regards
Steve
p.s. I think I found an error in the user doc. I'll send a pull request for that for review.
Steve,
I haven’t read through all this thread, but just wanted to point out:
https://git.lavasoftware.org/lava/docker-udev-tools/-/blob/master/udev-forwa...
This was something I implemented for forwarding udev events to a lava-dispatcher running in a container.
So might or might not be useful, but just wanted to point it out.
- k