Hi folks,
We held our regular weekly design meeting today via Hangout. Summary
of discussion:
1. [stevanr] LAVA auth revamp
1. "Restricted to authenticated users" use case
1. With the current design, everything should be open ("visible"
permission) by default; the user, group and is_public fields are
removed and permissions are assigned via groups
1. [dean] it would be useful to be able to restrict devices/testjobs
to authenticated users only, or per group
2. So if someone wants to restrict visibility to, let's say, only
authenticated users, it's a bit tricky to support with the
current design (see the hypothetical sketch after this topic)
2. Device owner
1. My plan is to remove every way of having some arbitrary field
mess with authorization other than permissions. Does anyone
see a problem with removing this field? (physical_owner will
still remain in place, but it has no say in auth.)
1. [Steve] Check with Dave?
2. Do they use device owner?
3. [Rémi] how to allow one group to update one device object?
1. Is it of any use?
4. [Rémi] code/design/schema available somewhere?
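Going back to the "restricted to authenticated users" point above, a
purely hypothetical sketch of how that check could look once everything
is driven by group permissions; the function name and the idea of a
catch-all group for logged-in users are illustrations for discussion,
not part of the proposed design:

    # Hypothetical illustration only: under a group-based scheme,
    # "no permission assigned" means open, and any group restriction
    # excludes anonymous users.  Restricting to *all* authenticated
    # users would still need something like a catch-all group, which
    # is the tricky part noted above.
    def restricted_view_allowed(user, allowed_groups):
        """allowed_groups: groups granted the 'view' permission on the object."""
        if not allowed_groups:
            return True      # nothing assigned: open by default
        if not user.is_authenticated:
            return False     # any restriction excludes anonymous users
        return user.groups.filter(pk__in=[g.pk for g in allowed_groups]).exists()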
2. [Steve] Future plans - when/how/where do we notify of upcoming
changes?
1. lava-devel by default for most things
2. lava-announce for breaking changes, and give notice
1. how much notice?
1. 1m? 3m? Agreed on 2 months for things like DB migrations
2. 2019.05 will have some migrations, then no more yet planned
until .08 or .09
3. [Steve] When will schema validation become strict? .08 or .09 as well?
1. Print warnings before then
2. Schema validation in strict mode will complain about most jobs
(80%?), so we can't turn that on!
3. For now, allow submissions. Also run the validator again in
strict mode and print its output as a warning on submission. Can
we put the same output into the test log too? Should we?
1. Some users may look at the LAVA job results page
2. Some may just grab the logs
3. Maybe start mailing admins with a daily summary of warnings?
Add a management command to check for warnings (a hypothetical
sketch of such a command follows after this topic).
4. Talk to other people (e.g. Matt about kernelci)
4. Maybe look into versioning of schemas and support validation of
supported versions? Lots of work… :-/
5. Not complete yet, we still want people running schema checks and
letting us know about any problems found
1. lavacli jobs validate <job-def.yaml>
2. share/lava-schema.py job <job-def.yaml>
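A purely hypothetical skeleton of the management command mentioned
above; the command layout, the --days option and the
collect_strict_schema_warnings() helper are all made up for
illustration, and the real implementation would hook into whichever
strict validator LAVA ships:

    from django.core.management.base import BaseCommand


    def collect_strict_schema_warnings(days):
        """Placeholder: re-run the strict validator over job definitions
        submitted in the last `days` days, return {submitter: [warnings]}."""
        return {}


    class Command(BaseCommand):
        help = "Summarise strict-schema warnings for recently submitted jobs"

        def add_arguments(self, parser):
            parser.add_argument("--days", type=int, default=1)

        def handle(self, *args, **options):
            warnings = collect_strict_schema_warnings(options["days"])
            for submitter, messages in warnings.items():
                self.stdout.write("%s: %d warning(s)" % (submitter, len(messages)))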
4. [Rémi] doc canonicalization
1. https://moz.com/learn/seo/canonicalization
2. Add links back to https://docs.lavasoftware.org/lava in our help
information
3. Need to get docs.ls.o set up!
5. [Steve] Stevan being added as a reviewer/admin
1. Sort out details offline
============================================================================
The LAVA design meeting is held weekly, every Wednesday at 13:00 to
14:00 UTC using Google Hangouts Meet: https://meet.google.com/qre-rgen-zwc
Feel free to comment here or join us directly in the meeting.
Minutes from this and previous meetings are also stored in the LAVA wiki:
https://git.lavasoftware.org/lava/lava/wikis/design-meetings/index
Cheers,
--
Steve McIntyre steve.mcintyre(a)linaro.org
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs
Hi folks,
We held our regular weekly design meeting yesterday via
Hangout. Summary of discussion:
1. [Neil] Job action timeouts
a. Downgrade current change to not reject XMLRPC API job submission [Rémi]
1. lots of jobs on LKFT need changes, likely to be indicative
of a wider problem
b. Implement the XMLRPC check and get lava-schema.py out to
people in 2019.02 [Rémi]
c. Announce the new schema
d. Leave fatal exceptions until a future release
e. Confirmed: the schema validation itself is not yet part of the lava_scheduler_app submission path (a hedged validate-then-submit example follows below).
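To make that concrete, here is a minimal, hedged example of what
"validate locally, then submit" could look like for a test writer.
scheduler.submit_job() is the existing XML-RPC call; the server URL is
a placeholder and the exact path/options of lava-schema.py may differ
on your installation:

    import subprocess
    import xmlrpc.client

    # Strict validation first; a non-zero exit means the job definition
    # needs fixing before the XML-RPC check starts rejecting it.
    subprocess.run(["python3", "share/lava-schema.py", "job", "job-def.yaml"],
                   check=True)

    # Placeholder URL; in practice you embed your username and API token,
    # e.g. https://<user>:<token>@<server>/RPC2
    server = xmlrpc.client.ServerProxy("https://lava.example.com/RPC2")
    job_id = server.scheduler.submit_job(open("job-def.yaml").read())
    print("submitted as", job_id)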
2. [Neil] Aarch64 gitlab-runners
a. Initial config available, needs optimisation, especially
concurrency per machine vs cores per runner
1. optimisation to be done during 2019.03 cycle.
b. [Steve] Mustang machine not booting - investigating
1. could be an issue with upgrade to buster.
3. [Neil] wisdom of running unit tests inside docker builds in the
ci-images project?
a. guarantees that the new image won't break lava.git master.
b. another item to be added to the docs of how our CI
operates. [Neil] needs an issue.
4. [Neil] Remi to authenticate the GnuPG fingerprint for
4E9995EC67B6560E0A9B97A9597DCC10C0D1B33D to enable lavasoftware.org
ansible password_store
a. Now fixed via keys.gnupg.net and pgp.earth.li
5. [Dean] Feasibility of upgrading django-auth-ldap to version 1.7?
a. https://tracker.debian.org/pkg/django-auth-ldap
b. To sync LDAP groups into Django auth.
c. We want to be able to mirror LDAP groups as groups in LAVA
(details of this:
https://django-auth-ldap.readthedocs.io/en/latest/permissions.html#group-mi…),
but the examples given in the 1.5 docs (the oldest I've found) don't
appear to work with 1.3 (a hedged settings sketch follows after this item)
d. I believe 1.7 is in Buster, so is it a case of moving to buster?
1. not urgent to migrate to buster now.
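For reference, the group-mirroring configuration from the
django-auth-ldap documentation looks roughly like the sketch below; the
server URI and search base are placeholders, and the option names
should be double-checked against whichever version (1.3 vs 1.7) is
actually packaged:

    import ldap
    from django_auth_ldap.config import LDAPSearch, GroupOfNamesType

    AUTH_LDAP_SERVER_URI = "ldap://ldap.example.org"
    AUTH_LDAP_GROUP_SEARCH = LDAPSearch(
        "ou=groups,dc=example,dc=org", ldap.SCOPE_SUBTREE,
        "(objectClass=groupOfNames)"
    )
    AUTH_LDAP_GROUP_TYPE = GroupOfNamesType(name_attr="cn")

    # Mirror the user's LDAP groups into Django groups at each login.
    AUTH_LDAP_MIRROR_GROUPS = True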
6. [Steve] location for documentation
a. Needs an issue to track this.
1. avoid the rabbit hole of optimising what is left.
b. certain elements need to go into the website from lava-server-doc
1. Release docs
2. Development process
3. Design overview
4. Keep development-intro and update links.
c. update all the links in lava-server-doc.
d. website will move to Sphinx instead of Pelican.
e. a lot of docs are using Sphinx RST.
1. there will be some conversion needed for items which are currently in Google Docs.
f. move design meeting doc to the wiki?
1. no - interactive shared-editing feature is very useful
2. we can simply copy text to the wiki after the fact
3. Page per meeting in the wiki with index links
4. Consider the new document as public by the end of the meeting.
7. [Rémi] MuxPi / rPi Zero: build raw images.
a. Guestfish (part of libguestfs)
b. Inside docker? Needs /boot and /lib/modules volumes. An easy,
scriptable way to do things (a rough libguestfs sketch follows at
the end of these minutes).
c. Should we publish our images and a way to rebuild them?
1. Let's not do this as an automated public build
a. Neil to close https://git.lavasoftware.org/lava/functional-tests/issues/7 (Use
GitLab CI to auto-build functional test image files for
files.lavasoftware.org) on the basis that we are not an image
building service.
2. Must document how our images are built
a. files.lavasoftware.org already includes copies of the
scripts which were used to create the Debian standard image files.
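As a rough illustration of the scriptable approach discussed under
item 7 (not the project's actual scripts), building a raw image with
the libguestfs Python bindings could look like this; the image size,
filesystem and rootfs tarball are placeholders:

    import guestfs

    IMAGE = "muxpi-rpi0.img"

    g = guestfs.GuestFS(python_return_dict=True)
    g.disk_create(IMAGE, "raw", 2 * 1024 * 1024 * 1024)   # 2 GiB raw image
    g.add_drive_opts(IMAGE, format="raw")
    g.launch()

    g.part_disk("/dev/sda", "mbr")
    g.mkfs("ext4", "/dev/sda1")
    g.mount("/dev/sda1", "/")
    g.tar_in("rootfs.tar", "/")     # unpack a pre-built root filesystem
    g.shutdown()
    g.close()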
The LAVA design meeting is held weekly, every Wednesday at 13:00 to
14:00 UTC using Google Hangouts Meet: https://meet.google.com/qre-rgen-zwc
Feel free to comment here or join us directly in the meeting.
Cheers,
--
Steve McIntyre steve.mcintyre(a)linaro.org
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs
Hi folks,
We held our regular weekly design meeting today via Hangout. Summary
of discussion:
1. [Steve] Layoffs in Linaro affecting the team
2. [Dean] A user running the same process multiple times in a test shell has noticed that later iterations take much longer than earlier ones, even though they should take the same amount of time.
1. They noticed the "Listened to connection for namespace '<NAMESPACE>' done" message appeared a lot.
2. shell.py has the following comment around this debug log:
# With an higher timeout, this can have a big impact on
# the performances of the overall loop.
1. Are there any known issues with the read feedback checks?
2. Is there any reason why this step would take longer over the course of a job? If so, is there anything we can do to mitigate this?
3. Are there any settings we can tweak to adjust performance?
4. Any further information that might help us investigate this further?
5. Maybe a pexpect problem? Changes in this area happened in 2018.7; Dean is using 2018.5
3. [Rémi] lavafed labs
1. Neil’s lab is off
2. ARM? In process, but may take a while - needs IT involvement to open up ports etc.
3. [Rémi] Contact lava users:
1. Collabora? [done]
2. Baylibre? [done]
3. ST?
1. [Rémi] Add matt’s lab
2. [Steve] Can set up some stuff if needed (Mustang? BBB? Panda? Maybe grab old boards from Neil?)
4. [Rémi] LAVA 2019.2 release
1. When?
2. Start the process Thu 28th, but we're not going to get it all done then
3. Expect to finish Monday 4th?
4. Need to document what functional tests we're doing manually for now (list was in Neil's head!)
5. We want to get to the point where lavafed etc. make this obsolete
1. Will need to actually work out useful tests for all the devices!
The LAVA design meeting is held weekly, every Wednesday at 13:00 to
14:00 UTC using Google Hangouts Meet: https://meet.google.com/qre-rgen-zwc
Feel free to comment here or join us directly in the meeting.
Cheers,
--
Steve McIntyre steve.mcintyre(a)linaro.org
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs
Starting lava-run in a dedicated container
===============================
https://git.lavasoftware.org/lava/lava/issues/114
Work has already been done for the device support in this area. The
intention is that the admins can create static udev rules which add
the device to the correct container. To achieve this, the name of the
container needs to be made available to the udev rule. The plan will
be for lava-slave to create a file in /var/cache/lava-slave/. The file
will be named according to the device hostname and will contain the
container name plus some other useful data, e.g. the job ID. The udev
rule can then parse this file to know which container to use. The udev
rule would be triggered on each ADD event. If lava-slave is run in a
docker container, /var/cache/lava-slave/ would need to be made available
to that container as a volume. (LAVA specifies the name of the container
in advance.)
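To illustrate the flow (a sketch only, not the implementation tracked
in the issue), a udev RUN+= helper could read the per-device file and
create the device node inside the named container; the file format
(container name first, then job ID) and the mknod-based passthrough are
assumptions here:

    #!/usr/bin/env python3
    # Sketch: called from a udev rule, e.g. RUN+="... lava-docker-add <hostname>"
    import os
    import subprocess
    import sys

    hostname = sys.argv[1]
    # Assumed format: container name, then job ID, whitespace-separated.
    container = open("/var/cache/lava-slave/%s" % hostname).read().split()[0]

    devnode = os.environ["DEVNAME"]          # set by udev on ADD events
    st = os.stat(devnode)
    major, minor = os.major(st.st_rdev), os.minor(st.st_rdev)
    # Recreate the node inside the container with the same major/minor numbers;
    # a real script would also need to allow the device in the devices cgroup.
    subprocess.run(["docker", "exec", container, "mknod", devnode,
                    "c", str(major), str(minor)], check=True)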
Test job to control the image to be used
-----------------------------------------------------
The LAVA documentation will need to recommend using official LAVA
Software Community Project docker images; some teams will want to
build and use images based on those, to include tools which take a
long time to install or build. LAVA will not be able to check the
provenance of the images being used; this is a test writer problem.
LAVA will need to clearly output the docker image being executed and
retain that in the permanent test job log output or result metadata.
lava-slave will need to handle "latest" URLs and turn them into a
reproducible ID, using docker inspect to get the image ID. This is to
be done by passing an argument to a new lava-run option.
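For illustration, resolving a mutable tag into a reproducible ID is a
small wrapper around docker inspect; how lava-run will receive and
record the result is still to be decided, so treat this as a sketch
only:

    import subprocess

    def resolve_image_id(image):
        """Return the immutable ID for a (possibly mutable) image tag."""
        out = subprocess.run(["docker", "inspect", "--format", "{{.Id}}", image],
                             check=True, capture_output=True, text=True)
        return out.stdout.strip()

    # e.g. resolve_image_id("lavasoftware/lava-dispatcher:latest") -> "sha256:..."
    # which can then be kept in the job log or result metadata.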
LAVA already outputs the version of lava-dispatcher (lava-run) in use
(and other tools) and this will continue with docker.
Admins will continue to control certificates, e.g. for ZMQ.
Capabilities may need to be added too.
LAVA runs the container with --rm (possibly with --force too).
Releases and milestones
===================
We have created a 2019.03 milestone which is expected to contain the
work on lava-run in a separate container above. We have moved a number
of issues and merge requests from 2019.02 into 2019.03 (or sometimes
into .05) to get to a feasible number of changes to release 2019.02.
Adding env variables to the test shell
=============================
In combination with https://git.lavasoftware.org/lava/lava/issues/228
we will be looking at making device dictionary elements and some
environment variables available inside the Lava-Test Test Definition
1.0 overlays using a test shell helper. Test writers are advised not
to rely on device-specific information unless essential.
--
Neil Williams
=============
neil.williams(a)linaro.org
http://www.linux.codehelp.co.uk/
We've come across a problem with LXC test jobs on a network which only
supports IPv6 but it's hard to replicate (this network is at a
conference, just for a few days).
Has anyone already looked at IPv6 and LXC? Are there other IPv6 issues
with LAVA?
I've found this guide but to be able to document this, the content
needs to come from someone who can replicate and test the actual
problems.
https://techoverflow.net/2018/06/06/routing-public-ipv6-addresses-to-your-l…
--
Neil Williams
=============
neil.williams(a)linaro.org
http://www.linux.codehelp.co.uk/
Hi folks,
We held our regular weekly design meeting yesterday via Hangout. Brief
summary of discussion:
* [Stevan] Further discussion about the permissions model code
+ He followed up on -devel with more details [1]
* [Steve] Sprint - where are we up to?
+ Hanging fire for now, waiting for OK from Linaro management on
budget etc.
[1] https://lists.lavasoftware.org/pipermail/lava-devel/2019-January/000017.html
The LAVA design meeting is held weekly, every Wednesday at 13:00 to
14:00 UTC using Google Hangouts Meet: https://meet.google.com/qre-rgen-zwc
Feel free to comment here or join us directly in the meeting.
Cheers,
--
Steve McIntyre steve.mcintyre(a)linaro.org
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs
Dear all,
I'd like to propose a couple of designs and thoughts on the
authorization subject and get the feedback from the community and LAVA
core team in the process. Link to the issue:
https://git.lavasoftware.org/lava/lava/issues/201
As stated before, the main reason behind the authorization revamp is
that checking the device types accessible to a specific user is not
optimized in the slightest and does not scale. The device-type
authorization is in some cases also used in device authorization,
which adds to the complexity.
Now, while the problems at hand can be addressed directly to mitigate
the scalability issues, it would also be smart to do something about
the django-restricted-resource library. Why? Well, Django already has
a perfectly sound auth model which can also be used for per-object
access with a little effort, so essentially it means much less
complexity, less code, and all the benefits that go along with that.
There are two approaches here: write our own authentication backend,
or use the existing django-guardian project (available as a Debian
package), which is well maintained. I think having our own backend is
the slightly better solution, just because there's no need to add more
complexity with a third-party package than we need, and it seems to me
that our needs can be addressed with a small code base.
Once that's in place, the proposal is to address the device-type
authorization with a cached value (currently, if a user can access any
device of a specific device type, they can access the device type as
well), meaning that we store the device-type visibility as a separate
permission automagically, so that the check can be performed without
checking all the device permissions for that device type. We'd also
remove the complexity in the various frontend views regarding the
device/device-type visibility without changing the behavior.
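To make the cached-permission idea concrete, here is a rough sketch of
how it could look if the django-guardian route were taken; the
permission codenames and helper names are assumptions, not the final
design, and the same shape would apply to a home-grown backend:

    from guardian.shortcuts import assign_perm, get_objects_for_user

    from lava_scheduler_app.models import DeviceType


    def grant_device_access(group, device):
        """Grant a group access to a device and cache the device-type permission."""
        assign_perm("view_device", group, device)
        # Cache visibility at device-type level so listing device types never
        # has to iterate over every device of that type.
        assign_perm("view_devicetype", group, device.device_type)


    def visible_device_types(user):
        """Device types visible to a user, answered from the cached permission."""
        return get_objects_for_user(user, "view_devicetype", klass=DeviceType)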
All comments/ideas welcome.
Cheers,
--
Stevan Radaković | LAVA Engineer
Linaro.org <www.linaro.org> │ Open source software for ARM SoCs
I replied to Dean on this in an email thread, but thought I’d capture the information in the public forum.
I’ve got a device passthrough script that doesn’t require the docker container to be run in privileged mode: https://git.linaro.org/lava/lava-lab.git/tree/shared/lab-scripts/docker/pas…
I can force a trigger event by setting up an appropriate udev rule, e.g. I have the following in my test rig /etc/udev/rules.d/100-lava-docker.rules:
ACTION=="add", ATTR{serial}=="21F6C6B800314249", RUN+="/root/docker/passthrough -d 21F6C6B800314249 -i lava-dispatcher-01-2018-11"
ACTION=="add", ATTR{serial}=="28B114C800334FA2", RUN+="/root/docker/passthrough -d 28B114C800334FA2 -i lava-dispatcher-02-2018-11"
Once set up, just do:
udevadm control --reload-rules
This will then allow hot plug for fastboot type devices.
My bash script for running up a docker dispatcher is as follows:
#!/bin/bash
set -e
set -x
docker run \
-v /dev/bus/usb:/dev/bus/usb \
-v /dev:/dev \
-v /dev/serial:/dev/serial \
-e "DISPATCHER_HOSTNAME=--hostname=lava-dispatcher-01-2018.11" \
-e "LOGGER_URL=tcp://172.16.1.1:5555 <tcp://172.16.1.1:5555>" \
-e "MASTER_URL=tcp://172.16.1.1:5556 <tcp://172.16.1.1:5556>" \
--net lavanet --ip 172.16.2.1 \
--name lava-dispatcher-01-2018-11 \
-it lavasoftware/amd64-lava-dispatcher:2018.11
With this, static serial type devices will just be available - i.e. Serial devices, Arm Energy Probes etc.
I am in the process of writing start scripts in Python that hide all the nasty work. I actually got this working using the Python docker library yesterday, and am now working on making it friendly.
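As a rough sketch (not the finished script mentioned above), the same docker run invocation via the Python docker library would look something like this; the static --ip from the bash example needs the lower-level network API, so it is omitted here:

    import docker

    client = docker.from_env()
    client.containers.run(
        "lavasoftware/amd64-lava-dispatcher:2018.11",
        name="lava-dispatcher-01-2018-11",
        environment={
            "DISPATCHER_HOSTNAME": "--hostname=lava-dispatcher-01-2018.11",
            "LOGGER_URL": "tcp://172.16.1.1:5555",
            "MASTER_URL": "tcp://172.16.1.1:5556",
        },
        volumes={
            "/dev": {"bind": "/dev", "mode": "rw"},
            "/dev/serial": {"bind": "/dev/serial", "mode": "rw"},
        },
        network="lavanet",
        detach=True,
        tty=True,
    )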
(sorry if I’m preaching to the choir here, just trying to give as much info as possible!)
Hope this helps
Dave
----------------
Dave Pigott
LAVA Lab Lead
Linaro Ltd
t: (+44) (0) 1223 400063
Hi folks,
We held our regular weekly design meeting today via Hangout. Brief
summary of discussion:
* [Stevan] Publish the sprint topics to the -devel mailing list?
+ Will do once we have the new date worked out
* [Stevan] Access rights code investigation
+ He'll follow up on -devel with more details
* [Dean B] Post about the docker passthrough stuff on -devel
+ Just happened - see
https://lists.lavasoftware.org/pipermail/lava-devel/2019-January/000010.html
* [Steve] 2019.1 Release planned for Thu 24th
+ Rémi and Steve doing the release, Neil advising/observing
The LAVA design meeting is held weekly, every Wednesday at 13:00 to
14:00 UTC using Google Hangouts Meet: https://meet.google.com/qre-rgen-zwc
Feel free to comment here or join us directly in the meeting.
Cheers,
--
Steve McIntyre steve.mcintyre(a)linaro.org
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs
Dear All,
We've been looking into running some lava dispatchers in containers from
the official lava dispatcher images, and we've been running into issues
when interfacing these with real DUTs.
This is mostly to do with USB passthrough from the physical server to the
dispatcher container. Without any USB device passthrough, lava cannot
interface with the DUT.
We are under the impression there are some scripts or code (which may still
be a work in progress) that may help us with this situation. We're
wondering if we can get our hands on this to test it against our devices
and see if it covers all our use cases.
Our use cases include a mix of static devices (present all the time) and
dynamic devices (only present under some circumstances, e.g. when the DUT
is powered on). We also have cases where the device is present at job
startup but re-enumerates under certain conditions, which means we may
need to combine techniques for static and dynamic devices. We'd also like
to test different usages of these devices within containers after the
passthrough happens, to check that each bit of tooling works correctly
(adb/fastboot/mounting filesystems/etc.).
Thanks,
Dean