Re: [Lava-users] Upgrade from 2016.4 to 2016.6 breaks custom LAVA extension

10 Jun 2016

On 10 June 2016 at 15:57, Neil Williams neil.williams@linaro.org wrote:
...
On 10 June 2016 at 14:19, Konrad Scherer Konrad.Scherer@windriver.com wrote:
...
On 2016-06-10 04:35 AM, Neil Williams wrote:
...
On 10 June 2016 at 00:15, Konrad Scherer Konrad.Scherer@windriver.com
wrote:
...
Hello,
For the 2016.4 release I had created a custom LAVA extension to add 2
commands and 2 xmlrpc methods. The extension mechanism was very
convenient. All I had to do was install an egg on the system and restart
the lava service. When I upgraded to 2016.6 the extension did not work
due to commit b6fd045cc2b320ed34a6fefd713cd0574ed7b376 "Remove the need
for extensions".
I was not able to find a way to add my extension to INSTALLED_APPS and
register the xmlrpc methods without modifying settings/common.py and
urls.py. I looked through common.py and distro.py and could not find
support in settings.conf for extensions. I also looked for a
local_settings import which is referenced on the Internet as a common
way to extend django, but did not find it. If there is a way to extend
LAVA without modifying the LAVA python code, please let me know and I
will be happy to send in a documentation patch.
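For readers unfamiliar with the local_settings convention mentioned above: in generic Django projects it is usually an optional import of site-local overrides at the end of the settings module. The sketch below illustrates that general pattern only. It is not LAVA code, the thread indicates LAVA 2016.6 has no such hook, and the names `apply_local_overrides`, `local_settings` and `EXTRA_INSTALLED_APPS` are hypothetical.

```python
import importlib


def apply_local_overrides(settings, module_name="local_settings"):
    """Merge optional site-local overrides into a copy of a settings dict.

    `module_name` is a hypothetical module an admin would place on the
    Python path. If it is absent, the defaults are returned unchanged.
    """
    merged = dict(settings)
    try:
        local = importlib.import_module(module_name)
    except ImportError:
        return merged
    # Append extra apps rather than replacing the shipped list.
    extra_apps = getattr(local, "EXTRA_INSTALLED_APPS", [])
    merged["INSTALLED_APPS"] = list(settings.get("INSTALLED_APPS", [])) + list(extra_apps)
    return merged


# Illustrative defaults only, not the real LAVA settings.
DEFAULTS = {"INSTALLED_APPS": ["django.contrib.auth", "lava_scheduler_app"]}
SETTINGS = apply_local_overrides(DEFAULTS)
```

The guarded import is what lets a site add apps without editing the packaged settings file, which is the behaviour being asked about here.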
It would have been nice if functionality such as the extension mechanism,
which is part of the external interface of LAVA, had gone through a
deprecation cycle. A reworked demo app showing how the new extension
mechanism works would have also been helpful.
Sorry about that - the extensions support was removed amongst a lot of
other changes to (start to) streamline the codebase and remove
historical or legacy code, particularly at the lower levels where it
affects all operations. It's part of the ongoing work for LAVA V2,
including the upcoming move to require django 1.8. The current
migration is complex and code sometimes needs to be removed or
adapted. Extensions cannot be tested as part of the development of
LAVA, so it is preferable for changes to be made inside the LAVA
codebase and extensions have therefore been removed. There wasn't a
deprecation cycle for this step, so our apologies for that.
The migration to LAVA V2 increases the number and types of methods
available to export data from LAVA - XML-RPC, REST api and ZMQ. The
intention is that customisation will take place in custom frontends
which abstract the generic LAVA data to make it relevant for
particular teams. If there are extra calls which need to be exported,
these need to be contributed to the LAVA codebase so that these can be
tested and maintained.
We've had significant problems when lava has been mixed with pip
installs or third-party eggs - introducing instability, failures and
errors. The use of any non-packaged code with lava, other than through
the existing API, is not supported.
Please talk to us about what the commands and methods were trying to
achieve - we are open to having more calls available as long as the
code being used can be properly tested within the codebase.
Thank you for the detailed response. The extension code was complex and I
understand why it was removed. Ideally the LAVA API would accommodate my
requirements so it may help to understand why I wrote the extension and what
it currently does:

A command to create a pipeline worker and a set of qemu devices.

Extending settings.conf to be able to add an app to INSTALLED_APPS would be
the simplest way to enable this.
Sorry, that doesn't scale. Pipeline workers typically have no
connection to the django settings - the only connection with the
master is over ZMQ. There is support for adding a pipeline worker but
creating devices is an admin task that can be very specific to that
instance. The worker may also need encryption configured which is
per-instance.
A one-off command during initial setup of an instance has been
considered but that would not run once the instance had been
configured.
...
But why did I create the command? I am running a Mesos[1] cluster with 50+
x86 servers and am experimenting with using the cluster to run qemu/lxc
tests.
So let LAVA have a server outside the cluster and a couple of pipeline
workers; the workers can then communicate with the cluster to run the
tests. No need for temporary workers. Note: temporary devices are also
a bad idea for the database - that is why we created primary and
secondary connections in V2.
...
The idea is that the LAVA server will be connected to both hardware
in a lab and to the mesos cluster and users can run tests for qemu devices
on demand using available mesos cluster resources.
I don't see why this wouldn't work in a standard LAVA arrangement of
an external box running lava-server and a few workers running
lava-slave. What's needed is a way for a permanent worker to
communicate with the cluster to run commands. That could be as simple
as using the existing support for primary SSH connections. With the
added benefit that because your cluster resource is temporary, you
don't have the usual concerns about persistence associated with
primary connections.
https://staging.validation.linaro.org/static/docs/v2/dispatcher-design.html#...
...
My proof of concept is
working, i.e. a custom mesos scheduler polls the LAVA server for submitted
qemu jobs and starts lava workers dynamically to run the jobs.
Please do not poll. There will be ways to get information from LAVA
using push notifications but polling adds load for no good reason.
Having two schedulers is adding unnecessary complexity.
This would be much better with static workers, not dynamic. Then
define a device-type which spins up the device - not the worker.
Presumably you have some connection method to the device, that could
be used to drive the test in much the same way as LAVA currently
drives ARM boards. That way, the LAVA scheduler can manage jobs itself
(which is what it is designed to do) and the workers can run the jobs
on the devices. Drop the other scheduler and just have a process that
submits to LAVA.
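The "process that submits to LAVA" suggested above could be as small as the following sketch. It assumes the `scheduler.submit_job` XML-RPC call on the server's `/RPC2` endpoint. The host name, the credentials in the URL and the `submit_job` wrapper function are placeholders, not values from this thread.

```python
import xmlrpc.client


def submit_job(job_definition, server=None,
               url="http://user:token@lava.example.com/RPC2"):
    """Submit one job definition and return whatever job id LAVA reports.

    `server` can be any object exposing `scheduler.submit_job`, which
    keeps the function easy to exercise without a live instance.
    """
    if server is None:
        # Creating the proxy does not open a connection; the request is
        # only sent when the remote method is called.
        server = xmlrpc.client.ServerProxy(url)
    return server.scheduler.submit_job(job_definition)
```

With this arrangement the external process only hands jobs over, and the LAVA scheduler decides which permanent worker and device runs them.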
Dynamic workers are a very bad idea and are likely to lead to problems
as the LAVA code develops, tying the worker and the master still
closer together. We already have enough problems with the dynamic
devices used in V1.
...

A command that integrates LAVA with our internal lab management software.

Unfortunately LAVA does not own the lab devices and when a user reserves a
lab target using this management service, this command is used to mark the
device offline in LAVA and back online when the lab target is unreserved.
That is just too specific to that lab. The usual approach there
involves tools like puppet, ansible, salt etc.
...

XMLRPC methods to mark devices online and offline.

I noticed that when a pipeline worker disconnects from the server, its
attached devices are not transitioned offline and the scheduler will
attempt to reserve and run jobs on these devices. With the ephemeral
workers this does not work, so I added the xmlrpc methods to enable and
disable only one qemu device per worker when the worker starts and stops.
Included in 2016.6 - see
https://lists.linaro.org/pipermail/lava-users/2016-April/000068.html
which resulted in
https://git.linaro.org/lava/lava-server.git/commit/2091ac9c3f9305f5a4e083d15...
and a few related commits.
However, that support is NOT intended for support of dynamic workers,
it is intended to support temporary access to the devices outside of
LAVA.
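For the supported use case, temporary access to a device outside of LAVA, a wrapper along these lines may be enough. The method names `scheduler.put_into_maintenance_mode` and `scheduler.put_into_online_mode` and their argument lists are assumptions about what the commit above added, so check the API help page of your own instance before relying on them. `device_outside_lava` is a hypothetical helper name.

```python
import contextlib


@contextlib.contextmanager
def device_outside_lava(server, hostname, reason):
    """Take a device out of scheduling for the duration of a `with` block.

    `server` is an XML-RPC proxy (or any object with a `scheduler`
    attribute). The two remote method names are assumed, see above.
    """
    server.scheduler.put_into_maintenance_mode(hostname, reason)
    try:
        yield
    finally:
        # Always hand the device back, even if the body raised.
        server.scheduler.put_into_online_mode(hostname, "released: " + reason)
```

A lab reservation tool could then wrap its own work in `with device_outside_lava(server, "qemu01", "reserved by lab tool"):`, where `qemu01` is an example device name.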
...
Fundamentally LAVA assumes it owns the devices and that makes it hard to
integrate into existing infrastructure. I would be interested in any ideas
you have that could make this easier.
LAVA does indeed own the devices and that needs to stay. Temporary
reassignment of devices can be done but workers need to be permanent
objects in the database. We do expect to add code to handle workers
which go offline, it's not included yet.
Sorry, but this design makes no sense in terms of how LAVA can be
supported. It would seem to waste the majority of the support
available in LAVA, and I fail to see how LAVA can provide benefit to a
CI loop based on this situation. VMs for demo usage have been
considered but only as demos.
...
It makes it very easy to set up
development instances of lava-server and test upgrades, etc. If you are
interested, I can put them on Github. Let me know what would be the best way
to share them.
IMHO docker is not a suitable way to test LAVA - full virtualisation
is the way that LAVA is tested upstream, along with dedicated hardware
for the long term testing.
...
Thank you for your time. I look forward to working with you to make LAVA
better.
I'm not sure if there is a misconception about what LAVA is able to
achieve or whether you are just trying to force LAVA into a niche you
have already defined. Continuing with the path you've outlined is
likely to cause breakage. I suggest that you reconsider your design -
especially any idea about temporary workers. The churn resulting from
that is simply not worth considering.
Thinking more about this - is this actually a case where Jenkins or
Travis would be better suited? (We use Jenkins for some operations in
LAVA development, precisely because it can spin up the necessary - x86
- resources on demand.)
LAVA is better with tests that are dependent on particular kinds of
permanent hardware and although there is QEMU support, LAVA is
primarily about providing access to unusual hardware to test whether
systems boot and run cleanly. i.e. where the architecture and physical
aspects of the hardware make a significant difference to the testing.
A mismatch at that sort of level is only going to make things harder
later. The driving force for LAVA development could be at odds with
your needs.
For LAVA, with primary and secondary connections, the "device" is just
an endpoint that can run tests - it doesn't matter if the actual
resource has only just been created, as long as it is ready when LAVA
needs to make a connection. The thing itself is just whatever box or
resource is in charge of running the tests - and one of those test
actions is simply to run QEMU.
Some of our virtualisation testing in LAVA uses an arm64 board as the
device and then uses secondary connections to run tests in VMs started
by that device. No need for temporary workers or temporary devices.
Make the server, workers and devices permanent and then let the
connections be to whatever has been made available. However, do please
take time first to consider if your needs actually align with what
LAVA is trying to achieve.
-- 

Neil Williams
=============
neil.williams@linaro.org
http://www.linux.codehelp.co.uk/
