On Fri, Jun 10, 2016 at 8:20 AM, Neil Williams <neil.williams@linaro.org> wrote:

On 10 June 2016 at 15:57, Neil Williams <neil.williams@linaro.org> wrote:
> On 10 June 2016 at 14:19, Konrad Scherer <Konrad.Scherer@windriver.com> wrote:
>> On 2016-06-10 04:35 AM, Neil Williams wrote:
>>>
>>> On 10 June 2016 at 00:15, Konrad Scherer <Konrad.Scherer@windriver.com>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> For the 2016.4 release I had created a custom LAVA extension to add 2
>>>> commands and 2 xmlrpc methods. The extension mechanism was very
>>>> convenient.
>>>> All I had to do was install an egg on the system and restart the lava
>>>> service. When I upgraded to 2016.6 the extension did not work due to
>>>> commit
>>>> b6fd045cc2b320ed34a6fefd713cd0574ed7b376 "Remove the need for
>>>> extensions".
>>>>
>>>> I was not able to find a way to add my extension to INSTALLED_APPS and
>>>> register the xmlrpc methods without modifying settings/common.py and
>>>> urls.py. I looked through common.py and distro.py and could not find
>>>> support
>>>> in settings.conf for extensions. I also looked for a local_settings
>>>> import
>>>> which is referenced on the Internet as a common way to extend django, but
>>>> did not find it. If there is a way to extend LAVA without modifying the
>>>> LAVA
>>>> python code, please let me know and I will be happy to send in a
>>>> documentation patch.
>>>>
>>>> It would have been nice if functionality such as the extension mechanism,
>>>> which is part of the external interface of LAVA, had gone through a
>>>> deprecation cycle. A reworked demo app showing how the new extension
>>>> mechanism works would have also been helpful.
>>>
>>>
>>> Sorry about that - the extensions support was removed amongst a lot of
>>> other changes to (start to) streamline the codebase and remove
>>> historical or legacy code, particularly at the lower levels where it
>>> affects all operations. It's part of the ongoing work for LAVA V2,
>>> including the upcoming move to require django 1.8. The current
>>> migration is complex and code sometimes needs to be removed or
>>> adapted. Extensions cannot be tested as part of the development of
>>> LAVA, so it is preferable for changes to be made inside the LAVA
>>> codebase and extensions have therefore been removed. There wasn't a
>>> deprecation cycle for this step, so our apologies for that.
>>>
>>> The migration to LAVA V2 increases the number and types of methods
>>> available to export data from LAVA - XML-RPC, REST api and ZMQ. The
>>> intention is that customisation will take place in custom frontends
>>> which abstract the generic LAVA data to make it relevant for
>>> particular teams. If there are extra calls which need to be exported,
>>> these need to be contributed to the LAVA codebase so that these can be
>>> tested and maintained.
>>>
>>> We've had significant problems when lava has been mixed with pip
>>> installs or third-party eggs - introducing instability, failures and
>>> errors. The use of any non-packaged code with lava, other than through
>>> the existing API, is not supported.
>>>
>>> Please talk to us about what the commands and methods were trying to
>>> achieve - we are open to having more calls available as long as the
>>> code being used can be properly tested within the codebase.
>>
>>
>> Thank you for the detailed response. The extension code was complex and I
>> understand why it was removed. Ideally the LAVA API would accommodate my
>> requirements so it may help to understand why I wrote the extension and what
>> it currently does:
>>
>> 1) A command to create a pipeline worker and a set of qemu devices.
>> Extending settings.conf to be able to add an app to INSTALLED_APPS would be
>> the simplest way to enable this.
>
> Sorry, that doesn't scale. Pipeline workers typically have no
> connection to the django settings - the only connection with the
> master is over ZMQ. There is support for adding a pipeline worker but
> creating devices is an admin task that can be very specific to that
> instance. The worker may also need encryption configured which is
> per-instance.
>
> A one-off command during initial setup of an instance has been
> considered but that would not run once the instance had been
> configured.
>
>> But why did I create the command? I am running a Mesos[1] cluster with 50+
>> x86 servers and am experimenting with using the cluster to run qemu/lxc
>> tests.
>
> So let LAVA have a server outside the cluster and a couple of pipeline
> workers; the workers can then communicate with the cluster to run the
> tests. No need for temporary workers. Note: temporary devices are also
> a bad idea for the database - that is why we created primary and
> secondary connections in V2.
>
>> The idea is that the LAVA server will be connected to both hardware
>> in a lab and to the mesos cluster and users can run tests for qemu devices
>> on demand using available mesos cluster resources.
>
> I don't see why this wouldn't work in a standard LAVA arrangement of
> an external box running lava-server and a few workers running
> lava-slave. What's needed is a way for a permanent worker to
> communicate with the cluster to run commands. That could be as simple
> as using the existing support for primary SSH connections. With the
> added benefit that because your cluster resource is temporary, you
> don't have the usual concerns about persistence associated with
> primary connections.
>
> https://staging.validation.linaro.org/static/docs/v2/dispatcher-design.html#persistence
>
>> My proof of concept is
>> working, i.e. a custom mesos scheduler polls the LAVA server for submitted
>> qemu jobs and starts lava workers dynamically to run the jobs.
>
> Please do not poll. There will be ways to get information from LAVA
> using push notifications but polling adds load for no good reason.
> Having two schedulers is adding unnecessary complexity.
>
> This would be much better with static workers, not dynamic. Then
> define a device-type which spins up the device - not the worker.
> Presumably you have some connection method to the device, that could
> be used to drive the test in much the same way as LAVA currently
> drives ARM boards. That way, the LAVA scheduler can manage jobs itself
> (which is what it is designed to do) and the workers can run the jobs
> on the devices. Drop the other scheduler and just have a process that
> submits to LAVA.
>
> Dynamic workers are a very bad idea and are likely to lead to problems
> as the LAVA code develops, tying the worker and the master still
> closer together. We already have enough problems with the dynamic
> devices used in V1.
>
>> 2) A command that integrates LAVA with our internal lab management software.
>> Unfortunately LAVA does not own the lab devices and when a user reserves a
>> lab target using this management service, this command is used to mark the
>> device offline in LAVA and back online when the lab target is unreserved.
>
> That is just too specific to that lab. The usual approach there
> involves tools like puppet, ansible, salt etc.
>
>> 3) XMLRPC methods to mark devices online and offline. I noticed that when a
>> pipeline worker disconnects from the server, its attached devices are not
>> transitioned offline and the scheduler will attempt to reserve and run jobs
>> on these devices. With the ephemeral workers this does not work, so I added
>> the xmlrpc methods to enable and disable only one qemu device per worker
>> when the worker starts and stops.
>
> Included in 2016.6 - see
> https://lists.linaro.org/pipermail/lava-users/2016-April/000068.html
> which resulted in
> https://git.linaro.org/lava/lava-server.git/commit/2091ac9c3f9305f5a4e083d156c04d1f098ac2fa
> and a few related commits.
>
> However, that support is NOT intended for support of dynamic workers,
> it is intended to support temporary access to the devices outside of
> LAVA.
>
>> Fundamentally LAVA assumes it owns the devices and that makes it hard to
>> integrate into existing infrastructure. I would be interested in any ideas
>> you have that could make this easier.
>
> LAVA does indeed own the devices and that needs to stay. Temporary
> reassignment of devices can be done but workers need to be permanent
> objects in the database. We do expect to add code to handle workers
> which go offline, but it's not included yet.
>
> Sorry, but this design makes no sense in terms of how LAVA can be
> supported. It would seem to waste the majority of the support
> available in LAVA and I fail to see how LAVA can provide benefit to a
> CI loop based on this situation. VMs for demo usage have been
> considered but only as demos.
>
>> It makes it very easy to set up
>> development instances of lava-server and test upgrades, etc. If you are
>> interested, I can put them on Github. Let me know what would be the best way
>> to share them.
>
> IMHO docker is not a suitable way to test LAVA - full virtualisation
> is the way that LAVA is tested upstream, along with dedicated hardware
> for the long term testing.
>
>> Thank you for your time. I look forward to working with you to make LAVA
>> better.
>
> I'm not sure if there is a misconception about what LAVA is able to
> achieve or whether you are just trying to force LAVA into a niche you
> have already defined. Continuing with the path you've outlined is
> likely to cause breakage. I suggest that you reconsider your design -
> especially any idea about temporary workers. The churn resulting from
> that is simply not worth considering.

Thinking more about this - is this actually a case where Jenkins or
Travis would be better suited? (We use Jenkins for some operations in
LAVA development, precisely because it can spin up the necessary - x86
- resources on demand.)

LAVA is better with tests that are dependent on particular kinds of
permanent hardware and although there is QEMU support, LAVA is
primarily about providing access to unusual hardware to test whether
systems boot and run cleanly, i.e. where the architecture and physical
aspects of the hardware make a significant difference to the testing.
A mismatch at that sort of level is only going to make things harder
later. The driving force for LAVA development could be at odds with
your needs.

For LAVA, with primary and secondary connections, the "device" is just
an endpoint that can run tests - it doesn't matter if the actual
resource has only just been created, as long as it is ready when LAVA
needs to make a connection. The thing itself is just whatever box or
resource is in charge of running the tests - and one of those test
actions is simply to run QEMU.

Some of our virtualisation testing in LAVA uses an arm64 board as the
device and then uses secondary connections to run tests in VMs started
by that device. No need for temporary workers or temporary devices.

Make the server, workers and devices permanent and then let the
connections be to whatever has been made available. However, do please
take time first to consider if your needs actually align with what
LAVA is trying to achieve.
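
As a rough illustration of "a process that submits to LAVA" instead of
a second scheduler, something along these lines could work. This is
only a sketch: it assumes the XML-RPC endpoint is at /RPC2 with
user:token credentials in the URL, and the helper names, hostname and
job file are made up for the example.

```python
import xmlrpc.client


def rpc_url(host, user, token, scheme="https"):
    """Build the XML-RPC URL (the /RPC2 path and user:token form are assumptions)."""
    return "%s://%s:%s@%s/RPC2" % (scheme, user, token, host)


def submit(proxy, job_definition):
    """Hand one job definition to the LAVA scheduler and return its reply.

    `proxy` is anything exposing scheduler.submit_job(), normally an
    xmlrpc.client.ServerProxy pointed at the instance.
    """
    return proxy.scheduler.submit_job(job_definition)


# Hypothetical usage; host, user, token and file name are placeholders:
#   server = xmlrpc.client.ServerProxy(rpc_url("lava.example.com", "user", "token"))
#   with open("qemu-job.yaml") as handle:
#       print(submit(server, handle.read()))
```

The LAVA scheduler then decides which permanent worker and device run
the job, so nothing on the submitting side needs to poll or create
workers.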

--

Neil Williams
=============
neil.williams@linaro.org
http://www.linux.codehelp.co.uk/
_______________________________________________
Lava-users mailing list
Lava-users@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lava-users