On 06/10/2016 11:20 AM, Neil Williams wrote:
On 10 June 2016 at 15:57, Neil Williams neil.williams@linaro.org wrote:
On 10 June 2016 at 14:19, Konrad Scherer Konrad.Scherer@windriver.com wrote:
On 2016-06-10 04:35 AM, Neil Williams wrote:
On 10 June 2016 at 00:15, Konrad Scherer Konrad.Scherer@windriver.com wrote:
Hello,
For the 2016.4 release I had created a custom LAVA extension to add 2 commands and 2 xmlrpc methods. The extension mechanism was very convenient. All I had to do was install an egg on the system and restart the lava service. When I upgraded to 2016.6 the extension did not work due to commit b6fd045cc2b320ed34a6fefd713cd0574ed7b376 "Remove the need for extensions".
I was not able to find a way to add my extension to INSTALLED_APPS and register the xmlrpc methods without modifying settings/common.py and urls.py. I looked through common.py and distro.py and could not find support in settings.conf for extensions. I also looked for a local_settings import which is referenced on the Internet as a common way to extend django, but did not find it. If there is a way to extend LAVA without modifying the LAVA python code, please let me know and I will be happy to send in a documentation patch.
It would have been nice if functionality such as the extension mechanism, which is part of the external interface of LAVA, had gone through a deprecation cycle. A reworked demo app showing how the new extension mechanism works would have also been helpful.
Sorry about that - the extensions support was removed amongst a lot of other changes to (start to) streamline the codebase and remove historical or legacy code, particularly at the lower levels where it affects all operations. It's part of the ongoing work for LAVA V2, including the upcoming move to require django 1.8. The current migration is complex and code sometimes needs to be removed or adapted. Extensions cannot be tested as part of the development of LAVA, so it is preferable for changes to be made inside the LAVA codebase and extensions have therefore been removed. There wasn't a deprecation cycle for this step, so our apologies for that.
The migration to LAVA V2 increases the number and types of methods available to export data from LAVA - XML-RPC, REST api and ZMQ. The intention is that customisation will take place in custom frontends which abstract the generic LAVA data to make it relevant for particular teams. If there are extra calls which need to be exported, these need to be contributed to the LAVA codebase so that these can be tested and maintained.
We've had significant problems when lava has been mixed with pip installs or third-party eggs - introducing instability, failures and errors. The use of any non-packaged code with lava, other than through the existing API, is not supported.
Please talk to us about what the commands and methods were trying to achieve - we are open to having more calls available as long as the code being used can be properly tested within the codebase.
Thank you for the detailed response. The extension code was complex and I understand why it was removed. Ideally the LAVA API would accommodate my requirements so it may help to understand why I wrote the extension and what it currently does:
- A command to create a pipeline worker and a set of qemu devices.
Extending settings.conf to be able to add an app to INSTALLED_APPS would be the simplest way to enable this.
Sorry, that doesn't scale. Pipeline workers typically have no connection to the django settings - the only connection with the master is over ZMQ. There is support for adding a pipeline worker but creating devices is an admin task that can be very specific to that instance. The worker may also need encryption configured which is per-instance.
A one off command during initial setup of an instance has been considered but that would not run once the instance had been configured.
But why did I create the command? I am running a Mesos[1] cluster with 50+ x86 servers and am experimenting with using the cluster to run qemu/lxc tests.
So let LAVA have a server outside the cluster and a couple of pipeline workers; the workers can then communicate with the cluster to run the tests. No need for temporary workers. Note: temporary devices are also a bad idea for the database - that is why we created primary and secondary connections in V2.
The idea is that the LAVA server will be connected to both hardware in a lab and to the mesos cluster and users can run tests for qemu devices on demand using available mesos cluster resources.
I don't see why this wouldn't work in a standard LAVA arrangement of an external box running lava-server and a few workers running lava-slave. What's needed is a way for a permanent worker to communicate with the cluster to run commands. That could be as simple as using the existing support for primary SSH connections. With the added benefit that because your cluster resource is temporary, you don't have the usual concerns about persistence associated with primary connections.
https://staging.validation.linaro.org/static/docs/v2/dispatcher-design.html#...
My proof of concept is working, i.e. a custom mesos scheduler polls the LAVA server for submitted qemu jobs and starts lava workers dynamically to run the jobs.
Please do not poll. There will be ways to get information from LAVA using push notifications but polling adds load for no good reason. Having two schedulers is adding unnecessary complexity.
This would be much better with static workers, not dynamic. Then define a device-type which spins up the device - not the worker. Presumably you have some connection method to the device; that could be used to drive the test in much the same way as LAVA currently drives ARM boards. That way, the LAVA scheduler can manage jobs itself (which is what it is designed to do) and the workers can run the jobs on the devices. Drop the other scheduler and just have a process that submits to LAVA.
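A process that only submits to LAVA can be very small. The following is a minimal sketch, assuming the standard XML-RPC endpoint at /RPC2 and the scheduler.submit_job call; it is written against Python 3's xmlrpc.client, and the host, user and token values are placeholders rather than anything from this thread.

```python
import xmlrpc.client


def rpc_url(host, user, token, scheme="https"):
    """Build the XML-RPC endpoint URL with the user and token embedded."""
    return "%s://%s:%s@%s/RPC2" % (scheme, user, token, host)


def connect(host, user, token):
    """Create a proxy object; no network traffic happens until a call is made."""
    return xmlrpc.client.ServerProxy(rpc_url(host, user, token))


def submit(server, job_definition):
    """Hand a job definition (as a string) to the LAVA scheduler.

    Returns whatever the server reports for the new job.
    """
    return server.scheduler.submit_job(job_definition)
```

Usage would be along the lines of submit(connect("lava.example.com", "submitter", "TOKEN"), job_text), where job_text is the job definition read from a file. The LAVA scheduler then decides where and when the job runs, so no second scheduler is involved.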
Dynamic workers are a very bad idea and are likely to lead to problems as the LAVA code develops, tying the worker and the master still closer together. We already have enough problems with the dynamic devices used in V1.
- A command that integrates LAVA with our internal lab management software.
Unfortunately LAVA does not own the lab devices and when a user reserves a lab target using this management service, this command is used to mark the device offline in LAVA and back online when the lab target is unreserved.
That is just too specific to that lab. The usual approach there involves tools like puppet, ansible, salt etc.
- XMLRPC methods to mark devices online and offline. I noticed that when a pipeline worker disconnects from the server, its attached devices are not transitioned offline and the scheduler will attempt to reserve and run jobs on these devices. With the ephemeral workers this does not work, so I added the xmlrpc methods to enable and disable only one qemu device per worker when the worker starts and stops.
Included in 2016.6 - see https://lists.linaro.org/pipermail/lava-users/2016-April/000068.html which resulted in https://git.linaro.org/lava/lava-server.git/commit/2091ac9c3f9305f5a4e083d15... and a few related commits.
However, that support is NOT intended for support of dynamic workers, it is intended to support temporary access to the devices outside of LAVA.
Fundamentally LAVA assumes it owns the devices and that makes it hard to integrate into existing infrastructure. I would be interested in any ideas you have that could make this easier.
LAVA does indeed own the devices and that needs to stay. Temporary reassignment of devices can be done but workers need to be permanent objects in the database. We do expect to add code to handle workers which go offline, but it's not included yet.
Sorry, but this design makes no sense in terms of how LAVA can be supported. It would seem to waste the majority of the support available in LAVA, and I fail to see how LAVA can provide benefit to a CI loop based on this situation. VMs for demo usage have been considered but only as demos.
It makes it very easy to set up development instances of lava-server and test upgrades, etc. If you are interested, I can put them on Github. Let me know what would be the best way to share them.
IMHO docker is not a suitable way to test LAVA - full virtualisation is the way that LAVA is tested upstream, along with dedicated hardware for the long term testing.
Thank you for your time. I look forward to working with you to make LAVA better.
I'm not sure if there is a misconception about what LAVA is able to achieve or whether you are just trying to force LAVA into a niche you have already defined. Continuing with the path you've outlined is likely to cause breakage. I suggest that you reconsider your design - especially any idea about temporary workers. The churn resulting from that is simply not worth considering.
Thinking more about this - is this actually a case where Jenkins or Travis would be better suited? (We use Jenkins for some operations in LAVA development, precisely because it can spin up the necessary - x86 - resources on demand.)
LAVA is better with tests that are dependent on particular kinds of permanent hardware and although there is QEMU support, LAVA is primarily about providing access to unusual hardware to test whether systems boot and run cleanly. i.e. where the architecture and physical aspects of the hardware make a significant difference to the testing. A mismatch at that sort of level is only going to make things harder later. The driving force for LAVA development could be at odds with your needs.
For LAVA, with primary and secondary connections, the "device" is just an endpoint that can run tests - it doesn't matter if the actual resource has only just been created, as long as it is ready when LAVA needs to make a connection. The thing itself is just whatever box or resource is in charge of running the tests - and one of those test actions is simply to run QEMU.
Some of our virtualisation testing in LAVA uses an arm64 board as the device and then uses secondary connections to run tests in VMs started by that device. No need for temporary workers or temporary devices.
Make the server, workers and devices permanent and then let the connections be to whatever has been made available. However, do please take time first to consider if your needs actually align with what LAVA is trying to achieve.
I have been researching how a solution with this setup would look. I have reread the docs and looked through the code and I am confused. We have a working V1 server with a V1 device with the following custom commands:
pre_connect_command = /pre_connect_script.py %(hostname)s
connection_command = /connection_script.py %(hostname)s
power_off_cmd = /power_off_script.py %(hostname)s
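To make the placeholder explicit: the %(hostname)s syntax is Python's mapping-style string formatting, so a sketch of the expansion, assuming V1 substitutes the device hostname this way, looks like the following. The hostname qemu01 and the helper name expand_command are only for illustration.

```python
def expand_command(template, **values):
    """Fill %(name)s placeholders in a command template from keyword values."""
    return template % values


# Hypothetical device hostname, used only to show the substitution.
command = expand_command("/connection_script.py %(hostname)s", hostname="qemu01")
print(command)
```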
The connection script initiates a telnet connection to the console of the device. Is it possible to do something similar with V2?
There are a few V2 devices in lava-dispatcher with custom pdu commands and a comment in lava_dispatcher/pipeline/test/sample_jobs/basics.yaml:
- commands:
    # list of pre-defined scripts installed by the server administrator at
    # /etc/lava-server/commands.d/* to be invoked
    #
    # Each called script will have the data in the job definition (+ plus
    # the actual device where the test is running) passed in by environment
    # variables
    - power-off
I can't find any code that implements running custom commands or injects values into a script environment. I created a device with some custom commands, but I couldn't figure out the correct deploy and boot methods to invoke those scripts.
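To be clear about what I was looking for, the behaviour that comment describes would look roughly like the sketch below. This is not LAVA code: the function names, the LAVA_ variable prefix and the way the data is flattened are all assumptions made for illustration; only the /etc/lava-server/commands.d directory comes from the comment itself.

```python
import os
import subprocess


def build_env(job_data, device, prefix="LAVA_"):
    """Copy the current environment and add job data plus the device name.

    The LAVA_ prefix and upper-casing of keys are hypothetical conventions.
    """
    env = dict(os.environ)
    for key, value in job_data.items():
        env[prefix + key.upper()] = str(value)
    env[prefix + "DEVICE"] = device
    return env


def run_named_command(name, job_data, device,
                      commands_dir="/etc/lava-server/commands.d"):
    """Run the administrator-installed script called `name`.

    Returns the exit code of the script.
    """
    script = os.path.join(commands_dir, name)
    return subprocess.call([script], env=build_env(job_data, device))
```

With something like this, a job listing "power-off" under commands would end up as run_named_command("power-off", job_data, device), and the script would read the values it needs from its environment.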
Did I miss something or should I stick with V1 for now?
Thank you for your time.