Hello,
For the 2016.4 release I had created a custom LAVA extension to add two commands and two XML-RPC methods. The extension mechanism was very convenient: all I had to do was install an egg on the system and restart the lava service. When I upgraded to 2016.6 the extension stopped working because of commit b6fd045cc2b320ed34a6fefd713cd0574ed7b376 "Remove the need for extensions".
I was not able to find a way to add my extension to INSTALLED_APPS and register the XML-RPC methods without modifying settings/common.py and urls.py. I looked through common.py and distro.py and could not find any support for extensions in settings.conf. I also looked for a local_settings import, which is commonly referenced on the Internet as a way to extend Django, but did not find one. If there is a way to extend LAVA without modifying the LAVA Python code, please let me know and I will be happy to send in a documentation patch.
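For reference, the local_settings pattern I was looking for is just the common Django idiom of a guarded import at the end of the settings module. A rough sketch of what I expected to find (the module and app names here are hypothetical, not anything LAVA actually ships):

    # Common Django idiom (not present in LAVA): let a site-local module
    # extend settings such as INSTALLED_APPS without patching the package.
    INSTALLED_APPS = [
        "django.contrib.admin",
        "django.contrib.auth",
        # ... the project's own apps ...
    ]

    try:
        # local_settings.py would live outside the packaged code and could
        # contain, for example:  EXTRA_APPS = ["my_lava_extension"]
        from local_settings import EXTRA_APPS
        INSTALLED_APPS += EXTRA_APPS
    except ImportError:
        pass  # no site-local overrides installed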
It would have been nice if functionality such as the extension mechanism, which is part of the external interface of LAVA, had gone through a deprecation cycle. A reworked demo app showing how the new extension mechanism works would have also been helpful.
Thank you for your time.
On 10 June 2016 at 00:15, Konrad Scherer Konrad.Scherer@windriver.com wrote:
Sorry about that - the extensions support was removed amongst a lot of other changes to (start to) streamline the codebase and remove historical or legacy code, particularly at the lower levels where it affects all operations. It's part of the ongoing work for LAVA V2, including the upcoming move to require Django 1.8. The current migration is complex and code sometimes needs to be removed or adapted. Extensions cannot be tested as part of the development of LAVA, so it is preferable for changes to be made inside the LAVA codebase, and extensions have therefore been removed. There wasn't a deprecation cycle for this step, so our apologies for that.
The migration to LAVA V2 increases the number and types of methods available to export data from LAVA - XML-RPC, REST API and ZMQ. The intention is that customisation will take place in custom frontends which abstract the generic LAVA data to make it relevant for particular teams. If there are extra calls which need to be exported, these need to be contributed to the LAVA codebase so that they can be tested and maintained.
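To illustrate the XML-RPC side, the standard Python client is enough to talk to an instance. A minimal sketch (the hostname is a placeholder, and system.listMethods will show exactly which calls your instance exports):

    # Minimal sketch of querying a LAVA instance over XML-RPC using only
    # the standard library. lava.example.com is a placeholder hostname.
    import xmlrpc.client

    server = xmlrpc.client.ServerProxy("https://lava.example.com/RPC2")

    # Standard XML-RPC introspection - prints every exported method name.
    for name in server.system.listMethods():
        print(name)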
We've had significant problems when LAVA has been mixed with pip installs or third-party eggs - introducing instability, failures and errors. The use of any non-packaged code with LAVA, other than through the existing API, is not supported.
Please talk to us about what the commands and methods were trying to achieve - we are open to having more calls available as long as the code being used can be properly tested within the codebase.
On 2016-06-10 04:35 AM, Neil Williams wrote:
Thank you for the detailed response. The extension code was complex and I understand why it was removed. Ideally the LAVA API would accommodate my requirements so it may help to understand why I wrote the extension and what it currently does:
1) A command to create a pipeline worker and a set of qemu devices. Extending settings.conf to be able to add an app to INSTALLED_APPS would be the simplest way to enable this.
But why did I create the command? I am running a Mesos[1] cluster with 50+ x86 servers and am experimenting with using the cluster to run qemu/lxc tests. The idea is that the LAVA server will be connected to both hardware in a lab and to the mesos cluster and users can run tests for qemu devices on demand using available mesos cluster resources. My proof of concept is working, i.e. a custom mesos scheduler polls the LAVA server for submitted qemu jobs and starts lava workers dynamically to run the jobs. The trend is to use cluster managers like Mesos[1], Nomad[2] or Kubernetes[3] to utilize hardware resources more efficiently. My long term plan is to share the mesos cluster resources between services like LAVA, Jenkins and Marathon[4]. We can discuss the details of this if you are interested.
2) A command that integrates LAVA with our internal lab management software. Unfortunately LAVA does not own the lab devices and when a user reserves a lab target using this management service, this command is used to mark the device offline in LAVA and back online when the lab target is unreserved.
3) XML-RPC methods to mark devices online and offline. I noticed that when a pipeline worker disconnects from the server, its attached devices are not transitioned offline and the scheduler will attempt to reserve and run jobs on them. With ephemeral workers this does not work, so I added XML-RPC methods to enable and disable only one QEMU device per worker when the worker starts and stops.
My custom XML-RPC methods are currently unauthenticated, which is unacceptable for a production deployment. My only option at the moment is to manually create and share tokens across all the workers, which is not ideal. Ideally there would be a way for a "manager" process to create and revoke tokens that let an ephemeral worker access a specific device for a limited time. This would require some remote methods to manage tokens. I was not aware of the REST or ZMQ API and will have to investigate that further.
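For completeness, the authentication I would want these methods to use is, as far as I understand it, the usual LAVA pattern of embedding a user token in the XML-RPC endpoint URL; my problem is only with creating and revoking such tokens automatically for short-lived workers. A sketch with placeholder values:

    # Token-authenticated XML-RPC access, with the token embedded in the URL.
    # username, token and hostname are placeholders; the token would normally
    # be created by hand in the web UI, which is exactly the manual step I
    # would like to automate for ephemeral workers.
    import xmlrpc.client

    username = "worker-bot"
    token = "EXAMPLE_TOKEN"
    host = "lava.example.com"

    server = xmlrpc.client.ServerProxy(
        "https://%s:%s@%s/RPC2" % (username, token, host)
    )
    print(server.system.listMethods())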
Fundamentally LAVA assumes it owns the devices and that makes it hard to integrate into existing infrastructure. I would be interested in any ideas you have that could make this easier.
I also have Dockerfiles to create lava and lava-worker Docker images that I have been using for my experiments. They make it very easy to set up development instances of lava-server, test upgrades, and so on. If you are interested, I can put them on GitHub. Let me know what would be the best way to share them.
Thank you for your time. I look forward to working with you to make LAVA better.
[1]: http://mesos.apache.org/ [2]: https://www.nomadproject.io/ [3]: http://kubernetes.io/ [4]: https://mesosphere.github.io/marathon/
On 10 June 2016 at 14:19, Konrad Scherer Konrad.Scherer@windriver.com wrote:
Thank you for the detailed response. The extension code was complex and I understand why it was removed. Ideally the LAVA API would accommodate my requirements so it may help to understand why I wrote the extension and what it currently does:
- A command to create a pipeline worker and a set of qemu devices.
Extending settings.conf to be able to add an app to INSTALLED_APPS would be the simplest way to enable this.
Sorry, that doesn't scale. Pipeline workers typically have no connection to the django settings - the only connection with the master is over ZMQ. There is support for adding a pipeline worker but creating devices is an admin task that can be very specific to that instance. The worker may also need encryption configured which is per-instance.
A one-off command during initial setup of an instance has been considered, but that would not run once the instance had been configured.
But why did I create the command? I am running a Mesos[1] cluster with 50+ x86 servers and am experimenting with using the cluster to run qemu/lxc tests.
So let LAVA have a server outside the cluster and a couple of pipeline workers; the workers can then communicate with the cluster to run the tests. No need for temporary workers. Note: temporary devices are also a bad idea for the database - that is why we created primary and secondary connections in V2.
The idea is that the LAVA server will be connected to both hardware in a lab and to the mesos cluster and users can run tests for qemu devices on demand using available mesos cluster resources.
I don't see why this wouldn't work in a standard LAVA arrangement of an external box running lava-server and a few workers running lava-slave. What's needed is a way for a permanent worker to communicate with the cluster to run commands. That could be as simple as using the existing support for primary SSH connections. With the added benefit that because your cluster resource is temporary, you don't have the usual concerns about persistence associated with primary connections.
https://staging.validation.linaro.org/static/docs/v2/dispatcher-design.html#...
My proof of concept is working, i.e. a custom mesos scheduler polls the LAVA server for submitted qemu jobs and starts lava workers dynamically to run the jobs.
Please do not poll. There will be ways to get information from LAVA using push notifications but polling adds load for no good reason. Having two schedulers is adding unnecessary complexity.
This would be much better with static workers, not dynamic. Then define a device-type which spins up the device - not the worker. Presumably you have some connection method to the device; that could be used to drive the test in much the same way as LAVA currently drives ARM boards. That way, the LAVA scheduler can manage jobs itself (which is what it is designed to do) and the workers can run the jobs on the devices. Drop the other scheduler and just have a process that submits to LAVA.
Dynamic workers are a very bad idea and are likely to lead to problems as the LAVA code develops, tying the worker and the master still closer together. We already have enough problems with the dynamic devices used in V1.
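To give a rough idea of the push notifications mentioned above, the intention is that events are published on a ZMQ socket that any process can subscribe to. A sketch with pyzmq (the endpoint and the exact message layout are placeholders and depend on how lava-publisher is configured on the instance):

    # Rough sketch of consuming push notifications instead of polling.
    # Requires pyzmq. The endpoint is a placeholder and must match the event
    # socket configured for the instance; the multipart layout may vary.
    import zmq

    context = zmq.Context()
    sub = context.socket(zmq.SUB)
    sub.connect("tcp://lava.example.com:5500")
    sub.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to all topics

    while True:
        parts = sub.recv_multipart()
        # Typically: topic, uuid, timestamp, username, JSON payload.
        print(parts)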
- A command that integrates LAVA with our internal lab management software.
Unfortunately LAVA does not own the lab devices and when a user reserves a lab target using this management service, this command is used to mark the device offline in LAVA and back online when the lab target is unreserved.
That is just too specific to that lab. The usual approach there involves tools like puppet, ansible, salt etc.
- XMLRPC methods to mark devices online and offline. I noticed that when a pipeline worker disconnects from the server, its attached devices are not transitioned offline and the scheduler will attempt to reserve and run jobs on these devices. With the ephemeral workers this does not work, so I added the xmlrpc methods to enable and disable only one qemu device per worker when the worker starts and stops.
Included in 2016.6 - see https://lists.linaro.org/pipermail/lava-users/2016-April/000068.html which resulted in https://git.linaro.org/lava/lava-server.git/commit/2091ac9c3f9305f5a4e083d15... and a few related commits.
However, that support is NOT intended for dynamic workers; it is intended to support temporary access to the devices outside of LAVA.
Fundamentally LAVA assumes it owns the devices and that makes it hard to integrate into existing infrastructure. I would be interested in any ideas you have that could make this easier.
LAVA does indeed own the devices and that needs to stay. Temporary reassignment of devices can be done but workers need to be permanent objects in the database. We do expect to add code to handle workers which go offline, it's not included yet.
Sorry, but this design makes no sense in terms of how LAVA can be supported. It would seem to waste the majority of the support available in LAVA; I fail to see how LAVA can provide benefit to a CI loop based on this situation. VMs for demo usage have been considered, but only as demos.
It makes it very easy to setup development instances of lava-server and test upgrades, etc. If you are interested, I can put them on Github. Let me know what would be the best way to share them.
IMHO Docker is not a suitable way to test LAVA - full virtualisation is the way that LAVA is tested upstream, along with dedicated hardware for the long-term testing.
Thank you for your time. I look forward to working with you to make LAVA better.
I'm not sure if there is a misconception about what LAVA is able to achieve or whether you are just trying to force LAVA into a niche you have already defined. Continuing with the path you've outlined is likely to cause breakage. I suggest that you reconsider your design - especially any idea about temporary workers. The churn resulting from that is simply not worth considering.
On 10 June 2016 at 15:57, Neil Williams neil.williams@linaro.org wrote:
Thinking more about this - is this actually a case where Jenkins or Travis would be better suited? (We use Jenkins for some operations in LAVA development, precisely because it can spin up the necessary - x86 - resources on demand.)
LAVA is better with tests that depend on particular kinds of permanent hardware and, although there is QEMU support, LAVA is primarily about providing access to unusual hardware to test whether systems boot and run cleanly, i.e. where the architecture and physical aspects of the hardware make a significant difference to the testing. A mismatch at that level is only going to make things harder later. The driving force for LAVA development could be at odds with your needs.
For LAVA, with primary and secondary connections, the "device" is just an endpoint that can run tests - it doesn't matter if the actual resource has only just been created, as long as it is ready when LAVA needs to make a connection. The thing itself is just whatever box or resource is in charge of running the tests - and one of those test actions is simply to run QEMU.
Some of our virtualisation testing in LAVA uses an arm64 board as the device and then uses secondary connections to run tests in VMs started by that device. No need for temporary workers or temporary devices.
Make the server, workers and devices permanent and then let the connections be to whatever has been made available. However, do please take time first to consider if your needs actually align with what LAVA is trying to achieve.
Neil, thanks for the explanation on the limitations of LAVA extensions.
I too have a LAVA extension that I wrote.
It added a few XML-RPC APIs to deploy data to the LAVA server at /usr/share/lava-server/static/
The purpose was to deploy device images on the server so that it's faster to use them in multiple jobs.
Is there another way the builds/binaries can be deployed to the server?
If it's OK, I can add those patches to LAVA.
Thanks, Sandeep
On 10 June 2016 at 18:53, Sandeep Chawla sandeep@cyngn.com wrote:
Neil, Thanks for the explanation on the limitations of Lava extensions.
I too have a lava extension that I wrote.
It added a few XML RPC apis to deploy data to the lava server at /usr/share/lava-server/static/
That is a Django directory for UI files, templates, CSS, JavaScript and the like. Various Django utility programs are able to take ownership of those locations.
It makes no sense to use that directory for downloads to a device when that should be happening via lava-dispatcher. You risk breaking the UI, and it is an abuse of the Filesystem Hierarchy Standard (https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard), which Debian follows.
The tmp/ directory is /var/lib/lava/dispatcher/tmp/ - if you're using the example apache config from the source.
The purpose was to deploy device images on the server so that its faster to use in multiple jobs
There's no point having those files on the server - the files need to be on the dispatcher. Even if you use a single machine as both, you must still use the dispatcher directories.
Is there another way the builds/binaries can be deployed to the server ?
Use a caching proxy (e.g. squid) or a local fileserver to make files more easily available to the dispatcher. These files have no business being anywhere where the server would find them.
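A local fileserver really can be that trivial - point any static HTTP server at the directory holding the images and give the dispatcher URLs on that server. A throwaway sketch (directory and port are placeholders; for anything shared, a caching proxy like squid or a proper web server is the better choice):

    # Throwaway static fileserver for images the dispatcher should download.
    # The directory and port are placeholders; do not serve these files from
    # the lava-server static/ tree.
    import os
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    os.chdir("/srv/images")
    HTTPServer(("0.0.0.0", 8000), SimpleHTTPRequestHandler).serve_forever()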
If its ok, I can add those patches to lava
Sorry - no. That would not be appropriate or suitable.
(Looks like removing the extensions was actually a good way to highlight issues with how things actually need to work. I'm glad we removed it - we should probably have done so earlier.)
LAVA instances require admins - in return, LAVA can improve efficiency by asserting some things about how an instance should be administered. It helps everyone if all instances are maintained in standard ways - otherwise we cannot help you when bugs appear due to a mistake in local administration. This is a particular problem with single-user instances where the user is also the LAVA admin, system admin, test writer and developer. These are not small roles - busy instances need a medium-sized team to operate and maintain, without considering any development.
With V2, the work for the test writers and admins is both changing and increasing as a deliberate change to make LAVA more transparent.
There is a lot of help out there on administering Debian-based systems and adopting best practices. LAVA is complex and although we are making some parts of it easier, that simply puts more emphasis on the test writers and the admins. This is one reason why only Debian is currently supportable - we just don't have the resources to run a parallel lab on a different OS to be confident in advising on administration and maintenance. (It would need to be just as busy as validation.linaro.org, with a similar set and number of devices, and that quickly becomes impractical.)
The more that instances diverge from best practice, the more likely it is that fixes and improvements in LAVA will break those instances - things just suddenly stop working. Yet, every instance is different, so LAVA cannot and will not do this work for you. There is room in V2 for a very large range of fully supportable instances.
On 2016-06-10 11:20 AM, Neil Williams wrote:
Thanks, Neil, for the detailed responses. I am new to LAVA and was fully aware that my first "proof of concept" would be suboptimal. I only have myself to blame for not starting the discussion earlier. On the other hand, writing code and trying to get something working is sometimes the best way to learn a new code base.
We have a large number of hardware targets and LAVA is well suited to running tests on them. I was looking for a way to use the large pool of compute I have available to run our LAVA QEMU tests. I am trying to move away from dedicated hardware for specific tasks and towards a pool of compute shared across a large set of tasks. But if that isn't possible, we can always fall back to dedicated servers running QEMU just for LAVA.
I did not realize that primary and secondary connections could be used in this way and I will research this approach and see if it is a better fit for our testing needs.
Thank you for your time.
On 06/10/2016 11:20 AM, Neil Williams wrote:
On 10 June 2016 at 15:57, Neil Williams neil.williams@linaro.org wrote:
On 10 June 2016 at 14:19, Konrad Scherer Konrad.Scherer@windriver.com wrote:
On 2016-06-10 04:35 AM, Neil Williams wrote:
Please talk to us about what the commands and methods were trying to achieve - we are open to having more calls available as long as the code being used can be properly tested within the codebase.
Thank you for the detailed response. The extension code was complex and I understand why it was removed. Ideally the LAVA API would accommodate my requirements, so it may help to understand why I wrote the extension and what it currently does:
- A command to create a pipeline worker and a set of qemu devices.
Extending settings.conf to be able to add an app to INSTALLED_APPS would be the simplest way to enable this.
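For illustration, something along these lines at the end of lava_server/settings/common.py would be enough. This is only a sketch: "EXTRA_INSTALLED_APPS" is a key I am inventing here; it does not exist in settings.conf today.

# Hypothetical: read extra Django apps from /etc/lava-server/settings.conf
# so a local app can be enabled without patching LAVA itself.
import json

try:
    with open("/etc/lava-server/settings.conf") as handle:
        _local_config = json.load(handle)
except IOError:
    _local_config = {}

# INSTALLED_APPS is defined earlier in common.py; this only appends to it.
# "EXTRA_INSTALLED_APPS" is an invented key, used only for this sketch.
INSTALLED_APPS = list(INSTALLED_APPS) + list(_local_config.get("EXTRA_INSTALLED_APPS", []))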
Sorry, that doesn't scale. Pipeline workers typically have no connection to the django settings - the only connection with the master is over ZMQ. There is support for adding a pipeline worker but creating devices is an admin task that can be very specific to that instance. The worker may also need encryption configured which is per-instance.
A one-off command during the initial setup of an instance has been considered, but that would not run once the instance had been configured.
But why did I create the command? I am running a Mesos[1] cluster with 50+ x86 servers and am experimenting with using the cluster to run qemu/lxc tests.
So let LAVA have a server outside the cluster and a couple of pipeline workers; the workers can then communicate with the cluster to run the tests. No need for temporary workers. Note: temporary devices are also a bad idea for the database - that is why we created primary and secondary connections in V2.
The idea is that the LAVA server will be connected to both hardware in a lab and to the mesos cluster and users can run tests for qemu devices on demand using available mesos cluster resources.
I don't see why this wouldn't work in a standard LAVA arrangement of an external box running lava-server and a few workers running lava-slave. What's needed is a way for a permanent worker to communicate with the cluster to run commands. That could be as simple as using the existing support for primary SSH connections, with the added benefit that because your cluster resource is temporary, you don't have the usual concerns about persistence associated with primary connections.
https://staging.validation.linaro.org/static/docs/v2/dispatcher-design.html#...
My proof of concept is working, i.e. a custom mesos scheduler polls the LAVA server for submitted qemu jobs and starts lava workers dynamically to run the jobs.
Please do not poll. There will be ways to get information from LAVA using push notifications but polling adds load for no good reason. Having two schedulers is adding unnecessary complexity.
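For reference, the push notifications will be published over ZMQ, so a listener can react to events instead of polling. A rough sketch, assuming the publisher is exposed on tcp://lava-server:5500 (the endpoint and the exact message layout are instance- and version-specific):

# Rough sketch of an event listener; the endpoint and frame layout below
# are assumptions - check the instance configuration and the docs.
import json
import zmq

context = zmq.Context()
sub = context.socket(zmq.SUB)
sub.connect("tcp://lava-server:5500")  # instance-specific assumption
sub.setsockopt(zmq.SUBSCRIBE, b"")     # subscribe to all topics

while True:
    frames = sub.recv_multipart()
    topic = frames[0].decode("utf-8")
    data = json.loads(frames[-1].decode("utf-8"))  # last frame assumed to be JSON
    print(topic, data)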
This would be much better with static workers, not dynamic. Then define a device-type which spins up the device - not the worker. Presumably you have some connection method to the device; that could be used to drive the test in much the same way as LAVA currently drives ARM boards. That way, the LAVA scheduler can manage jobs itself (which is what it is designed to do) and the workers can run the jobs on the devices. Drop the other scheduler and just have a process that submits to LAVA.
Dynamic workers are a very bad idea and are likely to lead to problems as the LAVA code develops, tying the worker and the master still closer together. We already have enough problems with the dynamic devices used in V1.
- A command that integrates LAVA with our internal lab management software.
Unfortunately, LAVA does not own the lab devices, and when a user reserves a lab target using this management service, this command is used to mark the device offline in LAVA and to bring it back online when the lab target is unreserved.
That is just too specific to that lab. The usual approach there involves tools like puppet, ansible, salt etc.
- XMLRPC methods to mark devices online and offline. I noticed that when a pipeline worker disconnects from the server, its attached devices are not transitioned offline and the scheduler will attempt to reserve and run jobs on these devices. With the ephemeral workers this does not work, so I added the xmlrpc methods to enable and disable only one qemu device per worker when the worker starts and stops.
Included in 2016.6 - see https://lists.linaro.org/pipermail/lava-users/2016-April/000068.html which resulted in https://git.linaro.org/lava/lava-server.git/commit/2091ac9c3f9305f5a4e083d15... and a few related commits.
However, that support is NOT intended for dynamic workers; it is intended to allow temporary access to the devices outside of LAVA.
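For completeness, those maintenance calls can be driven with a plain XML-RPC client, something like the sketch below; the URL, user, token and device name are placeholders, and the exact signatures should be checked against the API help on your instance:

# Minimal sketch using the standard XML-RPC client (xmlrpclib on Python 2).
import xmlrpc.client

# Placeholder credentials and hostname - substitute real values.
server = xmlrpc.client.ServerProxy("https://user:token@lava-server/RPC2")

# Take a device out of the pool, then bring it back.
server.scheduler.put_into_maintenance_mode("qemu01", "reserved externally")
server.scheduler.put_into_online_mode("qemu01", "released")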
Fundamentally LAVA assumes it owns the devices and that makes it hard to integrate into existing infrastructure. I would be interested in any ideas you have that could make this easier.
LAVA does indeed own the devices and that needs to stay. Temporary reassignment of devices can be done, but workers need to be permanent objects in the database. We do expect to add code to handle workers which go offline; it is not included yet.
Sorry, but this design makes no sense in terms of how LAVA can be supported. It would seem to waste the majority of the support available in LAVA, and I fail to see how LAVA can provide benefit to a CI loop built on this arrangement. VMs for demo usage have been considered, but only as demos.
It makes it very easy to set up development instances of lava-server and test upgrades, etc. If you are interested, I can put them on GitHub. Let me know what would be the best way to share them.
IMHO docker is not a suitable way to test LAVA - full virtualisation is the way that LAVA is tested upstream, along with dedicated hardware for the long term testing.
Thank you for your time. I look forward to working with you to make LAVA better.
I'm not sure if there is a misconception about what LAVA is able to achieve or whether you are just trying to force LAVA into a niche you have already defined. Continuing with the path you've outlined is likely to cause breakage. I suggest that you reconsider your design - especially any idea about temporary workers. The churn resulting from that is simply not worth considering.
Thinking more about this - is this actually a case where Jenkins or Travis would be better suited? (We use Jenkins for some operations in LAVA development, precisely because it can spin up the necessary x86 resources on demand.)
LAVA is better suited to tests that depend on particular kinds of permanent hardware. Although there is QEMU support, LAVA is primarily about providing access to unusual hardware to test whether systems boot and run cleanly, i.e. where the architecture and physical aspects of the hardware make a significant difference to the testing. A mismatch at that sort of level is only going to make things harder later. The driving force for LAVA development could be at odds with your needs.
For LAVA, with primary and secondary connections, the "device" is just an endpoint that can run tests - it doesn't matter if the actual resource has only just been created, as long as it is ready when LAVA needs to make a connection. The thing itself is just whatever box or resource is in charge of running the tests - and one of those test actions is simply to run QEMU.
I have been researching how a solution with this setup would look. I have reread the docs and looked through the code and I am confused. We have a working V1 server with a V1 device with the following custom commands:
pre_connect_command = /pre_connect_script.py %(hostname)s
connection_command = /connection_script.py %(hostname)s
power_off_cmd = /power_off_script.py %(hostname)s
The connection script initiates a telnet connection to the console of the device. Is it possible to do something similar with V2?
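For context, the connection script is essentially just the following shape - the console lookup below is invented for illustration:

# Simplified illustration of /connection_script.py: map the LAVA device
# hostname to a console server address and exec telnet so that LAVA gets
# an interactive console. The lookup table is a placeholder for this sketch.
import os
import sys

CONSOLES = {
    "board01": ("console.example.com", 7001),  # placeholder mapping
}

hostname = sys.argv[1]
host, port = CONSOLES[hostname]
os.execvp("telnet", ["telnet", host, str(port)])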
There are a few V2 devices in lava-dispatcher with custom pdu commands and a comment in lava_dispatcher/pipeline/test/sample_jobs/basics.yaml:
- commands:
    # list of pre-defined scripts installed by the server administrator at
    # /etc/lava-server/commands.d/* to be invoked
    #
    # Each called script will have the data in the job definition (plus
    # the actual device where the test is running) passed in by environment
    # variables
    - power-off
I can't find any code that implements running custom commands or injects values into a script environment. I created a device with some custom commands, but I couldn't figure out the correct deploy and boot methods to invoke those scripts.
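What I was hoping to be able to drop into /etc/lava-server/commands.d/ was something like the script below - the environment variable name is a pure guess on my part, since I could not find the code that would set it:

#!/usr/bin/env python
# Hypothetical commands.d/power-off helper. The environment variable name
# is a guess based only on the comment in basics.yaml.
import os
import subprocess

hostname = os.environ.get("LAVA_DEVICE_HOSTNAME", "")  # assumed variable name
subprocess.check_call(["/power_off_script.py", hostname])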
Did I miss something or should I stick with V1 for now?
Thank you for your time.