Hi folks,
The 2020.09 tag has been pushed to master on git.lavasoftware.org.
.deb packages have been built in GitLab CI and are published at
https://apt.lavasoftware.org/release
Docker images for amd64 and arm64 have been built in GitLab CI and
are available from
https://hub.lavasoftware.org/
and
https://hub.docker.com/u/lavasoftware
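For example, to fetch this release's images (the image names are assumed
from the Docker Hub organization above; adjust the tag as needed):
```shell
# Pull the server and dispatcher images for this release:
docker pull lavasoftware/lava-server:2020.09
docker pull lavasoftware/lava-dispatcher:2020.09
```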
Changes in this release
=======================
# Upgrading
This release brings a big architectural change, replacing ZMQ with
HTTP(s) for server-worker communication.
Admins will have to update the configuration of each worker.
## From ZMQ to HTTP(s)
The protocol that LAVA uses to communicate between the server and the
workers has been changed from ZMQ to HTTP(s).
This improves performance and reliability, but admins will have to
update their configuration after the upgrade.
## Database migrations
This release includes two migrations (a sketch for applying them manually
follows the list):
* lava_results_app.0017_testdata_onetoone_field: drop the bug link
* lava_scheduler_app.0053_testjob_and_worker_token
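The migrations should be applied automatically by the package upgrade; to
run them manually, a minimal sketch using the `lava-server manage` wrapper
around the standard Django command:
```shell
# Apply any pending database migrations:
lava-server manage migrate
```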
# Device-types
## New device-types
New supported devices:
* imx8mn-evk
## Juno-r2 and tee
Fix a bug in LAVA 2020.08 that prevented the use of juno-r2 boards.
This is a regression from 2020.07, introduced by the support for `tee` in
u-boot jobs.
## SoCA9
Update the dtb address from `0x00000100` to `0x00001000` to prevent some
issues with u-boot 2020.07.
# From ZMQ to HTTP(s)
In prior versions, the LAVA daemons were using ZMQ to communicate and send
logs. In this release, LAVA uses plain HTTP(s) to control the remote
workers and send job logs.
## Reasons
### Load-balancing and fault-tolerance
The previous architecture was not able to cope with a large number of jobs
running in parallel, mainly because it was impossible to load-balance the
traffic across multiple instances of `lava-logs` and `lava-master`.
By using HTTP(s), it is much easier to load-balance the traffic across
multiple instances of `lava-server-gunicorn`.
With load-balancing we can also increase fault-tolerance and move toward
zero-downtime upgrades.
### Master and scheduling
In the previous design, `lava-master` was both the master and the job
scheduler. This introduced latency when starting jobs while many jobs were
running in parallel.
With the new design, `lava-scheduler` runs in the background, scheduling
jobs, while `lava-server-gunicorn` serves both clients and workers.
### Proxies
Using HTTP(s) will also ease the adoption of remote workers: connecting to
non-standard ports is often impossible in corporate environments.
### Job termination
With the previous design, `lava-logs` and `lava-master` were both
responsible for terminating a job. This sometimes led to a deadlock where
the job was waiting forever for its termination.
This is no longer possible, as `lava-server-gunicorn` is responsible for
both the logs and the job termination.
### Simplifying the architecture
By using HTTP(s) instead of ZMQ, we were able to decrease the number of
services running on the server. We are also planning to drop the need for
`lava-coordinator` in the future.
Using HTTP(s) also decreases the number of network ports that the server
has to listen on. This simplifies deployment and helps with hosting many
instances on the same physical server.
## Services
The following services have been dropped:
* `lava-logs`: the logs are sent directly to `lava-server-gunicorn`
* `lava-master`: the workers are pulling jobs from `lava-server-gunicorn`
This release introduces a new service called `lava-scheduler` that is
solely responsible for scheduling jobs.
In this release, `lava-slave` has been rewritten from scratch and renamed
`lava-worker`.
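After upgrading, a quick check that the new set of services is running,
assuming a systemd-based installation:
```shell
# On the server: lava-scheduler and lava-server-gunicorn should be
# active, while lava-logs and lava-master are gone.
systemctl status lava-scheduler lava-server-gunicorn
# On each worker: lava-slave has been replaced by lava-worker.
systemctl status lava-worker
```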
## Version mismatch
In previous LAVA versions, `lava-master` did not check the `lava-slave`
version. This sometimes led to strange behavior when the server was
upgraded but the dispatcher was not.
`lava-server-gunicorn` is now able to check the `lava-worker` version every
time the service requests jobs to run.
In the event of a version mismatch, the server will put the worker offline,
refusing to start jobs on this worker.
When it's safe to stop the worker (the worker is done with the current set
of jobs), the server will return a specific error. If you use the new [LAVA
docker worker](#lava-docker-worker), `lava-worker` will be automatically
upgraded to the server version whenever needed.
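On Debian installations, admins can compare the versions installed on each
side with a standard dpkg query (package names as published in the apt
repository):
```shell
# On the server:
dpkg-query -W -f '${Version}\n' lava-server
# On the worker; the versions must match or the worker stays offline:
dpkg-query -W -f '${Version}\n' lava-dispatcher
```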
## Upgrading
After the upgrade, every worker will be inactive as the `lava-worker`
services won't be able to connect to `lava-server-gunicorn`.
For each worker, admins will have to update the configuration, as sketched
after this list:
* Update the `URL` variable in the worker configuration
(`/etc/lava-dispatcher/lava-worker`). This is the full URL to the server.
* Add the worker token in `/var/lib/lava/dispatcher/worker/token`. Admins
can find the token in the worker admin page at
http://INSTANCE/admin/lava_scheduler_app/worker/WORKER_NAME/change/
* Restart `lava-worker`.
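Putting the three steps together, a minimal sketch, assuming the
configuration file is the usual shell-style environment file and using
placeholder URL and token values:
```shell
# 1. Point the worker at the server (full URL to the server).
echo 'URL="https://lava.example.com/"' >> /etc/lava-dispatcher/lava-worker
# 2. Install the token copied from the worker admin page.
mkdir -p /var/lib/lava/dispatcher/worker
echo "0123456789abcdef" > /var/lib/lava/dispatcher/worker/token
# 3. Restart the service so it reconnects with the new settings.
systemctl restart lava-worker
```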
# LAVA docker worker
This release introduces a program called `lava-docker-worker` that runs a
LAVA worker inside a Docker container. This script is provided by the
`lava-dispatcher-host` package and has the following features:
* Takes the same parameters as regular `lava-worker`.
* Detects the LAVA version of the server, and runs the worker from that
same LAVA version.
* Automatically upgrades the worker when the server upgrades.
* Docker containers started by it are its siblings and not its children,
i.e. they run directly on the host system.
This worker in Docker should support most use cases that are supported by
the regular LAVA worker, except running LXC containers.
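For illustration, a possible invocation; the option names below are
assumptions mirroring the regular `lava-worker` and should be checked
against `lava-docker-worker --help`:
```shell
# Start a containerized worker pointing at the server (URL and token
# are placeholders):
lava-docker-worker --url https://lava.example.com/ --token 0123456789abcdef
```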
It's important to note that the container started by `lava-docker-worker`
runs in privileged mode and with host networking, which means that it is
less isolated from the host system than you would usually expect
application containers to be:
- it has access to **all** devices under `/dev`.
- it uses the same networking stack as the host system.
Because of this, you should consider `lava-docker-worker` a distribution
facilitator, not an isolation mechanism. You should not run
`lava-docker-worker` on a host where you wouldn't run the regular LAVA
worker.
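To illustrate the isolation level, the container behaves roughly as if
started with flags like the following (this is an illustration, not the
exact command that `lava-docker-worker` runs, and the image name is an
assumption):
```shell
# --privileged exposes the host devices, --network host shares the
# host's network stack; listing /dev shows the host's devices.
docker run --rm --privileged --network host \
  lavasoftware/lava-dispatcher:2020.09 ls /dev
```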
# Bug link
The possibility to link a bug to a specific test job or result has been
dropped. This feature was generating a huge load on the database server
without a real benefit.
# Tests from tar
Starting from this release, LAVA can pull tests from a tar archive instead
of a git repository.
The job definition will look like:
```yaml
- test:
    name: basic-linux-smoke
    timeout:
      minutes: 10
    definitions:
      - repository: https://github.com/Linaro/test-definitions/archive/2019.03.tar.gz
        from: url
        path: automated/linux/smoke/smoke.yaml
        name: linux-smoke
        compression: gz
```
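The `path` is resolved inside the extracted archive (GitHub tarballs add a
top-level directory). A quick sketch to verify the definition exists in
the archive:
```shell
# Download the archive and check that the test definition is present:
wget -q https://github.com/Linaro/test-definitions/archive/2019.03.tar.gz
tar -tzf 2019.03.tar.gz | grep 'automated/linux/smoke/smoke.yaml'
```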
# LAVA job id
LAVA is now exporting the job id to the lava test shell environment. The
variable is called `LAVA_JOB_ID` and can be used with
```shell
echo "$LAVA_JOB_ID"
```
We are planning to export more LAVA data as environment variables in the
future.
Thanks
--
Rémi Duraffort
LAVA Architect
Linaro