Hi folks,
The 2020.09 tag has been pushed to master on git.lavasoftware.org.
.deb packages have been built in GitLab CI and are published at
https://apt.lavasoftware.org/release
Docker images for amd64 and arm64 have been built in GitLab CI and
are available from
https://hub.lavasoftware.org/
and
https://hub.docker.com/u/lavasoftware
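For example, to fetch this release's images (the image names are assumed
from the Docker Hub organization above; adjust the tag as needed):
```shell
# Pull the server and dispatcher images for this release:
docker pull lavasoftware/lava-server:2020.09
docker pull lavasoftware/lava-dispatcher:2020.09
```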
Changes in this release
=======================
# Upgrading
This release brings a big architectural change, replacing ZMQ with
HTTP(s) for server-worker communication.
Admins will have to update the configuration of each worker.
## From ZMQ to HTTP(s)
The protocol that LAVA uses to communicate between the server and the
workers has been changed from ZMQ to HTTP(s).
This improves performance and reliability, but admins will have to
update their configuration after the upgrade.
## Database migrations
This release includes two migrations (a sketch for applying them manually
follows the list):
* lava_results_app.0017_testdata_onetoone_field: drop the bug link
* lava_scheduler_app.0053_testjob_and_worker_token
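The migrations should be applied automatically by the package upgrade; to
run them manually, a minimal sketch using the `lava-server manage` wrapper
around the standard Django command:
```shell
# Apply any pending database migrations:
lava-server manage migrate
```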
# Device-types
## New device-types
New supported devices:
* imx8mn-evk
## Juno-r2 and tee
Fix a bug in LAVA 2020.08 that prevented the use of juno-r2 boards.
This is a regression from 2020.07, introduced by the support for `tee` in
u-boot jobs.
## SoCA9
Update the dtb address from `0x00000100` to `0x00001000` to prevent some
issues with u-boot 2020.07.
# From ZMQ to HTTP(s)
In prior versions, the LAVA daemons were using ZMQ to communicate and send
logs. In this release, LAVA uses plain HTTP(s) to control the remote
workers and send job logs.
## Reasons
### Load-balancing and fault-tolerance
The previous architecture was not able to cope with a large number of jobs
running in parallel, mainly because it was impossible to load-balance the
traffic across multiple instances of `lava-logs` and `lava-master`.
By using HTTP(s), it is much easier to load-balance the traffic across
multiple instances of `lava-server-gunicorn`.
With load-balancing we can also increase fault-tolerance and move toward
zero-downtime upgrades.
### Master and scheduling
In the previous design, `lava-master` was both the master and the job
scheduler. This introduced latency when starting jobs while many jobs were
running in parallel.
With the new design, `lava-scheduler` runs in the background, scheduling
jobs, while `lava-server-gunicorn` serves both clients and workers.
### Proxies
Using HTTP(s) will also ease the adoption of remote workers: connecting to
non-standard ports is often impossible in corporate environments.
### Job termination
With the previous design, `lava-logs` and `lava-master` were both
responsible for terminating a job. This sometimes led to a deadlock where
the job was waiting forever for its termination.
This is no longer possible, as `lava-server-gunicorn` is responsible for
both the logs and the job termination.
### Simplifying the architecture
By using HTTP(s) instead of ZMQ, we were able to decrease the number of
services running on the server. We are also planning to drop the need for
`lava-coordinator` in the future.
Using HTTP(s) also decreases the number of network ports that the server
has to listen on. This simplifies deployment and helps with hosting many
instances on the same physical server.
## Services
The following services have been dropped:
* `lava-logs`: the logs are sent directly to `lava-server-gunicorn`
* `lava-master`: the workers are pulling jobs from `lava-server-gunicorn`
This release introduces a new service called `lava-scheduler` that is
solely responsible for scheduling jobs.
In this release, `lava-slave` has been rewritten from scratch and renamed
`lava-worker`.
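After upgrading, a quick check that the new set of services is running,
assuming a systemd-based installation:
```shell
# On the server: lava-scheduler and lava-server-gunicorn should be
# active, while lava-logs and lava-master are gone.
systemctl status lava-scheduler lava-server-gunicorn
# On each worker: lava-slave has been replaced by lava-worker.
systemctl status lava-worker
```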
## Version mismatch
In previous LAVA versions, `lava-master` did not check the `lava-slave`
version. This sometimes led to strange behavior when the server was
upgraded but the dispatcher was not.
`lava-server-gunicorn` is now able to check the `lava-worker` version every
time the service requests jobs to run.
In the event of a version mismatch, the server will put the worker offline,
refusing to start jobs on this worker.
When it's safe to stop the worker (the worker is done with the current set
of jobs), the server will return a specific error. If you use the new [LAVA
docker worker](#lava-docker-worker), `lava-worker` will be automatically
upgraded to the server version whenever needed.
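On Debian installations, admins can compare the versions installed on each
side with a standard dpkg query (package names as published in the apt
repository):
```shell
# On the server:
dpkg-query -W -f '${Version}\n' lava-server
# On the worker; the versions must match or the worker stays offline:
dpkg-query -W -f '${Version}\n' lava-dispatcher
```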
## Upgrading
After the upgrade, every worker will be inactive as the `lava-worker`
services won't be able to connect to `lava-server-gunicorn`.
For each worker, admins will have to update the configuration, as sketched
after this list:
* Update the `URL` variable in the worker configuration
(`/etc/lava-dispatcher/lava-worker`). This is the full URL to the server.
* Add the worker token in `/var/lib/lava/dispatcher/worker/token`. Admins
can find the token in the worker admin page at
http://INSTANCE/admin/lava_scheduler_app/worker/WORKER_NAME/change/
* Restart `lava-worker`.
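Putting the three steps together, a minimal sketch, assuming the
configuration file is the usual shell-style environment file and using
placeholder URL and token values:
```shell
# 1. Point the worker at the server (full URL to the server).
echo 'URL="https://lava.example.com/"' >> /etc/lava-dispatcher/lava-worker
# 2. Install the token copied from the worker admin page.
mkdir -p /var/lib/lava/dispatcher/worker
echo "0123456789abcdef" > /var/lib/lava/dispatcher/worker/token
# 3. Restart the service so it reconnects with the new settings.
systemctl restart lava-worker
```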
# LAVA docker worker
This release introduces a program called `lava-docker-worker` that runs a
LAVA worker inside a Docker container. This script is provided by the
`lava-dispatcher-host` package and has the following features:
* Takes the same parameters as regular `lava-worker`.
* Detects the LAVA version of the server, and runs the worker from that
same LAVA version.
* Automatically upgrades the worker when the server upgrades.
* Docker containers started by it are its siblings and not its children,
i.e. they run directly on the host system.
This worker in Docker should support most use cases that are supported by
the regular LAVA worker, except running LXC containers.
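For illustration, a possible invocation; the option names below are
assumptions mirroring the regular `lava-worker` and should be checked
against `lava-docker-worker --help`:
```shell
# Start a containerized worker pointing at the server (URL and token
# are placeholders):
lava-docker-worker --url https://lava.example.com/ --token 0123456789abcdef
```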
It's important to note that the container started by `lava-docker-worker`
runs in privileged mode and with host networking, which means that it is
less isolated from the host system than you would usually expect
application containers to be:
- it has access to **all** devices under `/dev`.
- it uses the same networking stack as the host system.
Because of this, you should consider `lava-docker-worker` a distribution
facilitator, not an isolation mechanism. You should not run
`lava-docker-worker` on a host where you wouldn't run the regular LAVA
worker.
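To illustrate the isolation level, the container behaves roughly as if
started with flags like the following (this is an illustration, not the
exact command that `lava-docker-worker` runs, and the image name is an
assumption):
```shell
# --privileged exposes the host devices, --network host shares the
# host's network stack; listing /dev shows the host's devices.
docker run --rm --privileged --network host \
  lavasoftware/lava-dispatcher:2020.09 ls /dev
```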
# Bug link
The possibility to link a bug to a specific test job or result has been
dropped. This feature was generating a huge load on the database server
without a real benefit.
# Tests from tar
Starting from this release, LAVA can pull tests from a tar archive instead
of a git repository.
The job definition will look like:
```yaml
- test:
    name: basic-linux-smoke
    timeout:
      minutes: 10
    definitions:
      - repository: https://github.com/Linaro/test-definitions/archive/2019.03.tar.gz
        from: url
        path: automated/linux/smoke/smoke.yaml
        name: linux-smoke
        compression: gz
```
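The `path` is resolved inside the extracted archive (GitHub tarballs add a
top-level directory). A quick sketch to verify the definition exists in
the archive:
```shell
# Download the archive and check that the test definition is present:
wget -q https://github.com/Linaro/test-definitions/archive/2019.03.tar.gz
tar -tzf 2019.03.tar.gz | grep 'automated/linux/smoke/smoke.yaml'
```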
# LAVA job id
LAVA is now exporting the job id to the lava test shell environment. The
variable is called `LAVA_JOB_ID` and can be used with
```shell
echo "$LAVA_JOB_ID"
```
We are planning to export more LAVA data as environment variables in the
future.
Thanks
--
Rémi Duraffort
LAVA Architect
Linaro