Hi,
I have some questions about the fastboot/adb in docker support that I hope you can help with. The use case is Android 10 AOSP testing with LAVA 2020.05.
Thanks to Antonio's presentation and draft documentation I have simple fastboot host-to-DUT communication working for a u-boot based arm64 board. I am now trying to bring an existing flash process, which uses a script on the host to send fastboot commands, into a LAVA job.
I can see how fastboot --set-active, reboot and flash commands all have equivalent controls in a 'deploy to fastboot docker' LAVA job section. Do equivalents exist for the fastboot oem format, oem erase, format and erase commands or is there a way to insert them in the deploy?
Expecting that to take some engineering work, in parallel I wanted, as a stop gap, to try running the flash script from a LAVA job, so people could work on the testing side whilst I resolved the deploy section. Antonio suggested trying to do that from the test section, as I recall. To do that I face two issues: 1) The build artifacts are on a local volume rather than an artifact server, so I need to get them into the docker container in an automated way. Is there a way to mount a local volume or file list into the container without asking LAVA to deploy them?
As an experiment I tried using the docker image manipulation added in 2020.04 to do this. There I hit a problem with the current implementation. It seems the 'deploy to downloads' implementation does not check for a local image first, as the other docker support does, before trying to pull the image, so I get an error when the pull fails: https://lava.genivi.org/scheduler/job/961#L24
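For reference, the check the other docker support performs amounts to something like the following (a sketch; the image name is illustrative):

    # use the local image if present, otherwise pull it from the registry
    docker image inspect my-adb-fastboot >/dev/null 2>&1 || docker pull my-adb-fastboot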
2) The job needs to be able to boot the DUT, interrupt u-boot and start fastboot so the host has something to communicate with once the test section is reached.
I can achieve that in a horrible, hacky way by having a deploy section with a single simple flash image (dtbo, say) and using the reboot controls to get the board to reboot into fastboot to await the flash script being run from the test section, but I expect there is a better way. Any ideas?
Regards
Steve
Stephen,
I think what you want already works (sort of)
On Wed, 24 Jun 2020 at 19:53, Stephen Lawrence stephen.lawrence@renesas.com wrote:
[snip]
I can see how fastboot --set-active, reboot and flash commands all have equivalent controls in a 'deploy to fastboot docker' LAVA job section. Do equivalents exist for the fastboot oem format, oem erase, format and erase commands or is there a way to insert them in the deploy?
I tried on our staging instance and I hit a bug in schema validation, but in general this definition should do what you want: https://hastebin.com/aqegomijez.cs There is, however, a bug in job validation that will not allow the wait_for_prompt key in an interactive test. I'll send the patch to fix it in a moment. It is an easy fix, so I expect it to be merged soon and available in the master branch next week.
[snip]
- The job needs to be able to boot the DUT, interrupt u-boot and start fastboot so the host has something to communicate with once the test section is reached.
- boot:
    prompts:
    - "=>"
    timeout:
      minutes: 15
    method: u-boot

- test:
    interactive:
    - name: fastboot
      prompts: ['=> ', '/ # ']
      script:
      - command: fastboot 1
        name: fastboot-1
        wait_for_prompt: false
This is the part of the job definition you need. Let me fix the schema parsing bug and you'll be able to use it.
I can achieve that in a horrible, hacky way by having a deploy section with a single simple flash image (dtbo, say) and using the reboot controls to get the board to reboot into fastboot to await the flash script being run from the test section, but I expect there is a better way. Any ideas?
I also tried this:
- boot:
    prompts:
    - "=>"
    timeout:
      minutes: 15
    method: u-boot
    commands:
    - fastboot 1
But it waits for the prompt. Since fastboot doesn't exit and return a code, LAVA hangs waiting. So IMHO the 'cleanest' way at this point is an interactive test starting fastboot and a test section doing the oem commands.
milosz
Lava-users mailing list Lava-users@lists.lavasoftware.org https://lists.lavasoftware.org/mailman/listinfo/lava-users
On Thu, 25 Jun 2020 at 10:48, Milosz Wasilewski milosz.wasilewski@linaro.org wrote:
[snip]
- boot:
    prompts:
    - "=>"
    timeout:
      minutes: 15
    method: u-boot
Sorry, I was a bit too quick. The proper boot section is:
- boot:
    prompts:
    - "=>"
    timeout:
      minutes: 15
    method: bootloader
    bootloader: u-boot
    commands: []
milosz
On Thu, 25 Jun 2020 at 11:10, Milosz Wasilewski milosz.wasilewski@linaro.org wrote:
[snip]
I need to correct myself once more. It actually works :) https://staging.validation.linaro.org/scheduler/job/273595
Proper deploy/boot/test sequence is:

- deploy:
    docker:
      image: miloszwasilewski/adb-fastboot
    timeout:
      minutes: 15
    to: fastboot
    images:
      xloader:
        url: https://images.validation.linaro.org/snapshots.linaro.org/android/lkft/lkft-aosp-master-x15/107/MLO
      bootloader:
        url: https://images.validation.linaro.org/snapshots.linaro.org/android/lkft/lkft-aosp-master-x15/107/u-boot.img
    os: debian

- boot:
    prompts:
    - "=>"
    timeout:
      minutes: 15
    method: bootloader
    bootloader: u-boot
    commands: ['fastboot 1']

- test:
    docker:
      image: miloszwasilewski/adb-fastboot
    timeout:
      minutes: 30
    definitions:
    - from: inline
      name: fastboot-oem-format
      path: inline/fastboot.yaml
      repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: fastboot-oem
          description: fastboot oem
        run:
          steps:
          - fastboot oem format
          - fastboot erase super
          - fastboot erase boot
Full job definition in the link above.
milosz
Hi Milosz,
-----Original Message----- From: Milosz Wasilewski milosz.wasilewski@linaro.org Sent: 25 June 2020 11:34 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: Re: [Lava-users] fastboot in docker support
[snip]
I need to correct myself once more. It actually works :) https://staging.validation.linaro.org/scheduler/job/273595
[snip]
I tried this, largely following your own template as a first step. The major difference is that I didn't flash u-boot with fastboot first, as the correct version is already on the board.
The u-boot section worked as expected and successfully interrupted u-boot and executed fastboot.
However I hit a problem with the inline test section to run the custom fastboot commands. The docker image is never started as the WaitDeviceBoardID() check doesn't find the board and instead eventually times out: https://lava.genivi.org/scheduler/job/980#L210
That is weird, as I know the device is available; indeed, if I do a simple deploy to fastboot via a docker image, the device is found and the board flashed: https://lava.genivi.org/scheduler/job/982#L140
I took a look at the test docker.py and the deploy docker.py. Although there is some difference in the code that derives the board ID to pass in, both appear to result in the same (correct) serial ID of 00025269, as the WaitDeviceBoardID debug code in udev.py reports.
Any ideas as to what could be causing the difference in behaviour?
Regards
Steve
On Mon, 29 Jun 2020 at 20:13, Stephen Lawrence stephen.lawrence@renesas.com wrote:
[snip]
Any ideas as to what could be causing the difference in behaviour?
Try adding this to your device dictionary:
{% set device_info = [{'board_id': '00025269'}] %}
It should make the wait function happy and present the device to the container.
milosz
Hi Milosz,
-----Original Message----- From: Milosz Wasilewski milosz.wasilewski@linaro.org Sent: 30 June 2020 09:03 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: Re: [Lava-users] fastboot in docker support
[snip]
Any ideas as to what could be causing the difference in behaviour?
Try adding this to your device dictionary:
{% set device_info = [{'board_id': '00025269'}] %}
It should make the wait function happy and present the device to the container.
Thank you for the suggestion. Unfortunately it didn't fix it and so I am investigating further.
I had the following in my device-template (was waiting to get this reliably working before upstreaming):

{% set fastboot_deploy_uboot_commands = fastboot_deploy_uboot_commands|default(['fastboot usb 24']) %}
{% set fastboot_serial_number = fastboot_serial_number|default("0000000000") %}
{% set adb_serial_number = adb_serial_number|default(fastboot_serial_number) %}
{% set device_info = device_info|default([{'board_id': fastboot_serial_number}]) %}
That produced the following rendered DUT yaml:

device_info: [{'board_id': '00025269'}]
adb_serial_number: "00025269"
fastboot_serial_number: "00025269"
Today I added device_info to the existing fastboot/adb serial numbers in the device dictionary:

{% set fastboot_serial_number = '00025269' %}
{% set adb_serial_number = '00025269' %}
{% set device_info = [{'board_id': '00025269'}] %}

then updated with lavacli.
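For reference, the lavacli update was along these lines (a sketch; the device hostname and file name are illustrative):

    lavacli devices dict set my-dut-01 my-dut-01.jinja2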
I had seen this udev.py failure when first working with the new docker changes in 2020.02. Then the fix was to share the udev database (/run/udev) into the container. So I was wondering if I had missed a change when migrating from 2020.02 to 2020.05; that's a check I still need to complete. But it is weird that the WaitDeviceBoardID succeeds for the simple deploy case, so it does not appear to be a simple containerisation issue.
Regards
Steve
On Tue, 30 Jun 2020 at 17:59, Stephen Lawrence stephen.lawrence@renesas.com wrote:
[snip]
I had seen this udev.py failure when first working with the new docker changes in 2020.02. Then the fix was to share the udev database (/run/udev) into the container. So I was wondering if I had missed a change when migrating from 2020.02 to 2020.05; that's a check I still need to complete. But it is weird that the WaitDeviceBoardID succeeds for the simple deploy case, so it does not appear to be a simple containerisation issue.
IIRC there was some udev change between 2020.02 and 2020.05. @Antonio, could you take a look at this thread?
milosz
Hi
-----Original Message----- From: Milosz Wasilewski milosz.wasilewski@linaro.org Sent: 30 June 2020 18:35 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: Re: [Lava-users] fastboot in docker support
[snip]
IIRC there was some udev change between 2020.02 and 2020.05. @Antonio, could you take a look at this thread?
OK thanks.
I have confirmed for the failing case that the board is up and that executing 'fastboot devices' from the host container results in the board being reported with the expected ID. So fastboot comms is working.
I'm using the Linaro 2020.05 docker image configured via lava-docker.
Regards
Steve
On Tue, Jun 30, 2020 at 04:59:44PM +0000, Stephen Lawrence wrote:
[snip]
I had seen this udev.py failure when first working with the new docker changes in 2020.02. Then the fix was to share the udev database (/run/udev) into the container. So I was wondering if I had missed a change when migrating from 2020.02 to 2020.05; that's a check I still need to complete. But it is weird that the WaitDeviceBoardID succeeds for the simple deploy case, so it does not appear to be a simple containerisation issue.
The current implementation shares the devices with the container via the docker run --device= option, and for that it needs the device to be available (hence the WaitDeviceBoardID). That works for simple test jobs but doesn't for more complex ones.
I'm finishing a set of patches right now that changes the implementation and solves this by not waiting forever, sharing any devices that are already available with the container upfront, and adding dynamic mappings so that devices are shared with the container on udev events when they appear.
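For context, the current behaviour corresponds roughly to the following (a sketch; the device path and image name are taken from examples elsewhere in this thread):

    # device node passed explicitly, so it must already exist when the container starts
    docker run --rm --device=/dev/bus/usb/001/074 miloszwasilewski/adb-fastboot fastboot devices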
Hi Antonio,
-----Original Message----- From: Lava-users lava-users-bounces@lists.lavasoftware.org On Behalf Of Antonio Terceiro Sent: 30 June 2020 18:55 To: lava-users@lists.lavasoftware.org Subject: Re: [Lava-users] fastboot in docker support
[snip]
OK. I look forward to the patches. The links I provided are to an OSS alliance LAVA instance, so you can get information on the setup from there, but if there is any information you need to check that your changes cover this simple case, please get in touch.
Whilst I wait on the patches I'll try disabling the wait check in the test action's Python code. I've not checked, but hopefully the code determining which USB device to pass into the docker run is independent of it. I know the container has what it needs to complete the fastboot communication, so if the correct device is passed it should work once the wait is skipped and the container started.
In parallel I might also try running the flash script from a test section although that approach requires I deal with getting the images into the container.
Regards
Steve
Hi,
-----Original Message----- From: Stephen Lawrence Sent: 01 July 2020 14:56 To: Antonio Terceiro antonio.terceiro@linaro.org Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: RE: [Lava-users] fastboot in docker support
[snip]
Whilst I wait on the patches I'll try disabling the wait check in the test action's Python code. I've not checked, but hopefully the code determining which USB device to pass into the docker run is independent of it. I know the container has what it needs to complete the fastboot communication, so if the correct device is passed it should work once the wait is skipped and the container started.
In case anyone was wondering, disabling the wait check just resulted in a different error soon after, before the container was started: https://lava.genivi.org/scheduler/job/992#L210 This appears to be in the get_udev_devices() function in udev.py, seemingly as it determines the USB device to share in the docker run.
In parallel I might also try running the flash script from a test section although that approach requires I deal with getting the images into the container.
So I will now try that whilst waiting on the patches.
Regards
Steve
On Wed, Jul 01, 2020 at 05:37:58PM +0000, Stephen Lawrence wrote:
[snip]
FWIW they are up in merge request 1238: https://git.lavasoftware.org/lava/lava/-/merge_requests/1238
-----Original Message----- From: Antonio Terceiro antonio.terceiro@linaro.org Sent: 06 July 2020 13:04 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: Re: [Lava-users] fastboot in docker support
[snip]
FWIW they are up in merge request 1238: https://git.lavasoftware.org/lava/lava/-/merge_requests/1238
Thanks for the heads up.
I fixed the docker support in the lava utils on Friday. I will send the PR today as an RFC. I was uncertain about the logging and error message handling, as I see some differences in approach across existing modules, so it could do with a review by the maintainers.
Regards
Steve
-----Original Message----- From: Stephen Lawrence Sent: 06 July 2020 14:27 To: Antonio Terceiro antonio.terceiro@linaro.org Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: RE: [Lava-users] fastboot in docker support
I fixed the docker support in the lava utils on Friday. I will send the PR today as an RFC. I was uncertain about the logging and error message handling, as I see some differences in approach across existing modules, so it could do with a review by the maintainers.
Sorry Antonio, I spoke too soon. My memory was that I had fixed it, but when I went back and looked I had only hacked it enough to prove it could do what I needed. I had some good help and wanted to give something back with a PR. However, the changes are somewhat wide-ranging, including the test schema, and I am not that familiar with the LAVA error and log mechanisms. It's probably best done by someone more familiar with the codebase. Should I go ahead and document my findings in a GitLab issue?
Regards
Steve
On Mon, Jul 06, 2020 at 05:53:06PM +0000, Stephen Lawrence wrote:
[snip]

Should I go ahead and document my findings in a GitLab issue?
Sure
Hi
-----Original Message----- From: Antonio Terceiro antonio.terceiro@linaro.org Sent: 06 July 2020 21:02 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: Re: [Lava-users] fastboot in docker support
[snip]
Sure
Done: https://git.lavasoftware.org/lava/lava/-/issues/427
Regards
Steve
Hi Antonio,
-----Original Message----- From: Antonio Terceiro antonio.terceiro@linaro.org Sent: 06 July 2020 13:04 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: Re: [Lava-users] fastboot in docker support
[snip]
Antonio wrote:
FWIW they are up in merge request 1238: https://git.lavasoftware.org/lava/lava/-/merge_requests/1238
I am running the changes by updating to the 2020.07 Linaro docker image.
I found that in the simple case my jobs will now execute the docker container, e.g: https://lava.genivi.org/scheduler/job/1047
but when I do any actual fastboot work from the container the host waits for the device and eventually times out, e.g: https://lava.genivi.org/scheduler/job/1048#L245
I notice that no device is passed in the docker run. Is that expected behaviour?
I need to investigate further Monday but I wanted to ask if your udev changes bring any new requirements to device configuration such as device or device-type templates?
Regards
Steve
On Fri, Jul 10, 2020 at 06:28:35PM +0000, Stephen Lawrence wrote:
[snip]
I notice that no device is passed in the docker run. Is that expected behaviour?
It is now. The device is not passed in the docker run command line, but shared with the container dynamically by lava-dispatcher-host, either right at the beginning or whenever the device appears to udev.
I need to investigate further Monday but I wanted to ask if your udev changes bring any new requirements to device configuration such as device or device-type templates?
You should not need any changes to the device config.
You said you are running the dispatcher in docker; how exactly are you doing that? For docker to work for this from a dispatcher running inside docker, you need:
- lava-dispatcher-host installed and configured on the host OK
- /var/lib/lava/dispatcher/tmp and /run/udev from the host OS bind mounted in the dispatcher container
- the docker socket from the host OS shared with the dispatcher containers, so that the containers started by the dispatcher are siblings of the dispatcher container and not its children.
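A minimal sketch of what that amounts to when starting the dispatcher container (paths are from the list above; the image name and tag are illustrative):

    docker run -d \
      -v /var/lib/lava/dispatcher/tmp:/var/lib/lava/dispatcher/tmp \
      -v /run/udev:/run/udev \
      -v /var/run/docker.sock:/var/run/docker.sock \
      lavasoftware/lava-dispatcher:2020.07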
Does that make sense? I have it on my TODO list to properly document this for the next release; hopefully I will be able to do it in a more comprehensible way.
Hi,
-----Original Message----- From: Antonio Terceiro antonio.terceiro@linaro.org Sent: 13 July 2020 13:32 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: Re: [Lava-users] fastboot in docker support
[snip]
I notice that no device is passed in the docker run. Is that expected behaviour?
It is now. The device is not passed in the docker run command line, but shared with the container dynamically by lava-dispatcher-host, either right at the beginning or whenever the device appears to udev.
OK. Thank you for the summary.
I need to investigate further Monday but I wanted to ask if your udev changes bring any new requirements to device configuration such as device or device-type templates?
You should not need any changes to the device config.
OK. I wanted to check that off as a possible cause.
You said you are running the dispatcher in docker; how exactly are you doing that?
We are running the Linaro docker image for 2020.07 via a fork of lava-docker commit b57379c6 [1].

[1] https://github.com/kernelci/lava-docker/commit/b57379c6b204870a568ccc7adcad1...
The fork carries instance-specific changes for things like the DUTs, the method of mounting artifacts, etc. I would point you at the source repo but it contains some admin details. If there is anything you can't get from the public running instance then please ask.
You can find the "android tools" docker image definition here [2].
[2] https://github.com/gunnarx/linux-cip-ci/tree/master/android-tools-docker
For docker to work for this from a dispatcher running inside docker, you need:
- lava-dispatcher-host installed and configured on the host OK
This I need to check as I am not familiar with the name. If it is a standard part of the dispatcher then it should be there but I will check.
- /var/lib/lava/dispatcher/tmp and /run/udev from the host OS bind mounted in the dispatcher container
Yes we have both.
- the docker socket from the host OS shared with the dispatcher containers, so that the containers started by the dispatcher are siblings of the dispatcher container and not its children.
This I think we saw from our probing and testing of the code, so it should be the case, but again no harm in confirming.
Does that make sense? I have it on my TODO list to properly document this for the next release, hopefully I will be able to do it in a more comprehensible way.
I think so. I think we are close; the 'simple deploy' fastboot comms works after all, and it is the differences in behaviour between that and the boot and test docker handling that are tripping us up. In the absence of a hard requirements list we've been comparing against the few public instances of AOSP Android 10 we've found for anything that might have been missed, so your reply helps. udev rules were another aspect I was wondering about.
Having written that the 'simple deploy' fastboot comms works I realise that was for 2020.05. I should also reconfirm that with 2020.07.
Regards
Steve
Hi,
-----Original Message----- From: Stephen Lawrence Sent: 13 July 2020 16:29 To: 'Antonio Terceiro' antonio.terceiro@linaro.org Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: RE: [Lava-users] fastboot in docker support
Hi,
-----Original Message----- From: Antonio Terceiro antonio.terceiro@linaro.org Sent: 13 July 2020 13:32 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: Re: [Lava-users] fastboot in docker support
[snip]
For docker to work for this from a dispatcher running inside docker, you need:
- lava-dispatcher-host installed and configured on the host OK
This I need to check as I am not familiar with the name. If it is a standard part of the dispatcher then it should be there but I will check.
Checking in the dispatcher docker I can confirm it is present. Both the lava_dispatcher_host and lava_dispatcher_host-2020.7.egg-info directories are present in the python3 dist-packages.
Harder to comment on configuration. Did you have a specific point of failure in mind?
- /var/lib/lava/dispatcher/tmp and /run/udev from the host OS bind mounted in the dispatcher container
Yes we have both.
- the docker socket from the host OS shared with the dispatcher containers, so that the containers started by the dispatcher are siblings of the dispatcher container and not its children.
This I think we saw from our probing and testing of the code, so it should be the case, but again no harm in confirming.
I was pretty certain but have confirmed that yes the socket is shared into the dispatcher container: https://github.com/kernelci/lava-docker/blob/b57379c6b204870a568ccc7adcad171...
I went back and retested 'deploy to fastboot' under 2020.07 in case the behaviour had changed from 2020.05. It still works, so the deploy-to-fastboot-specific docker run implementation, which still passes the USB device, works: https://lava.genivi.org/scheduler/job/1057
Therefore on a base level the dispatcher docker is allowing the host and device to find each other, and the failure appears to relate to the different discovery and device-passing code used by 'test with docker' / 'deploy to downloads'. Could the new code bring requirements on the android-tools docker image that were not needed before?
Regards
Steve
On Mon, Jul 13, 2020 at 05:50:46PM +0000, Stephen Lawrence wrote:
[snip]
Checking in the dispatcher docker I can confirm it is present. Both the lava_dispatcher_host and lava_dispatcher_host-2020.7.egg-info directories are present in the python3 dist-packages.
Harder to comment on configuration. Did you have a specific point of failure in mind?
There was a typo in my message: s/host OK/host OS/ i.e. lava-dispatcher-host needs to be installed and configured in the host *OS*. If you are using Debian then just having the package installed is enough, as it does the setup for you.
Otherwise you need to run `lava-dispatcher-host rules install` to have the udev rules installed.
[snip]
Therefore on a base level the dispatcher docker is allowing the host and device to find each other, and the failure appears to relate to the different discovery and device-passing code used by 'test with docker' / 'deploy to downloads'. Could the new code bring requirements on the android-tools docker image that were not needed before?
Yes, they work a bit differently.
For flashing, LAVA itself just waits for the device to be available via udev, and since you have /run/udev (and I assume also /dev) mounted in the container, it works because the device node is passed explicitly in the docker run call.
For the docker test action, the docker container running your tests is spawned first. We can't just pass the device node in the docker run call, because your tests can e.g. reboot the device, and we need it to become available again after rebooting, getting a new device number, etc.
Therefore, in this case the device is shared dynamically with the container by lava-dispatcher-host, which is triggered from the udev daemon. Thus you need lava-dispatcher-host and its udev rules in the host operating system.
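If it helps while debugging, the udev events that should trigger the sharing can be observed on the host OS (a sketch; the device path is taken from examples elsewhere in this thread):

    # watch add/remove events as the DUT reboots into fastboot
    udevadm monitor --udev --property

    # inspect the attributes (including the serial) of a specific device node
    udevadm info /dev/bus/usb/001/074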
Hi,
-----Original Message----- From: Antonio Terceiro antonio.terceiro@linaro.org Sent: 15 July 2020 14:46 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: Re: [Lava-users] fastboot in docker support
[snip]
Harder to comment on configuration. Did you have a specific point of failure in mind?
There was a typo in my message: s/host OK/host OS/ i.e. lava-dispatcher-host needs to be installed and configured in the host *OS*. If you are using Debian then just having the package installed is enough, as it does the setup for you.
Otherwise you need to run `lava-dispatcher-host rules install` to have the udev rules installed.
OK, the worker docker is running Debian (buster), so we are OK then.
[snip]
Ok, thanks for the summary. Once I have tested and provided feedback on your 'deploy to downloads' PR I'll have a look at trying to debug the new udev discovery failure.
Regards
Steve
Hi,
-----Original Message----- From: Stephen Lawrence Sent: 15 July 2020 18:48 To: Antonio Terceiro antonio.terceiro@linaro.org Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: RE: [Lava-users] fastboot in docker support
[snip]
OK, so with the local docker image support added to 'test with docker' and 'deploy to downloads' (thanks for that, Antonio) I went back to try running fastboot from both. I was running your PR 1268 (downloads: support local docker image for postprocessing).
1) 'test with docker': The udev discovery appears to be working now. It finds the device and attempts to pass it to the test docker. Testing did show an issue when the Worker is running in a docker container. I got a file error when the dispatcher could not access the sysfs devices.allow file for the test docker devices: https://lava.genivi.org/scheduler/job/1113#L193
That of course is just an issue of resource access from the Worker docker, but I mention it as an FYI and in case it helps anyone who finds this later. As a stop gap, until I can consider the most secure method, I am currently working around this by sharing /sys/fs/cgroup/ into the Worker container. That works, and I can now have 'test with docker' interacting with fastboot: https://lava.genivi.org/scheduler/job/1118#L232
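For anyone doing the same, the stop gap amounts to adding one more bind mount to the worker container's docker run (a sketch):

    -v /sys/fs/cgroup:/sys/fs/cgroup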
Thank you to everyone who helped in getting it going. Just need to get the images into the test docker now. Hopefully I can do that with 'deploy to downloads' or the ability to mount a volume if it comes.
2) 'deploy to downloads': Here the behaviour is different to 'test with docker'. There appears to be no passing of the device and running fastboot from the host results in an eventual timeout. For example: https://lava.genivi.org/scheduler/job/1111#L142
For my flashing use case it may be that udev discovery support in 'test with docker' is sufficient. I thought I would raise the difference in behaviour in case you wanted to make it consistent at some point. Do you want me to raise an issue with the details?
Best Wishes,
Steve
Hi Antonio,
We are really close to having this working now. We just seem to have a problem with udev on board restart. Details below.
-----Original Message----- From: Antonio Terceiro antonio.terceiro@linaro.org Sent: 15 July 2020 14:46 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: Re: [Lava-users] fastboot in docker support
[snip]
This job closely represents our use case: https://lava.genivi.org/scheduler/job/1149
The new udev code finds the device and shares it into the test docker: https://lava.genivi.org/scheduler/job/1149#L529
The test docker can successfully list the fastboot devices so we know the test docker can communicate with the DUT so the new mechanism is working up to this point: https://lava.genivi.org/scheduler/job/1149#L584
However when the flash script executes 'fastboot reboot-bootloader' the host sits there waiting for comms to be restored and ultimately times out: https://lava.genivi.org/scheduler/job/1149#L626
It looks as if the test docker is not seeing the device when it reappears.
I simplified the test case to a simple boot and 'test with docker' job that is just sending some fastboot commands to check comms. It reboots into the bootloader and then sends a set-active command so it will wait if the device is not available: https://lava.genivi.org/scheduler/job/1154/definition
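The inline steps amount to something like this (a sketch; the slot name is illustrative):

    run:
      steps:
      - fastboot devices
      - fastboot reboot-bootloader
      - fastboot set-active a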
Submitting the job the fastboot device is successfully found and passed into the test docker as dev /001/074/: https://lava.genivi.org/scheduler/job/1154#L190
We have DUT comms and can reboot the board: https://lava.genivi.org/scheduler/job/1154#L232
However when we then do the fastboot set-active it waits: https://lava.genivi.org/scheduler/job/1154#L238
On the worker I can see that lsusb reports a Google device at /001/075, so the DUT appears to have successfully rebooted into fastboot, albeit with a different device number. If I connect to the device I can break into u-boot.
So it appears LAVA has not re-hooked the device on reboot. In your description you mention dev numbers changing so I assume that is in the design.
Do you have any suggestions as to what might be wrong or how I could debug this further?
Worker is running LAVA at the point of your 1268 PR on Debian stretch. lava-dispatcher-host is installed.
Best Wishes,
Steve
-----Original Message----- From: Stephen Lawrence Sent: 05 August 2020 20:22 To: Antonio Terceiro antonio.terceiro@linaro.org Cc: lava-users@lists.lavasoftware.org Subject: RE: [Lava-users] fastboot in docker support
[snip]
Just to simplify further, and to eliminate the possibility that the recent PR didn't have all the code, I retested using the latest code: LAVA master branch HEAD~3 (sha a2bd1a87). I got the same results, with fastboot on the host not finding the device after reboot: https://lava.genivi.org/scheduler/job/1166
Best Wishes,
Steve
On Fri, Aug 07, 2020 at 05:27:54PM +0000, Stephen Lawrence wrote:
[snip]
Just to simplify further, and to eliminate the possibility that the recent PR didn't have all the code, I retested using the latest code: LAVA master branch HEAD~3 (sha a2bd1a87). I got the same results, with fastboot on the host not finding the device after reboot: https://lava.genivi.org/scheduler/job/1166
Hi,
I could reproduce your issue with a job like this (I reboot from adb first because $reasons):
run:
  steps:
    - adb devices
    - adb reboot-bootloader
    - fastboot devices
    - fastboot reboot-bootloader
    - fastboot devices
This seems to be a timing issue. When I add a sleep between `fastboot reboot-bootloader` and the following `fastboot devices`, it works:
2020-08-10T20:04:22 + fastboot reboot-bootloader
2020-08-10T20:04:22 < waiting for 69E78A7B00325ACE >
2020-08-10T20:04:27 Sharing /dev/bus/usb/001/083 with docker container lava-docker-test-shell-380-3.3
2020-08-10T20:04:27 rebooting into bootloader...
2020-08-10T20:04:27 OKAY [  0.001s]
2020-08-10T20:04:30 finished. total time: 3.011s
2020-08-10T20:04:30 + sleep 10s
2020-08-10T20:04:30 Sharing /dev/bus/usb/001/084 with docker container lava-docker-test-shell-380-3.3
2020-08-10T20:04:40 + fastboot devices
2020-08-10T20:04:40 69E78A7B00325ACE fastboot
Of course sleeping is a lazy hack and not a real solution. The point is: when LAVA itself triggers reboots/resets it usually also handles the waiting for you, but when you trigger them yourself, you need to handle the waiting yourself.
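For illustration, the same workaround expressed as inline test steps (a sketch only; the 10s value is arbitrary and board-dependent):

run:
  steps:
    - fastboot reboot-bootloader
    - sleep 10s
    - fastboot devices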
-----Original Message----- From: Antonio Terceiro antonio.terceiro@linaro.org Sent: 10 August 2020 21:13 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org Subject: fastboot reboot-bootloader from docker test shell (was "fastboot in docker support")
[snip]
Hi,
Thank you for taking the time to look into this. We really appreciate it.
Of course sleeping is a lazy hack and not a real solution. The point is: when LAVA itself triggers reboots/resets it usually also handles the waiting for you, but when you trigger them yourself, you need to handle the waiting yourself.
OK, not ideal, but I think the sleep after a reboot would be fine - especially as a first solution to get us moving on to some testing. I realise we are operating in somewhat of a corner case in using fastboot commands not supported in 'deploy'.
In my simplified case I have a 'fastboot --set-active' after the reboot that will cause the host to naturally wait, but I did wonder about the discovery code getting the chance to find and share the restarted device. So anyway I tried 10s and 20s sleeps after the reboot. Unfortunately there is still no device discovery after reboot: https://lava.genivi.org/scheduler/job/1176#L240 https://lava.genivi.org/scheduler/job/1177#L240
I think I need to deep dive the new discovery code to better understand what methods need to be supported. It feels like it might be something minor: a missing udev rule, or something extra required in the device dict (USB IDs, say) to trigger the discovery the second time. It's curious that the serial number is sufficient for discovery to succeed the first time, though.
The other thought is my recent mea culpa over the changes for tee support, where I had a worker running newer code than the master. Looking at your changes they appear to be confined to the worker only, but that is another possibility, if you know that in fact a change on the master is required as well.
Regards
Steve
-----Original Message----- From: Stephen Lawrence Sent: 11 August 2020 12:02 To: 'Antonio Terceiro' antonio.terceiro@linaro.org Cc: lava-users@lists.lavasoftware.org Subject: RE: fastboot reboot-bootloader from docker test shell (was "fastboot in docker support")
[snip]
OK, so yesterday I spent some hours going back through the lava-dispatcher-host code and debugging my instance. I could not see from browsing the source why the discovery code would not find the device on reboot - matching on the serial number is there, for example. So I turned to fundamentals and the udev rules it installs, specifically the 'add' udev event for a device.
Here I made some debug progress. Running 'udevadm monitor' on both the host and in the worker container, I can see kernel and udev events on the host, but no udev events in the worker container. That doesn't explain why the device is found the first time, but it can explain why it is not found after the reboot (working from Antonio's summary of how the new discovery code works).
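For anyone following along, the check was essentially the following (the container name 'lava-worker' is illustrative):

# on the host: prints both KERNEL and UDEV events when the DUT reboots
udevadm monitor --kernel --udev

# in the worker container: the UDEV side stays silent
docker exec -it lava-worker udevadm monitor --kernel --udev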
This surprises me, as it takes me back to around May when we were first working through problems of device discovery. My recollection from then is that I was seeing the udev events in the worker container once I shared the udev database (as /run/udev) into the container.
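That is, starting the worker container with something along these lines (the image name and tag are placeholders; the key part is the /run/udev mount):

docker run -d --name lava-worker \
  -v /run/udev:/run/udev:ro \
  lavasoftware/lava-dispatcher:2020.05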
With all my rebasing onto the latest code it's possible I have missed something. I have not found anything like that and I'll keep looking, but taking a step back I think there are larger, more fundamental questions about LAVA here. How do the developers see udev discovery working in a docker environment? Sharing 'just enough' into the worker container for udev events to work? What is 'just enough'? Perhaps you see a duplicate 'full' udev system running in the worker container instead? I would expect the sharing approach but I don't know for certain.
More immediately, how is the community overcoming this in their own LAVA instances? What are you sharing into your worker containers? Are you just running in privileged mode with everything shared? It seems unlikely we are the only ones doing this.
Of course it's all 'just' configurable code, but as Paul Sokolovsky pointed out in the issue he raised about the docker socket [1], there ought to be patterns we can collectively figure out - paths that are known to work. That should make people productive with LAVA more quickly, whilst reducing support on long threads like this.
[1] https://git.lavasoftware.org/lava/pkg/docker-compose/-/issues/7
Regards
Steve
On Tue, Aug 11, 2020 at 11:01:33AM +0000, Stephen Lawrence wrote:
[snip]
I think I need to deep dive the new discovery code to better understand what methods need to be supported. It feels like it might be something minor: a missing udev rule, or something extra required in the device dict (USB IDs, say) to trigger the discovery the second time. It's curious that the serial number is sufficient for discovery to succeed the first time, though.
It should be enough. See if there are any errors in the udev logs, e.g. if lava-dispatcher-host fails for some reason there will be a corresponding line there (but no output, unfortunately, as udev does not capture it - only the failure exit code).
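A couple of places worth checking, assuming a systemd-based host (unit names can vary by distro):

# udev executes RUN+= programs from systemd-udevd, so failures land in its journal
journalctl -u systemd-udevd -f

# temporarily raise udev logging while reproducing the reboot
udevadm control --log-priority=debug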
-----Original Message----- From: Antonio Terceiro antonio.terceiro@linaro.org Sent: 12 August 2020 14:59 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org Subject: Re: fastboot reboot-bootloader from docker test shell (was "fastboot in docker support")
[snip]
It should be enough. See if there are any errors in the udev logs, e.g. if lava-dispatcher-host fails for some reason there will be a corresponding line there (but no output, unfortunately, as udev does not capture it - only the failure exit code).
Thanks for the hints. Nothing in the host logs from lava-dispatcher-host via journalctl. In the worker container there is no systemd, and in /var/log in the lava-slave I don't see anything.
At this moment it feels more like a fundamental 'LAVA worker in docker' issue, given the lack of udev events being reported by udevadm monitor in the worker container: the udev discovery code in lava-dispatcher-host has no chance if there are no events. Unless something about the container environment means I cannot take the udevadm reporting at face value.
btw, just a couple of additional comments about my post earlier today. I'm really trying not to be that new guy on a list who just spams with their issues, 'fix this next one', one after another. I did not mean that the developers should be maintaining some nice document set for running LAVA in docker.
I just meant two main points. Firstly, how the devs thought udev should work in such a LAVA environment - if it's been considered. Secondly, that the wider community could collaborate on some patterns as to what needs to be shared into a container for certain LAVA services to work. Of course that can be codified where sensible in the docker base image or downstream projects like lava-docker. Web searches about enabling udev in docker containers return very little outside some hacks.
Meanwhile I will try running the container in more privileged modes to see if that sparks some life.
Regards
Steve
-----Original Message----- From: Stephen Lawrence Sent: 12 August 2020 18:03 To: Antonio Terceiro antonio.terceiro@linaro.org Cc: lava-users@lists.lavasoftware.org Subject: RE: fastboot reboot-bootloader from docker test shell (was "fastboot in docker support")
[snip]
It looks like I have finally tracked this down to its cause. Looking back at my notes from when I was debugging lava_dispatcher/utils/udev.py, it looked like the lack of udev events in the worker container might be a red herring. So I went back to the lava-dispatcher-host source and the udev rules it appears to rely on.
lava-dispatcher-host was installed, but in the worker container along with the rest of the worker. Since the worker container receives no udev events, the rule there never executes. At the same time the host does receive the udev events on DUT reboot, but there is nothing to communicate them to the worker container.
This then goes back to the first question I raised yesterday about how the lava devs see udev discovery commonly working when LAVA is containerised. For example, whether it happens on the host and is somehow communicated into the worker container, or whether 'full' udev is required in the worker container or whether in some cases a different setup is possible. I'll report on the last choice below.
Web discussion suggests there are differences of opinion between the systemd and docker devs which may mean that 'full' udev will not appear in docker anytime soon: https://stackoverflow.com/a/62227562 There seem to be some ways of getting partial support, but they appear to require hacks like running the container in privileged mode or bridging the container directly to the host network. That leaves sharing into the worker container in some way, or an alternative.
For my specific use case/issue (udev discovery in 'test with docker' and 'deploy to downloads' docker), the question appeared to be how/where to run lava-dispatcher-host and what communication requirements it has to the wider system. Having udev rules on the host that execute lava-dispatcher-host in the worker container might be best if it needs to communicate with the other LAVA processes. For 'test with docker' and 'deploy to downloads', however, the device is being shared into a sibling of the worker container, not a child, so execution in the worker container may present its own complications of sharing/communication with a sibling.
A high-level pass of the code suggested it might be self-contained, so as a skunkworks test I installed lava-dispatcher-host and its udev rules on the host. This works! At least for my use case.
Simple fastboot reboot: https://lava.genivi.org/scheduler/job/1221
More advanced flashing case with extended fastboot commands and reboots: https://lava.genivi.org/scheduler/job/1226
One side effect I noticed is that (not a big surprise) logging for the device rediscovery is lost.
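For completeness, the mechanism this all hangs off is plain udev on the host: a rule matching the USB 'add' event runs a program when the DUT (re)appears. A hand-written analogue would look something like the rule below - illustrative only, since the real rules are generated and installed by lava-dispatcher-host itself and its exact command line may differ; the vendor ID is Google's and the hook script is hypothetical:

# /etc/udev/rules.d/99-dut-share.rules (hypothetical analogue)
ACTION=="add", SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", RUN+="/usr/local/bin/share-dut-hook.sh"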
So it's great to finally have this working on some level, but there are some open questions about how best to do this longer term. I don't have a good grasp of the LAVA code base and its wider requirements. Running lava-dispatcher-host on the host works for my use case, but perhaps you can immediately think of cases where it wouldn't. What do the devs think?
Regards
Steve
p.s. I think I found an error in the user doc. I'll send a pull request for that for review.
On Aug 14, 2020, at 12:09 PM, Stephen Lawrence stephen.lawrence@renesas.com wrote:
[snip]
Steve,
I haven’t read through all this thread, but just wanted to point out:
https://git.lavasoftware.org/lava/docker-udev-tools/-/blob/master/udev-forwa...
This was something I implemented for forwarding udev events to a lava-dispatcher running in a container.
So might or might not be useful, but just wanted to point it out.
- k
-----Original Message----- From: Kumar Gala kumar.gala@linaro.org Sent: 17 August 2020 16:22 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: Antonio Terceiro antonio.terceiro@linaro.org; lava-users@lists.lavasoftware.org Subject: Re: [Lava-users] fastboot reboot-bootloader from docker test shell (was "fastboot in docker support")
[snip]
Hi Kumar,
No, I was not aware of it, so thank you for the heads-up. I am happy to look at the code, but could you please summarise what it is intended to address?
If there is no systemd/udev in a container, does it have a specific use case in mind, for example, or is it a general utility that LAVA is built to use when containerised?
To save you reading through my long threads: in my specific case I am using 'test docker' to run fastboot commands not supported by 'deploy'. For that to work I need the revised udev discovery implemented in lava-dispatcher-host to work in a containerised environment when fastboot reboots the DUT.
Regards
Steve
On Fri, Aug 14, 2020 at 05:09:37PM +0000, Stephen Lawrence wrote:
[snip]
lava-dispatcher-host was installed, but in the worker container along with the rest of the worker. Since the worker container receives no udev events, the rule there never executes. At the same time the host does receive the udev events on DUT reboot, but there is nothing to communicate them to the worker container.
lava-dispatcher-host needs to be installed and configured (i.e. udev rules in place) on the host system for it to work, exactly because udev events are not (and probably should not be) available inside the container.
This then goes back to the first question I raised yesterday about how the lava devs see udev discovery commonly working when LAVA is containerised. For example, whether it happens on the host and is somehow communicated into the worker container, or whether 'full' udev is required in the worker container or whether in some cases a different setup is possible. I'll report on the last choice below.
Running the dispatcher in a container and still being able to run jobs inside containers is still WIP from our PoV, I am only now starting to look at it.
-----Original Message----- From: Antonio Terceiro antonio.terceiro@linaro.org Sent: 19 August 2020 13:19 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org Subject: Re: fastboot reboot-bootloader from docker test shell (was "fastboot in docker support")
[snip]
lava-dispatcher-host needs to be installed and configured (i.e. udev rules in place) on the host system for it to work, exactly because udev events are not (and probably should not be) available inside the container.
Hi Antonio, thanks for the input. I'll continue to use lava-dispatcher-host on the host then.
[snip]
Running the dispatcher in a container and still being able to run jobs inside containers is still WIP from our PoV, I am only now starting to look at it.
I added one recent gotcha when running in a container as a comment to Paul's docker issue on gitlab, but it was not the perfect home for it. Paul's issue is more specific I think.
Anyway, as you get further into it, if it's a help to hear what problems people hit doing it then feel free to ask, or point at where to add them.
Regards
Steve
Hi Milosz,
-----Original Message----- From: Milosz Wasilewski milosz.wasilewski@linaro.org Sent: 25 June 2020 10:49 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: Re: [Lava-users] fastboot in docker support
Stephen,
I think what you want already works (sort of)
Thank you for your very quick reply, and in particular for taking the time to actually try out the solution, and your follow-ups :) The sequencing of the sections and the u-boot control example were educational.
I'm now taking some vacation till the end of the week, but based on an initial look I just wanted to ask a couple of questions to check I have it right for when I try this out Monday.
Looking at your job you would get:
1. deploy: flash boot loader
2. boot: board into fastboot in u-boot
3. interactive test: to perform bespoke fastboot cmds
4. deploy: flash rest of android images (as normal)
5. test: as normal
So as long as the fastboot cmds in the interactive test section in step 3 do not require specific sequencing with the images being flashed in step 4, e.g. that you must send 'oem format' after boot.img but before vbmeta.img, it should work. Do I have that right?
I think with your example I could also sequence a job definition that called the script to flash the board from a test section. The only issue there is getting the images into the container in an automated way.
Thanks again. I look forward to trying it out.
Cheers
Steve
On Thu, 25 Jun 2020 at 14:49, Stephen Lawrence stephen.lawrence@renesas.com wrote:
[snip]
Looking at your job you would get:
- deploy: flash boot loader
- boot: board into fastboot in u-boot
In a real-world scenario I should have done (in the u-boot shell):

env default -f -a
env set partitions $partitions_android
fastboot 1
But I used the same version of bootloader that was already running on the board so these steps were not needed.
- interactive test: to perform bespoke fastboot cmds
You don't need an interactive test. I thought it was needed to start fastboot inside u-boot, but it wasn't. I used an inline test to invoke the fastboot oem format and erase commands (roughly as sketched below). IMHO this does the trick if you need to perform multi-step flashing (like in this case). In my case 'oem format' might change the partition layout, so I should flash the bootloader and xloader partitions again. But as I explained above I wasn't actually changing the bootloader/xloader.
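Roughly like this (a sketch only, untested as written here; the docker image name is a placeholder and the exact oem/erase commands are board-specific):

- test:
    docker:
      image: my-android-tools    # placeholder image with fastboot installed
    definitions:
      - from: inline
        name: fastboot-oem-cmds
        path: inline/fastboot-oem-cmds.yaml
        repository:
          metadata:
            format: Lava-Test Test Definition 1.0
            name: fastboot-oem-cmds
            description: bespoke fastboot commands between deploys
          run:
            steps:
              - fastboot oem format
              - fastboot erase userdata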
- deploy: flash rest of android images (as normal)
yes, remember to flash bootloader/xloader if you change the version.
- test: as normal
So as long as the fastboot cmds in the interactive test section in step 3 do not require specific sequencing with the images being flashed in step 4, e.g. that you must send 'oem format' after boot.img but before vbmeta.img, it should work. Do I have that right?
Yes, as mentioned above, 'oem format' alters the partition table (in my case).
If you have more commands that do something on the board you just repeat the deploy/boot/test sequence several times. It might be handy to use some templating engine to prepare your lava jobs.
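For example, with jinja2 on the command line (j2cli is just one option; the file names are placeholders):

# job.yaml.j2 contains e.g. "url: {{ image_url }}"
pip install j2cli
j2 job.yaml.j2 params.yaml > job.yaml
lavacli jobs submit job.yaml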
I think with your example I could also sequence a job definition that called the script to flash the board from a test section. The only issue there is getting the images into the container in an automated way.
Hmm, I think this can be done with 'postprocessing' in the deploy section but I didn't try. Antonio will know better. If it works you might not need test section 3. Your second deploy step would do the oem format and reboot. I didn't try it but in theory it should work (famous last words).
milosz
Hi,
-----Original Message----- From: Milosz Wasilewski milosz.wasilewski@linaro.org Sent: 25 June 2020 15:01 To: Stephen Lawrence stephen.lawrence@renesas.com Cc: lava-users@lists.lavasoftware.org; gandersson@genivi.org Subject: Re: [Lava-users] fastboot in docker support
[snip]
If you have more commands that do something on the board you just repeat the deploy/boot/test sequence several times. It might be handy to use some templating engine to prepare your lava jobs.
Yes, I think this is the educational 'aha' for me. I had seen jobs with multiple test sections, but none I think (or I just didn't notice) that had repeats of the same types of deploy or boot sections (by type I just mean, say, 'deploy to fastboot'). So I was thinking that LAVA just executed single deploy->boot blocks and thus everything had to be overloaded into that single operation.
Now I can see how it can be done without engineering changes in LAVA itself. I agree templating it would make sense once it's working.
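In other words, a job shaped roughly like the outline below (schematic only; every image name and URL is a placeholder, and the test sections would carry inline or repository definitions as discussed):

actions:
  - deploy:                 # 1. flash the boot loader
      to: fastboot
      docker: { image: my-android-tools }
      images:
        bootloader: { url: http://example.com/u-boot.img }
  - boot:                   # 2. board into fastboot
      method: fastboot
  - test:                   # 3. bespoke fastboot cmds (oem format, erase, ...)
      docker: { image: my-android-tools }
  - deploy:                 # 4. flash the rest of the android images
      to: fastboot
      docker: { image: my-android-tools }
      images:
        boot: { url: http://example.com/boot.img }
        vbmeta: { url: http://example.com/vbmeta.img }
  - boot:                   # 5. boot android
      method: fastboot
  - test:                   # 6. test as normal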
I think with your example I could also sequence a job definition that called the script to flash the board from a test section. The only issue there is getting the images into the container in an automated way.
Hmm, I think this can be done with 'postprocessing' in deploy section but I didn't try. Antonio will know better. If it works you might not need the test section 3. Your second deploy step would do the oem format and reboot. I didn't try it but in theory it should work (famous last words).
We did wonder about trying it from the 'postprocess' of a 'deploy to downloads' and made an initial experiment. The first blocker was getting the images for the script to process into the container, and the other was the LAVA python code not checking for a local docker image before trying to pull it. Oh, and the question of u-boot control, but you've answered that. I just mention this in the spirit of community information sharing. I think I will try the fastboot sequencing first on Monday.
Right now I am really off. Blazing hot here and I am lucky enough to have a tree to sit under and a beer and book with my name on it!
Steve