Hi folks,
We at Fairphone have developed a variant of the Tradefed runner in the
LAVA test-definitions repository that is meant to run complete Tradefed
test suites on multiple devices by making use of Tradefed's shards
feature. The runner is currently in “staging” state, but we want to
share what we are using and developing now, to see whether others are
interested in it. Feedback on the general approach taken would also be
much appreciated.
At a high level, our setup works as follows:
• Use MultiNode to allocate multiple devices for one test submission.
• One “master” runs the Tradefed shell, as in the existing runner.
• The master connects to the workers’ DUTs via adb TCP/IP. These
DUTs are then transparently available to Tradefed, in the same way as
USB-attached devices (see the sketch after this list).
• Workers ensure that their respective DUTs remain accessible to
the master, especially in case of WLAN disconnects, reboots, crashes, etc.
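As a minimal sketch of that attachment step (the helper name is purely
illustrative, and we assume the worker has already run "adb tcpip 5555"
on its DUT; only "adb connect" and "adb devices" are real adb commands):

import subprocess

def attach_remote_dut(ip, port=5555):
    # Attach a worker's DUT over TCP/IP; afterwards Tradefed sees it
    # exactly like a USB-attached device with this serial.
    serial = "{}:{}".format(ip, port)
    subprocess.check_call(["adb", "connect", serial])
    devices = subprocess.check_output(["adb", "devices"]).decode()
    if serial not in devices:
        raise RuntimeError("DUT {} did not attach".format(serial))
    return serial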
Major features of our runner:
• Support for Android CTS, GTS and STS.
• Test run split into “shards” in Tradefed to run tests in parallel
on multiple devices. This allows for a major speedup when running large
test suites.
• Tradefed retry: Rerun test suites until the failure count
stabilizes (sketched after this list).
• No adb root required.
• Based on the original Tradefed runner, with at least parts of
the common code moved into Python libraries.
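Roughly, the retry policy works like this (a sketch only; run_retry
stands in for invoking Tradefed's retry mechanism and returning the
remaining failure count):

def retry_until_stable(run_retry, max_rounds=10):
    # Rerun until the failure count stops decreasing, with a safety
    # bound on the number of rounds.
    previous = None
    for _ in range(max_rounds):
        failures = run_retry()
        if previous is not None and failures >= previous:
            break  # failure count has stabilized
        previous = failures
    return previous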
Current limitations:
• Test executions are not always stable. This needs further
investigation.
• Test executions produce more false positives than local test
runs. This needs further investigation but is at least partially due to
using adb TCP/IP instead of a local USB connection.
• Android VTS is not implemented (it would require only minor changes).
Our current changes have been pushed to the tradefed_shards_with_retry
topic on Gerrit[1]. Besides the two major changes to add MultiNode adb
support and then Tradefed support on top of that, a couple of smaller
changes that could be useful on their own have also been pushed.
We are looking forward to your feedback and to joint efforts in
automating and speeding up Tradefed test executions!
Best regards,
Karsten for the Fairphone Software Team
[1]
https://review.linaro.org/q/topic:%22tradefed_shards_with_retry%22+(status:…
On Mon, 10 Dec 2018 at 20:16, Neil Williams <neil.williams at linaro.org> wrote:
> Yes, there is a problem there - thanks for catching it. I think the
> bulk of the page dates from the last stages of the migration when V1
> data was still around. I'll look at an update of the page tomorrow.
> Step 7 is a sanity check that the install of the empty instance has
> gone well, Step 9 is to ensure that the newly restored database is put
> into maintenance as soon as possible to prevent any queued test jobs
> from attempting to start. The critical element of Step 9 is to ensure
> that the lava-master service is stopped.
>
> The emphasis of the section is on ensuring that the instance only
> serves a "Maintenance" page, e.g. the default Debian "It works!"
> apache page, to prevent access to the instance during the restore.
Thanks for pointing that out, Neil. I take the point that the Apache
server has to serve a static site during the restore process.
> Accessing the UI would involve having an alternative way to serve the
> pages. If that can be arranged, just for admins, (e.g. by changing the
> external routing to the box or redirecting DNS temporarily) then the
> UI on the instance can be used with the change that the
> lava-server-gunicorn service does not need to be stopped (because
> access has been redirected). Other services would be stopped. However,
> this would involve a fair number of apache config changes, so is best
> left to those admins who have such config already on hand.
>
> The operations can be done from the command line and that's probably
> best for these docs.
>
> Step 7 can be replaced by:
>
> lava-server manage check --deploy
>
> Step 9 can be replaced by looping over:
>
> lava-server manage devices update --health MAINTENANCE --hostname ${HOSTNAME}
>
> or, if there are a lot of devices:
>
> lava-server manage maintenance --force
>
> (This maintenance helper has been fixed in master - soon to be 2018.12
> - so older versions would use the first command & loop.)
Thanks, the CLI operations are very helpful for automating the process.
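For example, the per-device command can be looped over all hostnames
(a sketch; the hostname list is ours to fill in):

import subprocess

HOSTNAMES = ["device-01", "device-02"]  # fill in the lab's devices

for hostname in HOSTNAMES:
    subprocess.check_call([
        "lava-server", "manage", "devices", "update",
        "--health", "MAINTENANCE", "--hostname", hostname,
    ])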
However, the docs say that all devices in "Reserved" state have to have
their "current job" cleared. I can use "lava-server manage devices details"
to check whether this field is actually set, but there is no command to
modify it. It seems like using the Python API is the only way to go
here, right? The same applies to setting "Running" jobs to "Cancelled".
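Something like this in the Django shell ("lava-server manage shell"),
presumably? The model and field names below are only my guesses from
the admin UI wording and would need checking against the installed
LAVA version:

from lava_scheduler_app.models import Device, TestJob

# Clear "current job" on every Reserved device (field name assumed).
for device in Device.objects.filter(state=Device.STATE_RESERVED):
    device.current_job = None
    device.save()

# Move Running jobs to Finished/Cancelled (constants assumed).
TestJob.objects.filter(state=TestJob.STATE_RUNNING).update(
    state=TestJob.STATE_FINISHED,
    health=TestJob.HEALTH_CANCELED,
)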
> I'll look at changing the page to use CLI operations for steps 7 and
> 9. Some labs can do the http redirect / routing method but the detail
> of that is probably not in scope for this page in the LAVA docs. I'll
> add a note that admins have that choice but leave it for those admins
> to implement.
Mit freundlichen Grüßen / Best regards
Tim Jaacks
DEVELOPMENT ENGINEER
Garz & Fricke GmbH
Tempowerkring 2
21079 Hamburg
Direct: +49 40 791 899 - 55
Fax: +49 40 791899 - 39
tim.jaacks(a)garz-fricke.com
www.garz-fricke.com
WE MAKE IT YOURS!
Sitz der Gesellschaft: D-21079 Hamburg
Registergericht: Amtsgericht Hamburg, HRB 60514
Geschäftsführer: Matthias Fricke, Manfred Garz, Marc-Michael Braun
Hello everyone,
I am trying to implement a backup and restore routine for our LAVA server, based on the documentation:
https://validation.linaro.org/static/docs/v2/admin-backups.html#restoring-a…
The creation of the backup is straightforward. I have problems with the
order of the proposed restore steps, though.
Step 6 is "Stop all LAVA services". However, step 7 afterwards says
"Make sure that this instance actually works by browsing a few (empty)
instance pages." This should obviously be done before, right?
The actual problem is step 9, which says "In the Django administration
interface, take all devices which are not Retired into Offline". This
cannot be a mere ordering issue, because the LAVA services must not be
available while these modifications are made. How do I use the Django
admin interface while all LAVA services are stopped?
Mit freundlichen Grüßen / Best regards
Tim Jaacks
Dear all,
I have a question about using LAVA.
Background:
1. I have only one hardware device, running Android.
2. I have a device-type jinja2 file starting with "{% extends 'base-fastboot.jinja2' %}".
Here, I use "adb reboot bootloader" to enter fastboot.
3. I have another device-type jinja2 file starting with "{% extends 'base-uboot.jinja2' %}".
Here, I use "fastboot 0" in U-Boot to enter fastboot.
Now we have a scenario that needs to be tested with both of the above
methods, but we have only one device. Is it possible for the user to
define a parameter in job.yaml to switch between the two methods on a
single device? Any suggestions?
Thanks,
Larry
Hello
I got the following crash with 2018.11 on Debian stretch:
dpkg -l | grep lava
ii  lava              2018.11-1~bpo9+1  all    Linaro Automated Validation Architecture metapackage
ii  lava-common       2018.11-1~bpo9+1  all    Linaro Automated Validation Architecture common
ii  lava-coordinator  0.1.7-1           all    LAVA Coordinator daemon
ii  lava-dev          2018.11-1~bpo9+1  all    Linaro Automated Validation Architecture developer support
ii  lava-dispatcher   2018.11-1~bpo9+1  amd64  Linaro Automated Validation Architecture dispatcher
ii  lava-server       2018.11-1~bpo9+1  all    Linaro Automated Validation Architecture server
ii  lava-server-doc   2018.11-1~bpo9+1  all    Linaro Automated Validation Architecture documentation
ii  lavacli           0.9.3-1~bpo9+1    all    LAVA XML-RPC command line interface
ii  lavapdu-client    0.0.5-1           all    LAVA PDU client
ii  lavapdu-daemon    0.0.5-1           all    LAVA PDU control daemon
2018-12-04 14:14:40,187 ERROR [EXIT] Unknown exception raised, leaving!
2018-12-04 14:14:40,187 ERROR string index out of range
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/lava_server/management/commands/lava-logs.py", line 193, in handle
    self.main_loop()
  File "/usr/lib/python3/dist-packages/lava_server/management/commands/lava-logs.py", line 253, in main_loop
    while self.wait_for_messages(False):
  File "/usr/lib/python3/dist-packages/lava_server/management/commands/lava-logs.py", line 287, in wait_for_messages
    self.logging_socket()
  File "/usr/lib/python3/dist-packages/lava_server/management/commands/lava-logs.py", line 433, in logging_socket
    job.save()
  File "/usr/lib/python3/dist-packages/django_restricted_resource/models.py", line 71, in save
    return super(RestrictedResource, self).save(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/django/db/models/base.py", line 796, in save
    force_update=force_update, update_fields=update_fields)
  File "/usr/lib/python3/dist-packages/django/db/models/base.py", line 820, in save_base
    update_fields=update_fields)
  File "/usr/lib/python3/dist-packages/django/dispatch/dispatcher.py", line 191, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/usr/lib/python3/dist-packages/lava_scheduler_app/signals.py", line 139, in testjob_notifications
    send_notifications(job)
  File "/usr/lib/python3/dist-packages/lava_scheduler_app/notifications.py", line 305, in send_notifications
    title, body, settings.SERVER_EMAIL, [recipient.email_address]
  File "/usr/lib/python3/dist-packages/django/core/mail/__init__.py", line 62, in send_mail
    return mail.send()
  File "/usr/lib/python3/dist-packages/django/core/mail/message.py", line 342, in send
    return self.get_connection(fail_silently).send_messages([self])
  File "/usr/lib/python3/dist-packages/django/core/mail/backends/smtp.py", line 107, in send_messages
    sent = self._send(message)
  File "/usr/lib/python3/dist-packages/django/core/mail/backends/smtp.py", line 120, in _send
    recipients = [sanitize_address(addr, encoding) for addr in email_message.recipients()]
  File "/usr/lib/python3/dist-packages/django/core/mail/backends/smtp.py", line 120, in <listcomp>
    recipients = [sanitize_address(addr, encoding) for addr in email_message.recipients()]
  File "/usr/lib/python3/dist-packages/django/core/mail/message.py", line 161, in sanitize_address
    address = Address(nm, addr_spec=addr)
  File "/usr/lib/python3.5/email/headerregistry.py", line 42, in __init__
    a_s, rest = parser.get_addr_spec(addr_spec)
  File "/usr/lib/python3.5/email/_header_value_parser.py", line 1988, in get_addr_spec
    token, value = get_local_part(value)
  File "/usr/lib/python3.5/email/_header_value_parser.py", line 1800, in get_local_part
    if value[0] in CFWS_LEADER:
IndexError: string index out of range
2018-12-04 14:14:40,211 INFO [EXIT] Disconnect logging socket and process messages
2018-12-04 14:14:40,211 DEBUG [EXIT] unbinding from 'tcp://0.0.0.0:5555'
2018-12-04 14:14:50,221 INFO [EXIT] Closing the logging socket: the queue is empty
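For what it's worth, the bottom of the traceback can be reproduced with
the stdlib alone, assuming a notification recipient somehow ended up
with an empty email address:

from email.headerregistry import Address

# On Python 3.5, an empty addr_spec makes get_local_part() index into
# an empty string: "IndexError: string index out of range".
Address("someone", addr_spec="")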
Regards
Hi,
I'm trying to experiment with the 'interactive' test shell. The docs
are here:
https://master.lavasoftware.org/static/docs/v2/actions-test.html#index-1
As I understand it, the feature should work in the following way (the
docs aren't very clear):
1. wait for one of the prompts
2. send the command
3. match the result against a regex (pass or fail)
However, when I try it out, there is no wait for the prompt. LAVA
immediately sends the command and adds the value from prompts to the
list of expressions to match. Is this correct? Example job:
https://staging.validation.linaro.org/scheduler/job/245886#L53
LAVA sends the command even before the board starts booting.
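To illustrate, this is the behaviour I expected, written as a
pexpect-style sketch (illustrative only, not LAVA internals; the
connection command, prompt and patterns are made-up examples):

import pexpect

child = pexpect.spawn("telnet localhost 7000")
child.expect("=> ")          # 1. wait for one of the prompts
child.sendline("dhcp")       # 2. only then send the command
child.expect([
    "DHCP client bound",     # 3. match the result: pass...
    "Retry time exceeded",   #    ...or fail
])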
Any help is appreciated.
milosz
Hi everyone,
I’m facing an issue where U-Boot commands are sent one after another
without waiting for a prompt. Obviously, the device is not able to boot.
Excerpt from logs:
dhcp
=> dhcp
bootloader-commands: Wait for prompt ['=>', 'Resetting CPU', 'Must RESET board to recover', 'TIMEOUT', 'Retry count exceeded', 'ERROR: The remote end did not respond in time.'] (timeout 00:04:56)
setenv serverip 172.17.1.189
setenv serverip 172.17.1.189
bootloader-commands: Wait for prompt ['=>', 'Resetting CPU', 'Must RESET board to recover', 'TIMEOUT', 'Retry count exceeded', 'ERROR: The remote end did not respond in time.'] (timeout 00:04:56)
tftp 0x80000000 60/tftp-deploy-46_3_i9j/kernel/zImage-am335x-vcu.bin
tftp 0x80000000 60/tftp-deploy-46_3_i9j/kernel/zImage-am335x-vcu.bin
bootloader-commands: Wait for prompt ['=>', 'Resetting CPU', 'Must RESET board to recover', 'TIMEOUT', 'Retry count exceeded', 'ERROR: The remote end did not respond in time.'] (timeout 00:04:56)
tftp 0x83000000 60/tftp-deploy-46_3_i9j/ramdisk/ramdisk.cpio.gz.uboot
tftp 0x83000000 60/tftp-deploy-46_3_i9j/ramdisk/ramdisk.cpio.gz.uboot
bootloader-commands: Wait for prompt ['=>', 'Resetting CPU', 'Must RESET board to recover', 'TIMEOUT', 'Retry count exceeded', 'ERROR: The remote end did not respond in time.'] (timeout 00:04:55)
setenv initrd_size ${filesize}
setenv initrd_size ${filesize}
bootloader-commands: Wait for prompt ['=>', 'Resetting CPU', 'Must RESET board to recover', 'TIMEOUT', 'Retry count exceeded', 'ERROR: The remote end did not respond in time.'] (timeout 00:04:55)
tftp 0x82000000 60/tftp-deploy-46_3_i9j/dtb/am335x-vcu-prod1.dtb
tftp 0x82000000 60/tftp-deploy-46_3_i9j/dtb/am335x-vcu-prod1.dtb
bootloader-commands: Wait for prompt ['=>', 'Resetting CPU', 'Must RESET board to recover', 'TIMEOUT', 'Retry count exceeded', 'ERROR: The remote end did not respond in time.'] (timeout 00:04:55)
dhcp
dhcp
link up on port 0, speed 10, half duplex
BOOTP broadcast 1
setenv serverip 172.17.1.189
tftp 0x80000000 60/tftp-deploy-46_3_i9j/kernel/zImage-am335x-vcu.bin
BOOTP broadcast 2
tftp 0x83000000 60/tftp-deploy-46_3_i9j/ramdisk/ramdisk.cpio.gz.uboot
setenv initrd_size ${filesize}
tftp 0x82000000 60/tftp-deploy-46_3_i9j/dtb/am335x-vcu-prod1.dtb
BOOTP broadcast 3
BOOTP broadcast 4
BOOTP broadcast 5
Retry time exceeded
setenv bootargs 'console=ttyS2,115200n8 root=/dev/ram0 ti_cpsw.rx_packet_max=1526 ip=dhcp'
=> setenv bootargs 'console=ttyS2,115200n8 root=/dev/ram0 ti_cpsw.rx_packet_max=1526 ip=dhcp'
bootloader-commands: Wait for prompt ['=>', 'Resetting CPU', 'Must RESET board to recover', 'TIMEOUT', 'Retry count exceeded', 'ERROR: The remote end did not respond in time.'] (timeout 00:04:50)
setenv bootargs 'console=ttyS2,115200n8 root=/dev/ram0 ti_cpsw.rx_packet_max=1526 ip=dhcp'
setenv bootargs 'console=ttyS2,115200n8 root=/dev/ram0 ti_cpsw.rx_packet_max=1526 ip=dhcp'
=>
bootz 0x80000000 0x83000000 0x82000000
=> bootz 0x80000000 0x83000000 0x82000000
bootloader-commands: Wait for prompt Starting kernel (timeout 00:04:50)
bootz 0x80000000 0x83000000 0x82000000
bootz 0x80000000 0x83000000 0x82000000
Bad Linux ARM zImage magic!
=>
Bad Linux ARM zImage magic!
As you can see, all the commands sent between the `dhcp` command and
the point where it completed or failed were simply dropped.
Device type config:
{% extends 'base-uboot.jinja2' %}
{% set device_type = "vcu" %}
{% set console_device = 'ttyS2' %}
{% set baud_rate = 115200 %}
{% set interrupt_prompt = 'Press s to abort autoboot' %}
{% set interrupt_char = 's' %}
{% set bootloader_prompt = '=>' %}
{% set uboot_mkimage_arch = 'arm' %}
{% set bootz_kernel_addr = '0x80000000' %}
{% set bootz_ramdisk_addr = '0x83000000' %}
{% set bootz_dtb_addr = '0x82000000' %}
{% set extra_kernel_args = 'ti_cpsw.rx_packet_max=1526' %}
{% set kernel_start_message = 'Welcome to' %}
Device config:
{% extends 'vcu.jinja2' %}
{% set connection_command = 'telnet lava-disp-1.local 7000' %}
{% set power_on_command = 'relay-ctrl --relay 1 --state on' %}
{% set power_off_command = 'relay-ctrl --relay 1 --state off' %}
{% set hard_reset_command = 'relay-ctrl --relay 1 --toggle' %}
Boot action block from job definition:
- boot:
    timeout:
      minutes: 5
    method: u-boot
    commands: ramdisk
    auto_login:
      login_prompt: 'am335x-nmhw21 login: '
      username: root
    prompts:
    - 'fct@am335x-nmhw21:~# '
Have I misconfigured something? What am I missing? Thanks!
Best regards,
Andrejs Cainikovs.
Hello,
I have a LAVA job with a long-running test, so I put a timeout of 180
minutes on the test action itself[1].
However, the job times out after 3989 seconds (~66 min)[2].
Looking closer at the "Timing" section of the job, I see that
lava-test-shell indeed has a timeout of ~3989 seconds, but I have no
idea where that number comes from. That's neither the 10 minutes in the
default "timeouts" section, nor the 180 minutes I put in the "test"
action.
Hmm, after almost pushing send on this, I now see that in the
"timeouts" section, the whole job has a timeout of 70 minutes. So I
assume that acts as an absolute maximum, even if one of the actions
sets a higher timeout?
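That would also explain the odd 3989: if LAVA caps each action at
whatever is left of the overall job timeout, the numbers line up (my
guess at the mechanism; the 211 s is roughly what the earlier actions
consumed):

job_timeout = 70 * 60        # 4200 s from the job-level "timeouts" block
elapsed = 211                # approx. time used by deploy/boot actions
action_timeout = 180 * 60    # 10800 s requested for the test action

# Effective cap on lava-test-shell under this assumption:
print(min(action_timeout, job_timeout - elapsed))   # -> 3989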
So I guess this email now turns into a feature request rather than a bug
report.
Maybe LAVA should show a warning at the top of the job if any of the
actions has a timeout that's longer than the job timeout.
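Conceptually something like this (a sketch of the proposed check,
nothing more):

def capped_actions(job_timeout, action_timeouts):
    # Flag actions whose timeout can never be honoured because the
    # overall job timeout is shorter.
    return [name for name, t in action_timeouts.items() if t > job_timeout]

print(capped_actions(70 * 60, {"lava-test-shell": 180 * 60}))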
Kevin
[1] http://lava.baylibre.com:10080/scheduler/job/60374/definition#defline89
[2] http://lava.baylibre.com:10080/scheduler/job/60374#results_694663