Hello,
We've been using uboot-ums for WaRP7, but we've been having intermittent failures when it tries to run dd to flash the image. While we still need to look into the root cause of this issue, we'd like to make the flashing phase a little more reliable.
I have a few questions, coming from different angles:
* LAVA uses the dd command to flash the image. Is there a way to specify the use of bmap-tools?
* Let's say dd times out (this is what usually happens). Is there a mechanism to restart the actions (deploy and boot) in case of a timeout?
If you have any other suggestion, let me know!
Cheers
--
Diego Russo | Staff Software Engineer | Mbed Linux OS
ARM Ltd. CPC1, Capital Park, Cambridge Road, Fulbourn, CB21 5XE, United Kingdom
http://www.diegor.co.uk - https://os.mbed.com/linux-os/
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Fri, 8 Feb 2019 at 14:48, Diego Russo <Diego.Russo@arm.com> wrote:
Hello,
We've been using uboot-ums for WaRP7, but we've been having intermittent failures when it tries to run dd to flash the image. While we still need to look into the root cause of this issue, we'd like to make the flashing phase a little more reliable.
The fix for that would not seem to be within LAVA but within the firmware or the device.
I have a few questions, coming from different angles:
- LAVA uses the dd command to flash the image. Is there a way to specify the use of bmap-tools?
Not currently. It is set in https://git.lavasoftware.org/lava/lava/blob/master/lava_dispatcher/utils/sto...
That could be looked at but bmap-tools isn't necessarily going to fix the problem with the hardware - it's trying to do something more complex and pushing things faster. What is the equivalent syntax?
- Let's say dd times out (this is what usually happens). Is there a mechanism to restart the actions (deploy and boot) in case of a timeout?
Not yet - there is related support in development to ensure that a timeout kills the process properly: https://git.lavasoftware.org/lava/lava/merge_requests/355
However, a timeout on a deployment is still a fatal event. This isn't a third-party problem, as with downloads or creating an LXC; this is the device under test not being sufficiently reliable, which is a test job failure due to an infrastructure error. It's equivalent to TFTP timing out or fastboot failing to flash an image. LAVA is correctly halting at this point: the original test job will need to be resubmitted to attempt the deploy again before going on to a boot action. The solution for intermittent errors is not to ignore the error.
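As an illustration of the general idea of a timeout that kills the process properly and is then reported as a fatal error, here is a minimal Python sketch. It is not the code from the merge request above and not LAVA's API: `run_with_timeout` and `DeployTimeout` are hypothetical names, and the sketch assumes a POSIX host.

```python
import os
import signal
import subprocess


class DeployTimeout(RuntimeError):
    """Hypothetical error type: the flashing command exceeded its timeout."""


def run_with_timeout(argv, timeout):
    """Run argv and return its exit code; on timeout, kill it and raise."""
    # start_new_session=True (POSIX only) makes the child the leader of its
    # own process group, so helpers it spawned can be killed along with it.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child's process group id equals its pid because it is the
        # session leader.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the child so no zombie is left behind
        raise DeployTimeout(f"{argv[0]} exceeded {timeout}s") from None
```

A caller would treat `DeployTimeout` as fatal rather than swallowing it. Note that a process blocked in uninterruptible I/O on a misbehaving device may not exit until that I/O completes, even after SIGKILL, so this does not replace fixing the device.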
_______________________________________________
Lava-users mailing list
Lava-users@lists.lavasoftware.org
https://lists.lavasoftware.org/mailman/listinfo/lava-users
On 8 Feb 2019, at 15:02, Neil Williams <neil.williams@linaro.org> wrote:
That could be looked at but bmap-tools isn't necessarily going to fix the problem with the hardware - it's trying to do something more complex and pushing things faster. What is the equivalent syntax?
We've seen bmap have a lower failure rate, or at least it fails properly (dd instead just hangs). The syntax is something like:
$ bmaptool copy image.wic.gz --bmap image.wic.bmap /dev/blockdevice
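To put the two invocations side by side, here is a small Python sketch that builds the argument list for either tool. `flash_command` is a hypothetical helper, not part of LAVA; the dd options (`bs=4M`, `conv=fsync`) are illustrative assumptions and not necessarily what LAVA passes, while the bmaptool form follows the command above.

```python
def flash_command(tool, image, device, bmap=None):
    """Return the argv list for flashing `image` onto `device` with `tool`."""
    if tool == "dd":
        # dd copies raw bytes, so `image` must already be decompressed.
        # bs=4M and conv=fsync are illustrative choices only.
        return ["dd", f"if={image}", f"of={device}", "bs=4M", "conv=fsync"]
    if tool == "bmaptool":
        # bmaptool can read a compressed image directly and, given a bmap
        # file, writes only the blocks that the bmap marks as mapped.
        argv = ["bmaptool", "copy", image]
        if bmap is not None:
            argv += ["--bmap", bmap]
        return argv + [device]
    raise ValueError(f"unknown flashing tool: {tool!r}")
```

For example, `flash_command("bmaptool", "image.wic.gz", "/dev/blockdevice", bmap="image.wic.bmap")` reproduces the command line shown above as a list of arguments.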
However, a timeout on a deployment is still a fatal event. This isn't a third-party problem, as with downloads or creating an LXC; this is the device under test not being sufficiently reliable, which is a test job failure due to an infrastructure error. It's equivalent to TFTP timing out or fastboot failing to flash an image. LAVA is correctly halting at this point: the original test job will need to be resubmitted to attempt the deploy again before going on to a boot action. The solution for intermittent errors is not to ignore the error.
My intention is not to ignore errors, but to have a better retry mechanism (instead of resubmitting the whole job).
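The kind of retry meant here can be sketched generically in Python. This is only an illustration under stated assumptions: `retry` is a hypothetical helper, not something LAVA provides, and `action` stands in for whatever performs the deploy step. The point is that the number of attempts is bounded and the final failure is still raised, so the error is never ignored.

```python
import time


def retry(action, attempts=3, delay=0.0, retry_on=(Exception,)):
    """Call action() until it succeeds, at most `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: the error is reported, not ignored
            print(f"attempt {attempt}/{attempts} failed: {exc}; retrying")
            time.sleep(delay)
```

Passing a narrow `retry_on` (for example only a timeout exception) keeps unrelated failures fatal on the first occurrence.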
--
Neil Williams
neil.williams@linaro.org http://www.linux.codehelp.co.uk/
-- Diego Russo | Staff Software Engineer | Mbed Linux OS ARM Ltd. CPC1, Capital Park, Cambridge Road, Fulbourn, CB21 5XE, United Kingdom http://www.diegor.co.uk - https://os.mbed.com/linux-os/
Hello Diego,
I created an issue to investigate whether using bmap-tools instead of dd is worth it: https://git.lavasoftware.org/lava/lava/issues/234
Cheers
On Fri, 8 Feb 2019 at 17:14, Diego Russo <Diego.Russo@arm.com> wrote:
Remi,
That’s great, thanks a lot! I’ll follow the issue closely.
Cheers
From: Remi Duraffort <remi.duraffort@linaro.org>
Date: Tuesday, 19 February 2019 at 09:54
To: Diego Russo <Diego.Russo@arm.com>
Cc: Neil Williams <neil.williams@linaro.org>, "lava-users@lists.lavasoftware.org" <lava-users@lists.lavasoftware.org>
Subject: Re: [Lava-users] uboot ums flakiness
--
Rémi Duraffort
LAVA Team, Linaro