On Tue, 29 Jan 2019 at 12:30, Diego Russo Diego.Russo@arm.com wrote:
Apologies for the top posting.
I think I’ve come up with a “workable” solution and I need some validation.
- I create a bridge (brctl addbr brusb0)
- I add the enp0s12u4 interface to the bridge (brctl addif brusb0 enp0s12u4)
- Expose the brusb0 to the container. The following is specified in the config file when creating the container
lxc.network.1.type = veth
lxc.network.1.link = brusb0
lxc.network.1.name = usb0
lxc.network.1.flags = up
OK, so the provisos here relate to the implementation across all LXCs on that worker. That limits the portability of your solution and puts all of the burden of triage on test job failures onto either your setup or another local setup. For the benefit of everyone else on this list, it is worth stressing that this is not recommended practice.
- At reboot of the warp7 the bridge remains up and as soon as the interface enp0s12u4 is up again we need to re-add it to the bridge.
That sounds workable. There are lots of tools which will wait until a network connection is available again after some kind of interruption.
It would be advisable to record the reboot and log a test case failure each time - how you do that will depend on the rest of this setup.
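The re-add step described above could be scripted on the worker along these lines. This is a minimal sketch, not a tested LAVA integration: the function names wait_for_iface and readd_to_bridge and the SYSFS_NET override are made up for illustration, it assumes the interface shows up under /sys/class/net once the gadget enumerates, and the brctl call needs root.

```shell
#!/bin/sh
# Sketch only: re-attach a USB network interface to a bridge once it reappears.
# SYSFS_NET is overridable so the wait logic can be tried without hardware.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# wait_for_iface IFACE TIMEOUT_SECONDS
# Returns 0 as soon as IFACE exists, 1 if TIMEOUT_SECONDS elapse first.
wait_for_iface() {
    iface="$1"
    remaining="$2"
    while [ ! -e "$SYSFS_NET/$iface" ]; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# readd_to_bridge BRIDGE IFACE TIMEOUT_SECONDS
# Waits for IFACE, then adds it to BRIDGE with the same brctl call
# used in the manual steps (requires root).
readd_to_bridge() {
    wait_for_iface "$2" "$3" || return 1
    brctl addif "$1" "$2"
}
```

For example, `readd_to_bridge brusb0 enp0s12u4 120` would wait up to roughly two minutes for the interface before giving up. How this gets triggered (a udev rule, a small service, or a step in the test job) is left open here.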
I guess brusb0 and enp0s12u4 are properties of the board and they should belong to the device dictionary of the board.
How can I automate such a thing with LAVA?
In theory, this should "just work". I would start with something which can emulate the network connection and prove out the support for detecting, reporting and waiting for the connection to return at unpredictable moments. That could be done in a stand-alone QEMU test job, for example, with the LAVA test job trying to maintain a connection to an external service under your control.
If you're going to put lxc.network.1.name = usb0 into the global configuration of the worker, it becomes less relevant to constrain brusb0 and enp0s12u4 to the device configuration. (On the assumption that this local setup is only likely to have/need a single device for this test at any one time.) You could pass brusb0 and enp0s12u4 as test job parameters as an alternative to static values in the device dictionary. This makes it very easy to access these names inside the LAVA test definition.
Your LXC configuration means that you don't need to run anything on the device. It also means that the sequence of operations is driven from the test action in the LXC and is therefore unaffected by the device rebooting (as long as your scripts can wait for the network to come back).
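As a rough illustration of a script that can wait for the network to come back, the test definition running in the LXC could wrap its connectivity check in a small polling helper. The name wait_until is hypothetical, and the check command, the timeout and DEVICE_ADDR in the usage comment are assumptions; lava-test-case is the usual LAVA test shell helper for recording a result.

```shell
#!/bin/sh
# Sketch only: poll a command until it succeeds or a timeout expires.

# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Runs COMMAND about once per second. Returns 0 on the first success,
# 1 if it still fails after roughly TIMEOUT_SECONDS.
wait_until() {
    remaining="$1"
    shift
    until "$@" >/dev/null 2>&1; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Example use inside the LXC test definition (DEVICE_ADDR is hypothetical):
#   if wait_until 120 ping -c 1 "$DEVICE_ADDR"; then
#       lava-test-case device-back-after-reboot --result pass
#   else
#       lava-test-case device-back-after-reboot --result fail
#   fi
```

Recording a result for each wait also gives a place to log the reboot, as suggested above.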
On 29 Jan 2019, at 09:54, Diego Russo Diego.Russo@arm.com wrote:
On 29 Jan 2019, at 08:56, Neil Williams neil.williams@linaro.org wrote:
On Mon, 28 Jan 2019 at 17:50, Diego Russo Diego.Russo@arm.com wrote:
On 28 Jan 2019, at 16:37, Neil Williams neil.williams@linaro.org wrote:
On Mon, 28 Jan 2019 at 16:11, Diego Russo Diego.Russo@arm.com wrote:
On 28 Jan 2019, at 11:20, Neil Williams neil.williams@linaro.org wrote:
On Mon, 28 Jan 2019 at 11:02, Diego Russo Diego.Russo@arm.com wrote:
Hello,
I have the following setup: a WaRP7 which exposes a network connection over the USB gadget driver (http://trac.gateworks.com/wiki/linux/OTG#g_etherGadget)
As long as the device is capable of raising the network interface from a POSIX test action on the device, there is no need to even care that this is a USB anything. It's TCP/IP and that's all that any other test action needs to know.
Exactly, I’m passing the usb0 interface to the container by having the following in /etc/lxc/default.conf
lxc.network.1.type = phys
lxc.network.1.link = usb0
lxc.network.1.name = usb0
lxc.network.1.flags = up
That contaminates EVERY LXC test job with usb0 which is never going to be acceptable. Do NOT do this, under any circumstances.
This needs to be test-job specific, i.e. defined in the device dictionary and managed via a udev rule written by LAVA, which then also covers re-adding to the LXC automatically. However, there is actually no reason to even care about USB, so the whole issue goes away.
I know this is going to affect every LXC container on the slave but in this specific case I had a one-to-one WaRP7-slave relationship.
This means the usb0 network interface will be passed to the container as usb0. This works as long as the usb0 interface exists on the host.
Please think carefully and describe EXACTLY what you are aiming to do because it sounds like there is confusion here about how to interact with the device.
My aim for this specific test is:
- "Install" an application on the slave which interacts with the WaRP7
That would be better done as a custom docker image or a custom VM which interacts with the device solely over TCP/IP. Installing things takes time - better to start a pre-installed container or VM.
The device, when booted, broadcasts a MAC address and gets an IP address from DHCP. That DHCP can be configured to always give the same IP address for the same MAC address. This IP address is then defined in the device dictionary.
The device is then one node, with no LXC and no USB handling or udev rules or system-wide LXC changes.
A second node is defined which addresses the device using the IP address.
The two nodes are defined in a MultiNode test job.
There is no DHCP service involved. On the WaRP7 side the usb0 interface sets a link-local IPv4 and IPv6 address. The MAC address of this interface is generated randomly at every boot, hence the link-local address changes too.
- this application is a CLI application which interacts with the WaRP7 using the usb0 interface
It sounds like you are confusing USB with usb0 and TCP/IP. The CLI application interacts with the device using TCP/IP. Thinking of that as usb leads you into the problem of adding a USB device which probably isn't even necessary.
No, I’m not confusing it, I didn’t explain it very well. When I say usb0 I mean the actual network interface (see above). I have the same network interface on the host side as well.
The result of using the host side interface is that the host side kernel now becomes a much more significant part of the testjob than in any other test method. The running kernel on the lava-slave is involved in some small way in all test jobs but the usb0 host interface tasks the kernel of the lava-slave with translating TCP/IP traffic to USB and the test job has no control over that kernel. This needs to be taken into account in your Test Plan. It's not actually a network connection, it's a faked up serial connection - as demonstrated by the rotating MAC address problem.
The g_ether module on the warp7 can control the MAC address on the host side.
This is happening on the Warp7
...
...
[ 35.612943] using random self ethernet address
[ 35.617697] using random host ethernet address
[ 35.622180] using host ethernet address: 56:ed:e4:2f:ae:c2
[ 35.654777] usb0: HOST MAC 56:ed:e4:2f:ae:c2
[ 35.698912] usb0: MAC 8e:4a:a8:dd:81:5e
[ 35.703505] using random self ethernet address
[ 35.708138] using random host ethernet address
[ 35.718986] g_ether gadget: Ethernet Gadget, version: Memorial Day 2008
[ 35.725637] g_ether gadget: g_ether ready
[ 36.015483] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
udhcpc: started, v1.29.2
[ 36.157310] g_ether gadget: high-speed config #1: CDC Ethernet (ECM)
[ 36.166902] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready
…
I can control the mac address on the host via
root@warp7:~# cat /etc/modprobe.d/g_ether.conf
options g_ether host_addr=56:ed:e4:2f:ae:c2
So the mac address will be persistent at every reboot.
I don’t know if this can help.
root@mbl-lava-dispatcher-3:~# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:07:4c:66 brd ff:ff:ff:ff:ff:ff
61: lxcbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:16:3e:00:00:00 brd ff:ff:ff:ff:ff:ff
122: usb0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 4e:2e:50:11:be:7e brd ff:ff:ff:ff:ff:ff
Note the MAC address of this interface changes EVERY time.
Another note: on the host side, for testing purposes I’ve disabled predictable network interface names. This way my interface name is always usb0
https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfac...
By default it is enabled and the interface name might change at every reboot of the board (enp0s12u4)
Just in passing, the whole point of the systemd predictable names support is that "the names are fully automatic, fully predictable, that they stay fixed even if hardware is added or removed (i.e. no reenumeration takes place) and that broken hardware can be replaced seamlessly." So enp0s12u4 is meant to be predictable and stable across reboots and enumeration. If that doesn't happen, it's a bug somewhere in the device kernel support.
I need to investigate better what’s going on here, because we are testing with VirtualBox VMs and I’ve seen some inconsistent behaviour.
Or is your CLI not using networking at all but hacking into the kernel networking stack directly? Is the CLI trying to open the usb0 interface as a USB device? Why?
My CLI application uses mDNS/avahi over the network, so it doesn’t do anything special with the kernel. It’s a standard application-level binary which in the end uses sockets. Again, when I say usb0 interface I mean the network interface named usb0. We don’t even specify which interface to use because, to the application, usb0 is just another network interface. It works both with IPv4 and IPv6.
- Flash and boot the Warp7
- Tests are run ON the lava slave using the application installed earlier
Tests can just as easily be run in a docker image or a VM - it doesn't need to be on the lava slave at all, as long as it can see the TCP/IP address. This way, you can debug your test definitions by running the same image against a device on your desk, outside of LAVA.
For this reason I wanted to use LXC support in LAVA and just wanted to use usb0 network interface within that container.
- the application uses the usb0 interface on the slave
The application uses the IP address raised on whatever interface the device is configured to use, in this case it happens to be called usb0 but the application has no idea how it is implemented, it's standard TCP/IP.
Exactly, the application uses usb0 interface as standard TCP/IP interface.
- There are no tests running on the Warp7 but this might be rebooted while running above tests
So the application needs to be able to buffer until the device comes back on the same IP address - that's manageable via DHCP and the MAC address. Once the other node has the IP address of the device, it doesn't matter what the device does - providing the device always re-establishes the TCP/IP connection.
As I said earlier there is no DHCP server involved. Everything works with link-local IPs (both IPv4 and IPv6) and those change at every reboot of the board (which is fine: we can rescan via mDNS/avahi).
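Since the addresses change at each reboot, the rescan could be sketched as below: run avahi-browse in parsable mode and keep only the addresses resolved on usb0. Assumptions to check: the helper name addresses_on_iface is made up, the service type in the usage comment is only a placeholder for whatever the board actually advertises, and the field positions follow the `;`-separated layout of avahi-browse --parsable (resolved records start with `=`, interface name in field 2, address in field 8), which should be verified against the installed avahi version.

```shell
#!/bin/sh
# Sketch only: filter parsable avahi-browse output down to the addresses
# that were resolved on one network interface.

# addresses_on_iface IFACE
# Reads avahi-browse --parsable output on stdin and prints the address
# field of every resolved ("=") record seen on IFACE.
addresses_on_iface() {
    awk -F';' -v iface="$1" '$1 == "=" && $2 == iface { print $8 }'
}

# Example use after the board has rebooted (service type is a placeholder):
#   avahi-browse -rpt _ssh._tcp | addresses_on_iface usb0
```

An IPv6 link-local address returned this way still needs the interface scope (for example a `%usb0` suffix) when it is passed to most tools.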
The USB gadget interface is the wrong side of the interface - you already have a POSIX test action running on the device, so use that to raise and configure the TCP/IP by accessing the relevant driver support directly on the device.
On the device I don’t need to run anything. The “issue” is on the host (lava slave) side.
I disagree. The issue is the confusion of /dev/bus/usb and usb0 - along with the mistake of putting testjob specific configuration into a system-wide file.
As stated earlier, the device comes up with the right settings already: usb0 up and running with IPv4 and IPv6 link-local addresses.
A possible test case is to have some process running on the LAVA dispatcher (within a LXC container) which targets the WaRP7 over this network interface.
LXC support does not provide any means of synchronisation across test actions. Strict sequence only. If anything isn't ready, the test definition will either have to just cope with the situation or fail the entire test job.
Why are you trying to do USB device passthrough when you have this network interface? This device doesn't need an LXC to run a standard test job. https://staging.validation.linaro.org/scheduler/job/248129
Therefore, avoid using the LXC protocol in the first place and communicate over the network. You'll need to declare the IP address of the device but that's a standard MultiNode API call from a POSIX shell on the device.
The process running the test case does NOT have to be on the LAVA dispatcher if it is targeting the device over TCP/IP. All it needs is the IP address, nothing USB at all.
In our case we don’t need to run anything on the WaRP7. The WaRP7 just needs to be up and running and be visible via usb0 from the dispatcher.
The device just needs to be configured to automatically raise a network interface and get an IP address when booted. What interface that uses is completely irrelevant. This makes it trivial to test with a different kind of device or two docker images or two QEMU VMs etc.
Tests are run on lava dispatcher
Tests would be better run in a dedicated container. Quicker and easier to reproduce.
Yes, what I meant is that tests are running off target. For this reason I looked into LXC containers as LAVA already supports them.
WaRP7 <—> usb0 net iface <———> usb0 net iface <—> LAVA slave
device <--> TCP/IP <--> container.
The LAVA slave does not need to have any part in this (apart from running two test jobs).
Unfortunately it does: the WaRP7 doesn’t have any wired network interface and it exposes the usb0 interface over the USB power cable which is connected to the LAVA slave. The same cable is used to flash the board (via U-Boot ums).
This WaRP7 has a physical USB connection to the LAVA slave (from the OS point of view it is just another network interface) and I don’t think it can be reached by other nodes.
You will have to try and provide more information for udev on the slave to manage the addition of the device to the LXC and this must be done dynamically, not statically for all containers.
This can be done by adopting support from AOSP testing to attempt to use secondary udev IDs to identify the device dynamically:
{% set device_info = [{'board_id': '0123456789', 'usb_vendor_id': '0451', 'usb_product_id': 'd109'}] %}
https://master.lavasoftware.org/static/docs/v2/admin-lxc-deploy.html#android...
I tried to do something like that, but it doesn’t work (as expected)
{% set device_info = [{'board_id': '0', 'usb_vendor_id': '0525', 'usb_product_id': 'a4a2'}] %}
If this does not work, then full automation of this use case will not be possible, until such time as the hardware is modified. You will have to adopt a semi-automated approach where a human deals with the consequences of USB re-enumeration if the device reboots or eliminate all causes of reboots from the Test Plan and make all *unanticipated* reboots into a test failure.
This shows a design fault in the hardware to not uniquely identify itself to udev, resulting in a lack of hardware support for automation.
Unfortunately we cannot modify the hw.
I think the only option I have is to treat it as a network interface and do something at network level.
The USB node is of no concern. The device can be booted without needing any LXC and it can be configured to raise usb0 at boot. It can be configured to get a DHCP IP address at boot.
The only thing anything outside the device needs to know about is the IP address and that is configured by allocating an address in the DHCP config of the lab.
There is no DHCP service involved: as soon as the board is up and running it can be discovered via IPv6 with the link-local address.
Through LXC I'm able to pass this interface through from the host to the container and use it within the container (via /etc/lxc/default.conf)
How are you passing it through? If the device is dynamic, you must declare the board_id of the device in the device dictionary so that LAVA will create a suitable udev rule to add the re-enumerated device back to the LXC when udev sees an ADD event.
I’m passing the usb0 network interface to LXC as stated at the beginning of the email. usb0 is just another network interface.
I also tried to use the board_id but unfortunately the device doesn’t have any iSerial (only USB product and vendor IDs). Or rather, the iSerial field is 0.
All the more reason to disregard the entire /dev/bus/usb issue and use TCP/IP as standalone.
Agreed.
If a test requires the reboot of the WaRP7, the usb0 interface disappears from the container. When the WaRP7 boots again the usb0 interface is available on the host (but not in the container).
The usb0 interface is accessible from the device and you're already running a POSIX shell in the test action on the device, so that test action needs to take care of re-establishing the network connection (and possibly re-declaring the IP address to the other node).
Again, the problem is on the container side. As soon as the board is up and running I have usb0 network interface both on WaRP7 and host. The container though loses visibility.
Things I tried or thought about:
- I tried synchronizing boots both of the WaRP7 and LXC container but it seems not possible to "reboot" (restart) a container within the same job execution.
- Is it possible to "restart" a container during a job execution?
No. This has nothing to do with the start of the LXC.
Well it does because if I restart the LXC container AFTER the board has rebooted, usb0 is re-passed through and it has visibility of this network interface.
You cannot contaminate every LXC ever run on that lava-slave with the usb0 device details - do not make changes to /etc/lxc/default.conf - that cannot scale.
Provided I don’t do that, how can I pass the enp0s12u4 to the LXC container?
- Outside LAVA it is possible to run a command (lxc-device --name diegor-test -- add usb0) which passes the interface through again from the host to the LXC container.
- Is it possible to run the above command at job execution time on the lava dispatcher?
How can I solve this situation?
If you do want to do passthrough:
https://master.lavasoftware.org/static/docs/v2/admin-lxc-deploy.html#deployi...
https://lava.codehelp.co.uk/scheduler/job/4313#action_1-2
https://lava.codehelp.co.uk/scheduler/device/tom/devicedict#defline11
As said earlier, the board_id is 0.
Then passthrough is undermined by the broken hardware / firmware.
I don’t know why, but the WaRP7 has been designed this way: it doesn’t expose an ID over serial either (even though it has an FTDI chip on it).
If you want to use MultiNode, use a QEMU device as the second node which communicates with the other node using the MultiNode API.
https://master.lavasoftware.org/static/docs/v2/multinode.html
I think using MultiNode won’t help for this specific case.
Cheers
-- Diego Russo Staff Software Engineer - diego.russo@arm.com Direct Tel. no: +44 1223 405920 Main Tel. no: +44 1223 400400 ARM Ltd. CPC1, Capital Park, Cambridge Road, Fulbourn, CB21 5XE, United Kingdom http://www.diegor.co.uk - http://twitter.com/diegor http://www.linkedin.com/in/diegor
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. _______________________________________________ Lava-users mailing list Lava-users@lists.lavasoftware.org https://lists.lavasoftware.org/mailman/listinfo/lava-users
--
Neil Williams
neil.williams@linaro.org http://www.linux.codehelp.co.uk/
-- Diego Russo | Staff Software Engineer | Mbed Linux OS ARM Ltd. CPC1, Capital Park, Cambridge Road, Fulbourn, CB21 5XE, United Kingdom http://www.diegor.co.uk - https://os.mbed.com/linux-os/