Apologies for the top posting.
I think I’ve come up with a “workable” solution and I need some validation.
* I create a bridge (brctl addbr brusb0)
* I add the enp0s12u4 interface to the bridge (brctl addif brusb0 enp0s12u4)
* Expose the brusb0 to the container. The following is specified in the config file when creating the container
lxc.network.1.type = veth
lxc.network.1.link = brusb0
lxc.network.1.name = usb0
lxc.network.1.flags = up
* When the WaRP7 reboots, the bridge remains up; as soon as the enp0s12u4 interface is up again, it needs to be re-added to the bridge.
I guess brusb0 and enp0s12u4 are properties of the board, so they should belong in the board's device dictionary.
How can I automate such a thing with LAVA?
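For reference, the manual steps above could be sketched as the shell functions below. This is only an illustration of the re-add step, not something LAVA provides: the names SYSFS_NET, needs_attach and attach_iface are made up for the example, and it assumes brctl and ip are available and that the caller is root.

```shell
#!/bin/sh
# Sketch only: idempotently (re-)attach the USB gadget interface to the bridge.
# SYSFS_NET, needs_attach and attach_iface are hypothetical names.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Succeeds only when the interface exists but is not yet a port of the bridge.
needs_attach() {
    bridge=$1
    iface=$2
    # Interface not present, e.g. the board is still rebooting.
    [ -d "$SYSFS_NET/$iface" ] || return 1
    # Already a bridge port: nothing to do.
    [ -e "$SYSFS_NET/$bridge/brif/$iface" ] && return 1
    return 0
}

# Create the bridge if it is missing, then attach the interface if needed.
attach_iface() {
    bridge=$1
    iface=$2
    [ -d "$SYSFS_NET/$bridge" ] || brctl addbr "$bridge" || return 1
    ip link set "$bridge" up || return 1
    if needs_attach "$bridge" "$iface"; then
        brctl addif "$bridge" "$iface" && ip link set "$iface" up
    fi
}
```

Calling `attach_iface brusb0 enp0s12u4` after each reboot of the board would repeat the two brctl steps only when they are actually needed.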
On 29 Jan 2019, at 08:56, Neil Williams <neil.williams@linaro.org> wrote:
On Mon, 28 Jan 2019 at 17:50, Diego Russo <Diego.Russo@arm.com> wrote:
On 28 Jan 2019, at 16:37, Neil Williams <neil.williams@linaro.org> wrote:
On Mon, 28 Jan 2019 at 16:11, Diego Russo <Diego.Russo@arm.com> wrote:
On 28 Jan 2019, at 11:20, Neil Williams <neil.williams@linaro.org> wrote:
On Mon, 28 Jan 2019 at 11:02, Diego Russo <Diego.Russo@arm.com> wrote:
Hello,
I have the following setup: a WaRP7 which exposes a network connection over the USB gadget driver (http://trac.gateworks.com/wiki/linux/OTG#g_etherGadget).
As long as the device is capable of raising the network interface from
a POSIX test action on the device, there is no need to even care that
this is a USB anything. It's TCP/IP and that's all that any other test
action needs to know.
Exactly, I’m passing the usb0 interface to the container by having the following in /etc/lxc/default.conf:
lxc.network.1.type = phys
lxc.network.1.link = usb0
lxc.network.1.name = usb0
lxc.network.1.flags = up
That contaminates EVERY LXC test job with usb0 which is never going to
be acceptable. Do NOT do this, under any circumstances.
This needs to be test-job specific, i.e. defined in the device
dictionary and managed via a udev rule written by LAVA, which then
also covers re-adding to the LXC automatically. However, there is
actually no reason to even care about USB, so the whole issue goes
away.
I know this is going to affect every LXC container on the slave, but in this specific case I had a one-to-one WaRP7-slave relationship.
This means the usb0 network interface will be passed to the container as usb0. This works as long as the usb0 interface exists on the host.
Please think carefully and describe EXACTLY what you are aiming to do
because it sounds like there is confusion here about how to interact
with the device.
My aim for this specific test is:
* “Install” an application on the slave which interacts with the WaRP7
That would be better done as a custom docker image or a custom VM
which interacts with the device solely over TCP/IP. Installing things
takes time - better to start a pre-installed container or VM.
The device, when booted, broadcasts a MAC address and gets an IP
address from DHCP. That DHCP can be configured to always give the same
IP address for the same MAC address. This IP address is then defined
in the device dictionary.
The device is then one node, with no LXC and no USB handling or udev
rules or system-wide LXC changes.
A second node is defined which addresses the device using the IP address.
The two nodes are defined in a MultiNode test job.
There is no DHCP service involved. On the WaRP7 side the usb0 interface sets link-local IPv4 and IPv6 addresses.
The MAC address of this interface is randomly generated at every boot, hence the link-local address changes too.
* this application is a CLI application which interacts with the WaRP7 using the usb0 interface
It sounds like you are confusing USB with usb0 and TCP/IP. The CLI
application interacts with the device using TCP/IP. Thinking of that
as usb leads you into the problem of adding a USB device which
probably isn't even necessary.
No, I’m not confusing it, I didn’t explain very well. When I say usb0 I mean the actual network interface (see above)
I have the same network interface on the host side as well.
The result of using the host side interface is that the host side
kernel now becomes a much more significant part of the testjob than in
any other test method. The running kernel on the lava-slave is
involved in some small way in all test jobs but the usb0 host
interface tasks the kernel of the lava-slave with translating TCP/IP
traffic to USB and the test job has no control over that kernel. This
needs to be taken into account in your Test Plan. It's not actually a
network connection, it's a faked up serial connection - as
demonstrated by the rotating MAC address problem.
The g_ether module on the WaRP7 can control the MAC address on the host side.
This is happening on the WaRP7:
...
...
[   35.612943] using random self ethernet address
[   35.617697] using random host ethernet address
[   35.622180] using host ethernet address: 56:ed:e4:2f:ae:c2
[   35.654777] usb0: HOST MAC 56:ed:e4:2f:ae:c2
[   35.698912] usb0: MAC 8e:4a:a8:dd:81:5e
[   35.703505] using random self ethernet address
[   35.708138] using random host ethernet address
[   35.718986] g_ether gadget: Ethernet Gadget, version: Memorial Day 2008
[   35.725637] g_ether gadget: g_ether ready
[   36.015483] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
udhcpc: started, v1.29.2
[   36.157310] g_ether gadget: high-speed config #1: CDC Ethernet (ECM)
[   36.166902] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready
…
I can control the MAC address on the host via
root@warp7:~# cat /etc/modprobe.d/g_ether.conf
options g_ether host_addr=56:ed:e4:2f:ae:c2
So the MAC address will be persistent at every reboot.
I don’t know if this can help.
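With host_addr fixed, one thing that might be worth trying on the host is a udev rule keyed on that MAC address, so the interface is added back to the bridge whenever it reappears. The function below only prints such a rule. The name make_bridge_rule, the /sbin/brctl path and the use of the `$name` substitution are assumptions on my side and have not been tested against this setup.

```shell
#!/bin/sh
# Sketch only: print a udev rule that adds a network device with the given
# (fixed) MAC address to a bridge whenever udev sees it appear.
# make_bridge_rule and the /sbin/brctl path are assumptions.
make_bridge_rule() {
    # sysfs reports MAC addresses in lower case, so normalise the input.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    bridge=$2
    printf 'ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="%s", RUN+="/sbin/brctl addif %s $name"\n' \
        "$mac" "$bridge"
}
```

For example, `make_bridge_rule 56:ed:e4:2f:ae:c2 brusb0` prints one rule line that could be saved under /etc/udev/rules.d/ (the file name is up to the admin), followed by `udevadm control --reload`.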
root@mbl-lava-dispatcher-3:~# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 08:00:27:07:4c:66 brd ff:ff:ff:ff:ff:ff
61: lxcbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
link/ether 00:16:3e:00:00:00 brd ff:ff:ff:ff:ff:ff
122: usb0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 4e:2e:50:11:be:7e brd ff:ff:ff:ff:ff:ff
Note that the MAC address of this interface changes EVERY time.
Another note: on the host side, for testing purposes, I’ve disabled predictable network interface names. This way my interface name is always usb0.
https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
By default this is enabled, and the interface name (e.g. enp0s12u4) might change at every reboot of the board.
Just in passing, the whole point of the systemd predictable names
support is that "the names are fully automatic, fully predictable,
that they stay fixed even if hardware is added or removed (i.e. no
reenumeration takes place) and that broken hardware can be replaced
seamlessly." So enp0s12u4 is meant to be predictable and stable across
reboots and enumeration. If that doesn't happen, it's a bug somewhere
in the device kernel support.
I need to investigate what’s going on here, because we are testing with VirtualBox VMs and I’ve seen some inconsistent behaviour.
Or is your CLI not using networking at all but hacking into the kernel
networking stack directly? Is the CLI trying to open the usb0
interface as a USB device? Why?
My CLI application uses mDNS/avahi over the network, so it doesn’t do anything special with the kernel.
It’s a standard application-level binary which in the end uses sockets.
Again, when I say usb0 interface I mean the network interface named usb0.
We don’t even specify which interface to use: to the application, usb0 is just another network interface.
It works both with IPv4 and IPv6.
* Flash and boot the Warp7
* Tests are run ON the lava slave using the application installed earlier
Tests can just as easily be run in a docker image or a VM - it doesn't
need to be on the lava slave at all, as long as it can see the TCP/IP
address. This way, you can debug your test definitions by running the
same image against a device on your desk, outside of LAVA.
For this reason I wanted to use the LXC support in LAVA and just use the usb0 network interface within that container.
* the application uses the usb0 interface on the slave
The application uses the IP address raised on whatever interface the
device is configured to use, in this case it happens to be called usb0
but the application has no idea how it is implemented, it's standard
TCP/IP.
Exactly, the application uses the usb0 interface as a standard TCP/IP interface.
* There are no tests running on the WaRP7, but it might be rebooted while the above tests are running
So the application needs to be able to buffer until the device comes
back on the same IP address - that's manageable via DHCP and the MAC
address. Once the other node has the IP address of the device, it
doesn't matter what the device does - providing the device always
re-establishes the TCP/IP connection.
As I said earlier, there is no DHCP server involved. Everything works with link-local IP addresses (both IPv4 and IPv6),
and those change at every reboot of the board (which is fine: we can rescan for it via mDNS/avahi).
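The rescan after a reboot could look roughly like the loop below, which polls mDNS until the board answers again. This is a sketch under assumptions: wait_for_host and RESOLVE_CMD are hypothetical names, warp7.local is a made-up hostname, and it assumes avahi-resolve (from avahi-utils) prints the name and the address as two columns. It only asks for IPv4, to avoid dealing with the interface scope that a link-local IPv6 address needs.

```shell
#!/bin/sh
# Sketch only: poll mDNS until the board is resolvable again after a reboot.
# RESOLVE_CMD and wait_for_host are hypothetical names.
RESOLVE_CMD="${RESOLVE_CMD:-avahi-resolve -4 -n}"

# Usage: wait_for_host NAME [TIMEOUT_SECONDS]
# Prints the resolved address and returns 0, or returns 1 on timeout.
wait_for_host() {
    name=$1
    timeout=${2:-60}
    waited=0
    while :; do
        # Judge success by the output, not by the resolver's exit status.
        addr=$($RESOLVE_CMD "$name" 2>/dev/null | awk 'NR==1 {print $2}') || addr=""
        if [ -n "$addr" ]; then
            printf '%s\n' "$addr"
            return 0
        fi
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}
```

A test could then run something like `addr=$(wait_for_host warp7.local 120) || exit 1` right after triggering the reboot.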
The USB gadget interface is the wrong side of the interface - you
already have a POSIX test action running on the device, so use that to
raise and configure the TCP/IP by accessing the relevant driver
support directly on the device.
On the device I don’t need to run anything. The “issue” is on the host (lava slave) side.
I disagree. The issue is the confusion of /dev/bus/usb and usb0 -
along with the mistake of putting testjob specific configuration into
a system-wide file.
As stated earlier, the device already comes up with the right settings: usb0 up and running with IPv4 and IPv6 link-local addresses.
A possible test case is to have some process running on the LAVA dispatcher (within an LXC container) which targets the WaRP7 over this network interface.
LXC support does not provide any means of synchronisation across test
actions. Strict sequence only. If anything isn't ready, the test
definition will either have to just cope with the situation or fail
the entire test job.
Why are you trying to do USB device passthrough when you have this
network interface? This device doesn't need an LXC to run a standard
test job.
https://staging.validation.linaro.org/scheduler/job/248129
Therefore, avoid using the LXC protocol in the first place and
communicate over the network. You'll need to declare the IP address of
the device but that's a standard MultiNode API call from a POSIX shell
on the device.
The process running the test case does NOT have to be on the LAVA
dispatcher if it is targeting the device over TCP/IP. All it needs
is the IP address, nothing USB at all.
In our case we don’t need to run anything on the WaRP7. The WaRP7 just needs to be up and running and be visible via usb0 from the dispatcher.
The device just needs to be configured to automatically raise a
network interface and get an IP address when booted. What interface
that uses is completely irrelevant. This makes it trivial to test with
a different kind of device or two docker images or two QEMU VMs etc.
Tests are run on lava dispatcher
Tests would be better run in a dedicated container. Quicker and easier
to reproduce.
Yes, what I meant is that the tests run off target. For this reason I looked into LXC containers, as LAVA already supports them.
WaRP7 <—> usb0 net iface <———> usb0 net iface <—> LAVA slave
device <--> TCP/IP <--> container.
The LAVA slave does not need to have any part in this (apart from
running two test jobs).
Unfortunately it does: the WaRP7 doesn’t have any wired network interface, and it exposes the usb0 interface over the USB power cable, which is connected to the LAVA slave. The same cable is used to flash the board (via U-Boot ums).
This WaRP7 has a physical USB connection to the LAVA slave (from the OS point of view it is just another network interface) and I don’t think it can be reached by other nodes.
You will have to try and provide more information for udev on the
slave to manage the addition of the device to the LXC and this must be
done dynamically, not statically for all containers.
This can be done by adopting support from AOSP testing to attempt to
use secondary udev IDs to identify the device dynamically:
{% set device_info = [{'board_id': '0123456789', 'usb_vendor_id': '0451', 'usb_product_id': 'd109'}] %}
https://master.lavasoftware.org/static/docs/v2/admin-lxc-deploy.html#android-testing-with-lxc-support
I tried to do something like that, but it doesn’t work (as expected):
{% set device_info = [{'board_id': '0', 'usb_vendor_id': '0525', 'usb_product_id': 'a4a2'}] %}
If this does not work, then full automation of this use case will not
be possible, until such time as the hardware is modified. You will
have to adopt a semi-automated approach where a human deals with the
consequences of USB re-enumeration if the device reboots or eliminate
all causes of reboots from the Test Plan and make all *unanticipated*
reboots into a test failure.
This shows a design fault in the hardware to not uniquely identify
itself to udev, resulting in a lack of hardware support for
automation.
Unfortunately we cannot modify the hw.
I think the only option I have is to treat it as a network interface and do something at network level.
The USB node is of no concern. The device can be booted without
needing any LXC and it can be configured to raise usb0 at boot. It can
be configured to get a DHCP IP address at boot.
The only thing anything outside the device needs to know about it is the
IP address and that is configured by allocating an address in the DHCP
config of the lab.
There is no DHCP service involved: as soon as the board is up and running, it can be discovered via IPv6 using its link-local address.
Through LXC I'm able to pass this interface through from the host to the container and use it within the container (via /etc/lxc/default.conf).
How are you passing it through? If the device is dynamic, you must
declare the board_id of the device in the device dictionary so that
LAVA will create a suitable udev rule to add the re-enumerated device
back to the LXC when udev sees an ADD event.
I’m passing the usb0 network interface to LXC as stated at the beginning of the email.
usb0 is just.
I also tried to use the board_id, but unfortunately the device doesn’t have a usable iSerial (only USB vendor and product IDs). More precisely, the iSerial field is 0.
All the more reason to disregard the entire /dev/bus/usb issue and use
TCP/IP as standalone.
Agreed.
If a test requires the reboot of the WaRP7, the usb0 interface disappears from the container. When the WaRP7 boots again the usb0 interface is available on the host (but not in the container).
The usb0 interface is accessible from the device and you're already
running a POSIX shell in the test action on the device, so that test
action needs to take care of re-establishing the network connection
(and possibly re-declaring the IP address to the other node).
Again, the problem is on the container side. As soon as the board is up and running I have the usb0 network interface on both the WaRP7 and the host.
The container though loses visibility.
Things I tried or thought about:
* I tried synchronizing the boots of both the WaRP7 and the LXC container, but it does not seem possible to "reboot" (restart) a container within the same job execution.
* Is it possible to "restart" a container during a job execution?
No. This has nothing to do with the start of the LXC.
Well it does because if I restart the LXC container AFTER the board has rebooted, usb0 is re-passed through and it has visibility of this network interface.
You cannot contaminate every LXC ever run on that lava-slave with the
usb0 device details - do not make changes to /etc/lxc/default.conf -
that cannot scale.
Provided I don’t do that, how can I pass the enp0s12u4 to the LXC container?
* Outside LAVA it is possible to run a command (lxc-device --name diegor-test -- add usb0) which passes the interface through again from the host to the LXC container.
* Is it possible to run the above command at job execution time on the lava dispatcher?
How can I solve this situation?
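To make the manual workaround concrete, this is roughly what I run by hand outside LAVA, wrapped in a function that waits for the interface to come back on the host first. It is a sketch only: readd_iface, SYSFS_NET and LXC_DEVICE are hypothetical names, and only the lxc-device invocation itself is taken from the command above.

```shell
#!/bin/sh
# Sketch only: wait for the interface to reappear on the host, then pass it
# into the container again. readd_iface, SYSFS_NET and LXC_DEVICE are
# hypothetical names; the lxc-device call mirrors the command quoted above.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"
LXC_DEVICE="${LXC_DEVICE:-lxc-device}"

# Usage: readd_iface CONTAINER IFACE [TIMEOUT_SECONDS]
readd_iface() {
    container=$1
    iface=$2
    timeout=${3:-60}
    waited=0
    # Poll sysfs until the board has re-enumerated on the host.
    until [ -e "$SYSFS_NET/$iface" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    "$LXC_DEVICE" --name "$container" -- add "$iface"
}
```

For example, `readd_iface diegor-test usb0 120` after the board reboots. It has to run as root on the dispatcher, which is exactly the part I don't know how to trigger from a job.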
If you do want to do passthrough:
https://master.lavasoftware.org/static/docs/v2/admin-lxc-deploy.html#deploying-lxc-devices
https://lava.codehelp.co.uk/scheduler/job/4313#action_1-2
https://lava.codehelp.co.uk/scheduler/device/tom/devicedict#defline11
As said earlier, the board_id is 0.
Then passthrough is undermined by the broken hardware / firmware.
I don’t know why, but the WaRP7 has been designed this way; it doesn’t expose an ID over the serial either (even though it has an FTDI chip on it).
If you want to use MultiNode, use a QEMU device as the second node
which communicates with the other node using the MultiNode API.
https://master.lavasoftware.org/static/docs/v2/multinode.html
I think using MultiNode won’t help in this specific case.
Cheers
--
Diego Russo
Staff Software Engineer - diego.russo@arm.com
Direct Tel. no: +44 1223 405920
Main Tel. no: +44 1223 400400
ARM Ltd. CPC1, Capital Park, Cambridge Road, Fulbourn, CB21 5XE, United Kingdom
http://www.diegor.co.uk -
http://twitter.com/diegor
http://www.linkedin.com/in/diegor
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose,
or store or copy the information in any medium. Thank you.
_______________________________________________
Lava-users mailing list
Lava-users@lists.lavasoftware.org
https://lists.lavasoftware.org/mailman/listinfo/lava-users
--
Neil Williams
=============
neil.williams@linaro.org
http://www.linux.codehelp.co.uk/
--
Diego Russo | Staff Software Engineer | Mbed Linux OS
ARM Ltd. CPC1, Capital Park, Cambridge Road, Fulbourn, CB21 5XE, United Kingdom
http://www.diegor.co.uk -
https://os.mbed.com/linux-os/