Just to let you know, we still have not fixed this issue. I'm still waiting for your suggestions.

Thanks,
Hedy Lamarr
On Mon, Jul 5, 2021 at 4:02 PM Hedy Lamarr lamarrhedy97@gmail.com wrote:
Hi Remi, I have just one dispatcher, and both devices are linked to it. The issue continues after changing the configuration to the following:
{ "port": 3079, "blocksize": 4096, "poll_delay": 3, "coordinator_hostname": "10.191.253.109" }
On Thu, Jul 1, 2021 at 4:55 PM Remi Duraffort remi.duraffort@linaro.org wrote:
Hello,
By default, lava-coordinator is expected to run on localhost, so the second device will connect to localhost instead of the real host running the coordinator. Change the host in /etc/lava-coordinator/lava-coordinator.conf.
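For reference, a sketch of the resulting file. The key name "coordinator_hostname" is taken from the 2021.03 configuration shown earlier in this thread; the assumption here is that the file needs the same change on each worker as well, since the workers are what open connections to the coordinator:

    # /etc/lava-coordinator/lava-coordinator.conf
    # (assumption: update on the server and on every worker)
    {
        "port": 3079,
        "blocksize": 4096,
        "poll_delay": 3,
        "coordinator_hostname": "10.191.253.109"
    }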
On Tue, Jun 29, 2021 at 7:19 AM Hedy Lamarr lamarrhedy97@gmail.com wrote:
The "ssh" device will ssh to another machine which is not on the same dispatcher to start a iperf server. The "dragonboard-410c" device will start a docker container, then in this container, it will call iperf client to connect to the iperf server.
1. In the LAVA admin page, I link both the "ssh" device and the "dragonboard-410c" device to the same worker.
2. But the command on the ssh device (the iperf server) will run on another machine, while the command in the docker container (the iperf client) runs on the same machine as the worker, I think.

I'm not sure whether you mean 1 or 2? (See the command sketch below.)
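To make the two paths concrete, this is roughly what each role ends up executing, taken from the job definition quoted later in this thread (the hosts and IPs are the thread's own values):

    # host role (ssh device): start the iperf server on the remote machine
    ssh root@localhost 'iperf -s -V -P 1'

    # device role (dragonboard-410c, inside the docker container): run the client
    adb shell /data/local/iperf -c 10.191.253.21 -t 10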
On Mon, Jun 28, 2021 at 10:34 PM Remi Duraffort <remi.duraffort@linaro.org> wrote:
Hello,
On Mon, Jun 28, 2021 at 8:26 AM Hedy Lamarr lamarrhedy97@gmail.com wrote:
Hello,
What additional information do I need to provide to debug this issue?

Thanks,
Hedy Lamarr
On Thu, Jun 17, 2021 at 4:34 PM Hedy Lamarr lamarrhedy97@gmail.com wrote:
Yes. To make it clear, I restarted the LAVA server just now; here is the full log from when that multinode job ran:
2021-06-17 09:16:07,428 INFO [INIT] LAVA coordinator has started.
2021-06-17 09:16:07,757 INFO [INIT] Version 2021.03
2021-06-17 09:16:07,757 INFO [INIT] Loading configuration from /etc/lava-coordinator/lava-coordinator.conf
2021-06-17 09:16:08,076 INFO [BTSP] binding to 0.0.0.0:3079
2021-06-17 09:16:08,076 INFO Ready to accept new connections
2021-06-17 09:17:23,603 INFO The decbbfe5-f3be-4e6c-a2b8-5744eabfe8a7 group will contain 2 nodes.
2021-06-17 09:17:23,603 INFO Waiting for 1 more clients to connect to decbbfe5-f3be-4e6c-a2b8-5744eabfe8a7 group
2021-06-17 09:17:23,603 INFO Ready to accept new connections
2021-06-17 09:17:23,790 INFO Group complete, starting tests
2021-06-17 09:17:23,790 INFO Ready to accept new connections
2021-06-17 09:17:26,613 INFO Group complete, starting tests
2021-06-17 09:17:26,613 INFO Ready to accept new connections
2021-06-17 09:18:03,522 DEBUG clear Group Data: 1 of 2
2021-06-17 09:18:03,522 INFO Ready to accept new connections
2021-06-17 09:18:06,001 DEBUG clear Group Data: 2 of 2
2021-06-17 09:18:06,001 DEBUG Clearing group data for decbbfe5-f3be-4e6c-a2b8-5744eabfe8a7
2021-06-17 09:18:06,001 INFO Ready to accept new connections
2021-06-17 09:24:43,620 INFO The 8956d8e7-1097-43e0-95dd-7afc61b2908b group will contain 2 nodes.
2021-06-17 09:24:43,620 INFO Waiting for 1 more clients to connect to 8956d8e7-1097-43e0-95dd-7afc61b2908b group
2021-06-17 09:24:43,620 INFO Ready to accept new connections
2021-06-17 09:24:43,871 INFO Group complete, starting tests
2021-06-17 09:24:43,871 INFO Ready to accept new connections
2021-06-17 09:24:46,634 INFO Group complete, starting tests
2021-06-17 09:24:46,634 INFO Ready to accept new connections
2021-06-17 09:25:45,746 INFO lava_send: {'port': 3079, 'blocksize': 4096, 'poll_delay': 3, 'host': '10.191.253.109', 'hostname': 'lavaslave1', 'client_name': '3077', 'group_name': '8956d8e7-1097-43e0-95dd-7afc61b2908b', 'role': 'host', 'request': 'lava_send', 'messageID': 'server_ready', 'message': {}}
2021-06-17 09:25:45,747 INFO lavaSend handler in Coordinator received a messageID 'server_ready' for group '8956d8e7-1097-43e0-95dd-7afc61b2908b' from 3077
2021-06-17 09:25:45,747 DEBUG message ID server_ready {"3077": {}} for 3077
2021-06-17 09:25:45,747 DEBUG broadcast ID server_ready {"3077": {}} for 3076
2021-06-17 09:25:45,747 DEBUG broadcast ID server_ready {"3077": {}} for 3077
2021-06-17 09:25:45,747 INFO Ready to accept new connections
This log is similar to the log I saw in the web UI: the "ssh" device doing lava_send looks OK, but the "dragonboard-410c" device for the Android test, doing lava_wait, does not; it just hangs. From the log above, it looks like the coordinator did not receive anything?
From what I see in the logs, lava-coordinator is not receiving any signal from the second test.
Are both devices on the same dispatcher/worker?
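A simple way to check that the second worker can reach the coordinator at all is a plain TCP probe (a sketch; nc is an assumption, any TCP client will do):

    # run this on the worker where the lava-wait job hangs
    nc -vz 10.191.253.109 3079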
On Thu, Jun 17, 2021 at 4:02 PM Remi Duraffort <remi.duraffort@linaro.org> wrote:
On Thu, Jun 17, 2021 at 9:11 AM Hedy Lamarr lamarrhedy97@gmail.com wrote:

The output is:

service lava-coordinator status
● lava-coordinator.service - LAVA coordinator
   Loaded: loaded (/lib/systemd/system/lava-coordinator.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2021-06-04 18:09:19 CET; 1 weeks 5 days ago
 Main PID: 629 (lava-coordinato)
    Tasks: 1 (limit: 4915)
   Memory: 7.4M
   CGroup: /system.slice/lava-coordinator.service
           └─629 /usr/bin/python3 /usr/bin/lava-coordinator --loglevel DEBUG

So it's working.

Is it listening on 10.191.253.109:3079?
Do you have anything in the lava-coordinator logs? (/var/log/lava-coordinator.log)

On Thu, Jun 17, 2021 at 3:05 PM Remi Duraffort <remi.duraffort@linaro.org> wrote:

On Thu, Jun 17, 2021 at 9:02 AM Hedy Lamarr lamarrhedy97@gmail.com wrote:

Hello Remi,

I think lava-coordinator is running.

Because there are two devices here:
Device 1: dragonboard-410c. When it runs lava-wait server_ready, it hangs with the log shown above.
Device 2: ssh. When it runs lava-send server_ready, it shows: Connecting to LAVA Coordinator on 10.191.253.109:3079 timeout=300 seconds.

Would it be possible that lava-coordinator just works for ssh, but not for dragonboard-410c?
Also, netstat shows:

tcp        0      0 0.0.0.0:3079    0.0.0.0:*    LISTEN    629/python3    off (0.00/0/0)

Does this mean the coordinator is running? Or how can I make sure the coordinator is running?

service lava-coordinator status

Thanks,
Hedy Lamarr

On Thu, Jun 17, 2021 at 2:33 PM Remi Duraffort <remi.duraffort@linaro.org> wrote:

Hello,

do you have lava-coordinator running?

On Mon, Jun 14, 2021 at 2:29 PM Hedy Lamarr lamarrhedy97@gmail.com wrote:

By the way, we use 2021.03.post1.

On Wed, Jun 9, 2021 at 10:40 AM Hedy Lamarr lamarrhedy97@gmail.com wrote:

Dear community,

We are new to LAVA and are trying to use it in our Android testing. We have issues when testing iperf.
Job:

job_name: android iperf test
timeouts:
  job:
    minutes: 10080
  action:
    minutes: 120
  connection:
    minutes: 5
priority: medium
visibility: public
protocols:
  lava-multinode:
    roles:
      device:
        count: 1
        device_type: dragonboard-410c
        timeout:
          minutes: 5
      host:
        count: 1
        device_type: ssh
        timeout:
          minutes: 5
context:
  ssh_host: localhost
  ssh_user: root
  ssh_port: 22
  ssh_identity_file: /root/.ssh/id_rsa
actions:
- deploy:
    role:
    - host
    timeout:
      minutes: 2
    to: ssh
    os: debian
- boot:
    role:
    - host
    method: ssh
    connection: ssh
    prompts:
    - '@labpc1'
- test:
    role:
    - host
    timeout:
      minutes: 120
    definitions:
    - from: inline
      name: smoke-case
      path: inline/test.yaml
      repository:
        metadata:
          format: Lava-Test Test Definition
          name: smoke
          description: Run smoke case
        run:
          steps:
          - sleep 60
          - lava-send "server_ready"
          - iperf -s -V -P 1
- test:
    role:
    - device
    definitions:
    - from: inline
      name: cts_cts-media_test
      path: inline/cts_cts-media_test.yaml
      repository:
        metadata:
          description: cts cts-media test run
          format: Lava-Test Test Definition 1.0
          name: cts-cts-media-test-run
        run:
          steps:
          - adb wait-for-device
          - adb devices
          - adb root
          - adb wait-for-device
          - adb devices
          - lava-wait "server_ready"
          - sleep 3
          - lava-test-case "Case1" --shell adb shell /data/local/iperf -c 10.191.253.21 -t 10
    docker:
      image: terceiro/android-platform-tools
    timeout:
      minutes: 4200

The job log for dragonboard-410c is:

+ lava-wait server_ready
<LAVA_WAIT_DEBUG preparing Wed Jun 8 10:07:22 CST 2021>
<LAVA_WAIT_DEBUG started Wed Jun 8 10:07:22 CST 2021>
<LAVA_MULTI_NODE> <LAVA_WAIT server_ready>
<LAVA_WAIT_DEBUG finished Wed Jun 8 10:07:22 CST 2021>
<LAVA_WAIT_DEBUG finished Wed Jun 8 10:07:22 CST 2021>
<LAVA_WAIT_DEBUG starting to wait Wed Jun 8 10:07:22 CST 2021>

NOTE: it looks hung at this step; the job can't continue.
The job log for ssh is:

+ lava-send server_ready
<LAVA_SEND_DEBUG lava_multi_node_send preparing Wed Jun 8 10:07:53 CST 2021>
<LAVA_SEND_DEBUG _lava_multi_node_send started Wed Jun 8 10:07:53 CST 2021>
<LAVA_MULTI_NODE> <LAVA_SEND server_ready>
Received Multi_Node API <LAVA_SEND>
messageID: SEND-server_ready
lava-multinode lava-send
Handling signal <LAVA_SEND {"request": "lava_send", "messageID": "server_ready", "message": {}, "timeout": 300}>
Setting poll timeout of 300 seconds
requesting lava_send server_ready
message: {}
requesting lava_send server_ready with args {}
request_send server_ready {}
Sending {'request': 'lava_send', 'messageID': 'server_ready', 'message': {}}
final message: {"port": 3079, "blocksize": 4096, "poll_delay": 3, "host": "10.191.253.109", "hostname": "lavaslave1", "client_name": "3035", "group_name": "8a362e2a-6ee9-4f48-bddb-378ac2425f06", "role": "host", "request": "lava_send", "messageID": "server_ready", "message": {}}
Connecting to LAVA Coordinator on 10.191.253.109:3079 timeout=300 seconds.
case: multinode-send-server_ready
case_id: 39177
definition: 0_smoke-case
result: pass
<LAVA_SEND_DEBUG _lava_multi_node_send finished Wed Jun 8 10:07:53 CST 2021>
<LAVA_SEND_DEBUG lava_multi_node_send finished Wed Jun 8 10:07:53 CST 2021>
+ iperf -s -V -P 1
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------

It seems that one node can't receive server_ready from the other node. What's wrong with my job? Please help!

Thanks,
Hedy Lamarr

_______________________________________________
Lava-users mailing list
Lava-users@lists.lavasoftware.org
https://lists.lavasoftware.org/mailman/listinfo/lava-users
--
Rémi Duraffort
TuxArchitect
Linaro
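For reference, a hedged variant of the synchronization steps in the job definition above. The MultiNode API allows lava-send to carry key=value pairs, and the values received by lava-wait are written to /tmp/lava_multi_node_cache.txt, so the host role could publish its address instead of the client hard-coding 10.191.253.21. This is a sketch, not tested on this setup; eth0 is an assumed interface name:

    # host role steps
    - lava-send server_ready server=$(lava-echo-ipv4 eth0)
    - iperf -s -V -P 1

    # device role steps
    - lava-wait server_ready
    - server=$(grep '^server=' /tmp/lava_multi_node_cache.txt | cut -d= -f2)
    - lava-test-case "Case1" --shell adb shell /data/local/iperf -c $server -t 10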