Hello,

By default, lava-coordinator is expected to run on localhost, so the second device will connect to localhost instead of the real host running the coordinator. Change the host in /etc/lava-coordinator/lava-coordinator.conf.

On Tue, Jun 29, 2021 at 7:19 AM Hedy Lamarr <lamarrhedy97@gmail.com> wrote:

The "ssh" device will ssh to another machine, which is not on the same dispatcher, to start an iperf server. The "dragonboard-410c" device will start a docker container, and inside that container it will run the iperf client to connect to the iperf server.

1. In the LAVA admin page, I link both the "ssh" device and the "dragonboard-410c" device to the same worker.
2. However, the command run over ssh (the iperf server) runs on another machine, while the command in the docker container (the iperf client) runs on the same machine as the worker, I think.

I'm not sure whether you mean 1 or 2?

On Mon, Jun 28, 2021 at 10:34 PM Remi Duraffort <remi.duraffort@linaro.org> wrote:

Hello,

On Mon, Jun 28, 2021 at 8:26 AM Hedy Lamarr <lamarrhedy97@gmail.com> wrote:

Hello,

What additional information do I need to provide to debug this issue?

Thanks,
Hedy Lamarr

On Thu, Jun 17, 2021 at 4:34 PM Hedy Lamarr <lamarrhedy97@gmail.com> wrote:

YES. To make it clear, I restarted the LAVA server just now; here is the full log from when that multinode job ran:

2021-06-17 09:16:07,428 INFO [INIT] LAVA coordinator has started.
2021-06-17 09:16:07,757 INFO [INIT] Version 2021.03
2021-06-17 09:16:07,757 INFO [INIT] Loading configuration from /etc/lava-coordinator/lava-coordinator.conf
2021-06-17 09:16:08,076 INFO [BTSP] binding to 0.0.0.0:3079
2021-06-17 09:16:08,076 INFO Ready to accept new connections
2021-06-17 09:17:23,603 INFO The decbbfe5-f3be-4e6c-a2b8-5744eabfe8a7 group will contain 2 nodes.
2021-06-17 09:17:23,603 INFO Waiting for 1 more clients to connect to decbbfe5-f3be-4e6c-a2b8-5744eabfe8a7 group
2021-06-17 09:17:23,603 INFO Ready to accept new connections
2021-06-17 09:17:23,790 INFO Group complete, starting tests
2021-06-17 09:17:23,790 INFO Ready to accept new connections
2021-06-17 09:17:26,613 INFO Group complete, starting tests
2021-06-17 09:17:26,613 INFO Ready to accept new connections
2021-06-17 09:18:03,522 DEBUG clear Group Data: 1 of 2
2021-06-17 09:18:03,522 INFO Ready to accept new connections
2021-06-17 09:18:06,001 DEBUG clear Group Data: 2 of 2
2021-06-17 09:18:06,001 DEBUG Clearing group data for decbbfe5-f3be-4e6c-a2b8-5744eabfe8a7
2021-06-17 09:18:06,001 INFO Ready to accept new connections
2021-06-17 09:24:43,620 INFO The 8956d8e7-1097-43e0-95dd-7afc61b2908b group will contain 2 nodes.
2021-06-17 09:24:43,620 INFO Waiting for 1 more clients to connect to 8956d8e7-1097-43e0-95dd-7afc61b2908b group
2021-06-17 09:24:43,620 INFO Ready to accept new connections
2021-06-17 09:24:43,871 INFO Group complete, starting tests
2021-06-17 09:24:43,871 INFO Ready to accept new connections
2021-06-17 09:24:46,634 INFO Group complete, starting tests
2021-06-17 09:24:46,634 INFO Ready to accept new connections
2021-06-17 09:25:45,746 INFO lava_send: {'port': 3079, 'blocksize': 4096, 'poll_delay': 3, 'host': '10.191.253.109', 'hostname': 'lavaslave1', 'client_name': '3077', 'group_name': '8956d8e7-1097-43e0-95dd-7afc61b2908b', 'role': 'host', 'request': 'lava_send', 'messageID': 'server_ready', 'message': {}}
2021-06-17 09:25:45,747 INFO lavaSend handler in Coordinator received a messageID 'server_ready' for group '8956d8e7-1097-43e0-95dd-7afc61b2908b' from 3077
2021-06-17 09:25:45,747 DEBUG message ID server_ready {"3077": {}} for 3077
2021-06-17 09:25:45,747 DEBUG broadcast ID server_ready {"3077": {}} for 3076
2021-06-17 09:25:45,747 DEBUG broadcast ID server_ready {"3077": {}} for 3077
2021-06-17 09:25:45,747 INFO Ready to accept new connections

This log is similar to what I see in the web UI: the "ssh" device with lava_send looks OK, but the "dragonboard-410c" device running the Android test with lava_wait does not; it just hangs. From the log above, it looks like the coordinator did not receive anything?

From what I can see in the logs, lava-coordinator is not receiving any signal from the second test. Are both devices on the same dispatcher/worker?

On Thu, Jun 17, 2021 at 4:02 PM Remi Duraffort <remi.duraffort@linaro.org> wrote:

On Thu, Jun 17, 2021 at 9:11 AM Hedy Lamarr <lamarrhedy97@gmail.com> wrote:

The output is:

service lava-coordinator status
● lava-coordinator.service - LAVA coordinator
Loaded: loaded (/lib/systemd/system/lava-coordinator.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-06-04 18:09:19 CET; 1 weeks 5 days ago
Main PID: 629 (lava-coordinato)
Tasks: 1 (limit: 4915)
Memory: 7.4M
CGroup: /system.slice/lava-coordinator.service
└─629 /usr/bin/python3 /usr/bin/lava-coordinator --loglevel DEBUG

So it's working. Is it listening on 10.191.253.109:3079? Do you have anything in the lava-coordinator logs? (/var/log/lava-coordinator.log)

On Thu, Jun 17, 2021 at 3:05 PM Remi Duraffort <remi.duraffort@linaro.org> wrote:

On Thu, Jun 17, 2021 at 9:02 AM Hedy Lamarr <lamarrhedy97@gmail.com> wrote:

Hello Remi,

I think lava-coordinator is running, because there are two devices here:

Device 1: dragonboard-410c; when it runs lava-wait server_ready, it hangs with the log above.
Device 2: ssh; when it runs lava-send server_ready, it shows: Connecting to LAVA Coordinator on 10.191.253.109:3079 timeout=300 seconds.

Could it be that lava-coordinator works for ssh but not for dragonboard-410c?

Also, netstat shows:

tcp 0 0 0.0.0.0:3079 0.0.0.0:* LISTEN 629/python3 off (0.00/0/0)

Does this mean the coordinator is running? Or how can I make sure it is running?

service lava-coordinator status

Thanks,
Hedy Lamarr

On Thu, Jun 17, 2021 at 2:33 PM Remi Duraffort <remi.duraffort@linaro.org> wrote:

Hello,

Do you have lava-coordinator running?

On Mon, Jun 14, 2021 at 2:29 PM Hedy Lamarr <lamarrhedy97@gmail.com> wrote:

By the way, we use 2021.03.post1.

On Wed, Jun 9, 2021 at 10:40 AM Hedy Lamarr <lamarrhedy97@gmail.com> wrote:

Dear community,

We are new to LAVA and are trying to use it for our Android testing. We have an issue when testing iperf.

Job:

job_name: android iperf test
timeouts:
  job:
    minutes: 10080
  action:
    minutes: 120
  connection:
    minutes: 5
priority: medium
visibility: public
protocols:
  lava-multinode:
    roles:
      device:
        count: 1
        device_type: dragonboard-410c
        timeout:
          minutes: 5
      host:
        count: 1
        device_type: ssh
        timeout:
          minutes: 5
        context:
          ssh_host: localhost
          ssh_user: root
          ssh_port: 22
          ssh_identity_file: /root/.ssh/id_rsa
actions:
- deploy:
    role:
    - host
    timeout:
      minutes: 2
    to: ssh
    os: debian
- boot:
    role:
    - host
    method: ssh
    connection: ssh
    prompts:
    - '@labpc1'
- test:
    role:
    - host
    timeout:
      minutes: 120
    definitions:
    - from: inline
      name: smoke-case
      path: inline/test.yaml
      repository:
        metadata:
          format: Lava-Test Test Definition
          name: smoke
          description: Run smoke case
        run:
          steps:
          - sleep 60
          - lava-send "server_ready"
          - iperf -s -V -P 1
- test:
    role:
    - device
    definitions:
    - from: inline
      name: cts_cts-media_test
      path: inline/cts_cts-media_test.yaml
      repository:
        metadata:
          description: cts cts-media test run
          format: Lava-Test Test Definition 1.0
          name: cts-cts-media-test-run
        run:
          steps:
          - adb wait-for-device
          - adb devices
          - adb root
          - adb wait-for-device
          - adb devices
          - lava-wait "server_ready"
          - sleep 3
          - lava-test-case "Case1" --shell adb shell /data/local/iperf -c 10.191.253.21 -t 10
    docker:
      image: terceiro/android-platform-tools
    timeout:
      minutes: 4200

The job log for dragonboard-410c is:

+ lava-wait server_ready
<LAVA_WAIT_DEBUG preparing Wed Jun 8 10:07:22 CST 2021>
<LAVA_WAIT_DEBUG started Wed Jun 8 10:07:22 CST 2021>
<LAVA_MULTI_NODE> <LAVA_WAIT server_ready>
<LAVA_WAIT_DEBUG finished Wed Jun 8 10:07:22 CST 2021>
<LAVA_WAIT_DEBUG finished Wed Jun 8 10:07:22 CST 2021>
<LAVA_WAIT_DEBUG starting to wait Wed Jun 8 10:07:22 CST 2021>

NOTE: it looks hung at this step; the job can't continue.

The job log for ssh is:

+ lava-send server_ready
<LAVA_SEND_DEBUG lava_multi_node_send preparing Wed Jun 8 10:07:53 CST 2021>
<LAVA_SEND_DEBUG _lava_multi_node_send started Wed Jun 8 10:07:53 CST 2021>
<LAVA_MULTI_NODE> <LAVA_SEND server_ready>
Received Multi_Node API <LAVA_SEND>
messageID: SEND-server_ready
lava-multinode lava-send
Handling signal <LAVA_SEND {"request": "lava_send", "messageID": "server_ready", "message": {}, "timeout": 300}>
Setting poll timeout of 300 seconds
requesting lava_send server_ready
message: {}
requesting lava_send server_ready with args {}
request_send server_ready {}
Sending {'request': 'lava_send', 'messageID': 'server_ready', 'message': {}}
final message: {"port": 3079, "blocksize": 4096, "poll_delay": 3, "host": "10.191.253.109", "hostname": "lavaslave1", "client_name": "3035", "group_name": "8a362e2a-6ee9-4f48-bddb-378ac2425f06", "role": "host", "request": "lava_send", "messageID": "server_ready", "message": {}}
Connecting to LAVA Coordinator on 10.191.253.109:3079 timeout=300 seconds.
case: multinode-send-server_ready
case_id: 39177
definition: 0_smoke-case
result: pass
<LAVA_SEND_DEBUG _lava_multi_node_send finished Wed Jun 8 10:07:53 CST 2021>
<LAVA_SEND_DEBUG lava_multi_node_send finished Wed Jun 8 10:07:53 CST 2021>
+ iperf -s -V -P 1
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------

It seems that one node can't receive server_ready from the other node. What's wrong with my job? Please help!

Thanks,
Hedy Lamarr
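Since the symptom is lava-wait hanging on one node while lava-send on the other only reports "Connecting to LAVA Coordinator", one quick check is whether each worker can open a TCP connection to the coordinator at all. A minimal sketch, not part of LAVA, assuming bash; the host/port values in the comment are the ones from this thread:

```shell
#!/bin/bash
# Sketch only: verify that this worker can open a TCP connection to
# lava-coordinator, which is what lava-send/lava-wait need to work.
check_coordinator() {
    local host="$1" port="$2"
    # /dev/tcp is a bash feature; timeout avoids hanging on filtered ports.
    if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "reachable: ${host}:${port}"
    else
        echo "NOT reachable: ${host}:${port}" >&2
        return 1
    fi
}

# Run this on BOTH workers, with the values seen in the job log:
#   check_coordinator 10.191.253.109 3079
```

If this fails on the worker driving the dragonboard-410c but succeeds where the ssh job runs, that would match the one-sided behaviour in the logs.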
Lava-users mailing list
Lava-users@lists.lavasoftware.org
https://lists.lavasoftware.org/mailman/listinfo/lava-users
--
Rémi Duraffort
TuxArchitect
Linaro
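Rémi's answer at the top of the thread is the fix: point the host in /etc/lava-coordinator/lava-coordinator.conf at the machine actually running the coordinator. Assuming the file has the usual JSON layout (the key names here are an assumption based on the port/blocksize/poll_delay values echoed in the job log; check the file shipped on your system), the change amounts to something like:

```json
{
    "port": 3079,
    "blocksize": 4096,
    "poll_delay": 3,
    "coordinator_hostname": "10.191.253.109"
}
```

After editing, restart the relevant services (e.g. service lava-coordinator restart) so that nodes stop trying to reach the coordinator on localhost.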