Hi Team,
I have a question about using lava-test-shell and related binaries such as
lava-test-runner.
My board boots Linux and has a POSIX environment, but it supports neither
ssh nor nfs, because of the small memory footprint available and because the
ethernet driver is not fully functional.
How can I use lava-test-runner/lava-test-shell to run my driver test suite?
Is it possible to run our suite with lava-test-shell/runner when the DUT
has no ethernet/nfs support?
I am getting a lava-test-shell timeout on the DUT console, even though the
LAVA worker had all the binaries available through the lava-overlay method.
Please find the test job definition and LAVA job log files attached for your
reference. Kindly let me know the solution.
Hi.
It looks like I am facing the same problem: the job does not exit even
after the timeout.
I guess there might be a communication gap between the dispatcher and the server.
Dispatcher log screenshot: (/var/log/lava-dispatcher/lava-worker.log)
######################
[image: image.png]
Is there any solution to resolve this?
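One way to check whether the dispatcher and server are really failing to talk to each other is to tally the error codes in lava-worker.log. A minimal sketch in Python, assuming the log line format shown in this thread; the function name `tally_server_errors` is hypothetical and not part of LAVA.

```python
import re
from collections import Counter

# Matches worker log lines such as:
#   2022-02-24 05:57:01,233 ERROR [3834] -> server error: code 404
ERROR_RE = re.compile(
    r"ERROR \[(?P<job>\d+)\] -> server error: code (?P<code>\d+)"
)


def tally_server_errors(lines):
    """Count (job id, HTTP status code) pairs reported by the worker."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match["job"], int(match["code"]))] += 1
    return counts


sample = [
    "2022-02-24 05:56:58,718 INFO [3834] FINISHED => server",
    "2022-02-24 05:57:01,233 ERROR [3834] -> server error: code 404",
    "2022-02-24 05:57:18,965 ERROR [3834] -> server error: code 503",
    "2022-02-24 05:57:38,977 ERROR [3834] -> server error: code 503",
]

for (job, code), count in sorted(tally_server_errors(sample).items()):
    print(f"job {job}: HTTP {code} x{count}")
```

A job that keeps reporting FINISHED and keeps getting 404 or 5xx answers suggests the server either no longer knows the job or is not answering the worker reliably.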
Regards,
Koti
On Sat, 26 Feb 2022 at 05:30, <lava-users-request(a)lists.lavasoftware.org>
wrote:
> Today's Topics:
>
> 1. Re: Job is not exiting after the timeout (P T, Sarath)
> 2. Re: Job is not exiting after the timeout (Antonio Terceiro)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 25 Feb 2022 05:10:58 +0000
> From: "P T, Sarath" <Sarath_PT(a)mentor.com>
> Subject: [Lava-users] Re: Job is not exiting after the timeout
> To: Antonio Terceiro <antonio.terceiro(a)linaro.org>
> Cc: "lava-users(a)lists.lavasoftware.org"
> <lava-users(a)lists.lavasoftware.org>
> Message-ID:
> <7b18ad8ebf54460e935b147659d2da99(a)svr-orw-mbx-01.mgc.mentorg.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi Antonio,
>
> These are the logs for the server connection:
>
> Worker side log ( /var/log/lava-dispatcher/lava-worker.log )
> ------------------------------------------------------------
>
> 2022-02-24 05:56:58,718 INFO [3834] FINISHED => server
> 2022-02-24 05:57:01,233 ERROR [3834] -> server error: code 404
> 2022-02-24 05:57:01,233 DEBUG [3834] --> {"error": "Unknown job '3834'"}
> 2022-02-24 05:57:18,246 INFO PING => server
> 2022-02-24 05:57:18,729 INFO [3834] FINISHED => server
> 2022-02-24 05:57:18,965 ERROR [3834] -> server error: code 503
> 2022-02-24 05:57:18,965 DEBUG [3834] --> ('Connection aborted.',
> RemoteDisconnected('Remote end closed connection without response'))
> 2022-02-24 05:57:38,248 INFO PING => server
> 2022-02-24 05:57:38,737 INFO [3834] FINISHED => server
> 2022-02-24 05:57:38,977 ERROR [3834] -> server error: code 503
> 2022-02-24 05:57:38,977 DEBUG [3834] --> ('Connection aborted.',
> RemoteDisconnected('Remote end closed connection without response'))
> 2022-02-24 05:57:58,250 INFO PING => server
> 2022-02-24 05:57:58,731 INFO [3834] FINISHED => server
> 2022-02-24 05:57:58,968 ERROR [3834] -> server error: code 503
> 2022-02-24 05:57:58,969 DEBUG [3834] --> ('Connection aborted.',
> RemoteDisconnected('Remote end closed connection without response'))
> 2022-02-24 05:58:18,252 INFO PING => server
> 2022-02-24 05:58:18,745 INFO [3834] FINISHED => server
> 2022-02-24 05:58:21,739 ERROR [3834] -> server error: code 502
> 2022-02-24 05:58:21,740 DEBUG [3834] --> <!DOCTYPE HTML PUBLIC
> "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>502 Bad Gateway</title>
> </head><body>
> <h1>Bad Gateway</h1>
> <p>The proxy server received an invalid
> response from an upstream server.<br />
> </p>
> <hr>
> <address>Apache/2.4.38 (Debian) Server at 132.186.71.148 Port 80</address>
> </body></html>
>
>
> 2022-02-24 05:58:38,253 INFO PING => server
> 2022-02-24 05:58:38,735 INFO [3834] FINISHED => server
> 2022-02-24 05:58:38,971 ERROR [3834] -> server error: code 503
> 2022-02-24 05:58:38,971 DEBUG [3834] --> ('Connection aborted.',
> RemoteDisconnected('Remote end closed connection without response'))
> 2022-02-24 05:58:58,254 INFO PING => server
> 2022-02-24 05:58:58,738 INFO [3834] FINISHED => server
> 2022-02-24 05:58:58,973 ERROR [3834] -> server error: code 503
> 2022-02-24 05:58:58,973 DEBUG [3834] --> ('Connection aborted.',
> RemoteDisconnected('Remote end closed connection without response'))
> 2022-02-24 05:59:18,256 INFO PING => server
>
>
> Server side log ( /var/log/apache2/lava-server.log )
> ------------------------------------------------------
>
> 134.86.62.69 - - [24/Feb/2022:19:39:46 +0530] "GET /ws/ HTTP/1.1" 500 804
> "-" "lava-worker 2021.10"
> ::1 - - [24/Feb/2022:19:39:46 +0530] "POST /scheduler/internal/v1/workers/
> HTTP/1.1" 400 68338 "-" "lava-worker 2021.10"
> [Thu Feb 24 19:39:46.711251 2022] [proxy:warn] [pid 9108:tid
> 140199652738816] [client 134.86.62.139:42968] AH01144: No protocol
> handler was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO
> version of mod_proxy, make sure the proxy submodules are included in the
> configuration using LoadModule.
> 134.86.62.139 - - [24/Feb/2022:19:39:46 +0530] "GET /ws/ HTTP/1.1" 500 804
> "-" "lava-worker 2021.10"
> [Thu Feb 24 19:39:47.054716 2022] [proxy:warn] [pid 9151:tid
> 140199132653312] [client 134.86.61.20:43200] AH01144: No protocol handler
> was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of
> mod_proxy, make sure the proxy submodules are included in the configuration
> using LoadModule.
> 134.86.61.20 - - [24/Feb/2022:19:39:47 +0530] "GET /ws/ HTTP/1.1" 500 804
> "-" "lava-worker 2021.10"
> [Thu Feb 24 19:39:47.919417 2022] [proxy:warn] [pid 9108:tid
> 140200256718592] [client 134.86.62.69:45566] AH01144: No protocol handler
> was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of
> mod_proxy, make sure the proxy submodules are included in the configuration
> using LoadModule.
> 134.86.62.69 - - [24/Feb/2022:19:39:47 +0530] "GET /ws/ HTTP/1.1" 500 804
> "-" "lava-worker 2021.10"
> [Thu Feb 24 19:39:48.202295 2022] [proxy:warn] [pid 9151:tid
> 140199661131520] [client 134.86.62.139:42970] AH01144: No protocol
> handler was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO
> version of mod_proxy, make sure the proxy submodules are included in the
> configuration using LoadModule.
> 134.86.62.139 - - [24/Feb/2022:19:39:48 +0530] "GET /ws/ HTTP/1.1" 500 804
> "-" "lava-worker 2021.10"
> [Thu Feb 24 19:39:48.515377 2022] [proxy:warn] [pid 9108:tid
> 140200655480576] [client 134.86.61.20:43202] AH01144: No protocol handler
> was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of
> mod_proxy, make sure the proxy submodules are included in the configuration
> using LoadModule.
> 134.86.61.20 - - [24/Feb/2022:19:39:48 +0530] "GET /ws/ HTTP/1.1" 500 804
> "-" "lava-worker 2021.10"
>
>
> Server side log ( /var/log/lava-server/gunicorn.log )
> --------------------------------------------------------
>
> [2022-02-24 14:02:17 +0000] [704] [DEBUG] GET
> /scheduler/internal/v1/workers/slll-worker-testing/
> [2022-02-24 14:02:18 +0000] [704] [DEBUG] POST
> /scheduler/internal/v1/workers/
> [2022-02-24 14:02:19 +0000] [704] [DEBUG] GET
> /scheduler/internal/v1/workers/bng-test-worker/
> [2022-02-24 14:02:20 +0000] [722] [DEBUG] GET
> /scheduler/internal/v1/workers/Test-worker/
> [2022-02-24 14:02:20 +0000] [704] [DEBUG] POST
> /scheduler/internal/v1/jobs/3879/
> [2022-02-24 14:02:23 +0000] [722] [DEBUG] POST
> /scheduler/internal/v1/workers/
> [2022-02-24 14:02:28 +0000] [721] [DEBUG] POST
> /scheduler/internal/v1/workers/
> [2022-02-24 14:02:29 +0000] [704] [DEBUG] GET
> /scheduler/job/3966/job_status
> [2022-02-24 14:02:29 +0000] [721] [DEBUG] GET
> /scheduler/job/3966/log_pipeline_incremental
> [2022-02-24 14:02:33 +0000] [704] [DEBUG] POST
> /scheduler/internal/v1/workers/
> [2022-02-24 14:02:37 +0000] [704] [DEBUG] GET
> /scheduler/internal/v1/workers/slll-worker-testing/
> [2022-02-24 14:02:38 +0000] [720] [DEBUG] POST
> /scheduler/internal/v1/workers/
> [2022-02-24 14:02:38 +0000] [704] [DEBUG] POST
> /scheduler/internal/v1/jobs/3834/
> [2022-02-24 14:02:39 +0000] [704] [DEBUG] GET
> /scheduler/internal/v1/workers/bng-test-worker/
> [2022-02-24 14:02:40 +0000] [704] [DEBUG] GET
> /scheduler/internal/v1/workers/Test-worker/
> [2022-02-24 14:02:43 +0000] [722] [DEBUG] POST
> /scheduler/internal/v1/workers/
> [2022-02-24 14:02:48 +0000] [722] [DEBUG] POST
> /scheduler/internal/v1/workers/
>
>
> Regards
> Sarath P T
>
> -----Original Message-----
> From: Antonio Terceiro [mailto:antonio.terceiro@linaro.org]
> Sent: 24 February 2022 18:37
> To: P T, Sarath <Sarath_PT(a)mentor.com>
> Cc: lava-users(a)lists.lavasoftware.org
> Subject: Re: [Lava-users] Re: Job is not exiting after the timeout
>
> On Thu, Feb 24, 2022 at 09:40:22AM +0000, P T, Sarath wrote:
> > Hi Team,
> >
> > I was able to find the root cause of the issue. My observations:
> >
> > 1. I deleted a `cancelling` job with the ID 3834 from the GUI.
> > 2. On the next test run, the worker log shows errors like this:
> >
> > 2022-02-24 01:18:57,502 ERROR [3834] -> server error: code 503
> > 2022-02-24 01:18:57,502 DEBUG [3834] --> ('Connection aborted.',
> > RemoteDisconnected('Remote end closed connection without response'))
> > 2022-02-24 01:19:16,795 INFO PING => server
> > 2022-02-24 01:19:17,268 INFO [3834] FINISHED => server
> > 2022-02-24 01:19:18,666 ERROR [3834] -> server error: code 404
> > 2022-02-24 01:19:18,666 DEBUG [3834] --> {"error": "Unknown job '3834'"}
> > 2022-02-24 01:19:36,797 INFO PING => server
> > 2022-02-24 01:19:37,274 INFO [3834] FINISHED => server
> > 2022-02-24 01:19:37,509 ERROR [3834] -> server error: code 503
> > 2022-02-24 01:19:37,509 DEBUG [3834] --> ('Connection aborted.',
> > RemoteDisconnected('Remote end closed connection without response'))
>
> Is the server receiving the connections normally? If you look at the
> server logs (apache and/or gunicorn) there should be corresponding error
> messages in there telling you what went wrong.
>
> ------------------------------
>
> Message: 2
> Date: Fri, 25 Feb 2022 10:37:00 -0300
> From: Antonio Terceiro <antonio.terceiro(a)linaro.org>
> Subject: [Lava-users] Re: Job is not exiting after the timeout
> To: "P T, Sarath" <Sarath_PT(a)mentor.com>
> Cc: "lava-users(a)lists.lavasoftware.org"
> <lava-users(a)lists.lavasoftware.org>
> Message-ID: <YhjbfBGnnyO67EIY(a)linaro.org>
> Content-Type: multipart/signed; micalg=pgp-sha256;
> protocol="application/pgp-signature"; boundary="U431ChLU/1f+Fa7u"
>
> On Fri, Feb 25, 2022 at 05:10:58AM +0000, P T, Sarath wrote:
> > Server side log ( /var/log/apache2/lava-server.log )
> > ------------------------------------------------------
> >
> > 134.86.62.69 - - [24/Feb/2022:19:39:46 +0530] "GET /ws/ HTTP/1.1" 500
> > 804 "-" "lava-worker 2021.10"
> > ::1 - - [24/Feb/2022:19:39:46 +0530] "POST
> > /scheduler/internal/v1/workers/ HTTP/1.1" 400 68338 "-" "lava-worker
> > 2021.10"
> > [Thu Feb 24 19:39:46.711251 2022] [proxy:warn] [pid 9108:tid
> > 140199652738816] [client 134.86.62.139:42968] AH01144: No protocol
> > handler was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO
> > version of mod_proxy, make sure the proxy submodules are included in the
> > configuration using LoadModule.
> > 134.86.62.139 - - [24/Feb/2022:19:39:46 +0530] "GET /ws/ HTTP/1.1" 500
> > 804 "-" "lava-worker 2021.10"
> > [Thu Feb 24 19:39:47.054716 2022] [proxy:warn] [pid 9151:tid
> > 140199132653312] [client 134.86.61.20:43200] AH01144: No protocol handler
> > was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of
> > mod_proxy, make sure the proxy submodules are included in the configuration
> > using LoadModule.
> > 134.86.61.20 - - [24/Feb/2022:19:39:47 +0530] "GET /ws/ HTTP/1.1" 500
> > 804 "-" "lava-worker 2021.10"
> > [Thu Feb 24 19:39:47.919417 2022] [proxy:warn] [pid 9108:tid
> > 140200256718592] [client 134.86.62.69:45566] AH01144: No protocol handler
> > was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of
> > mod_proxy, make sure the proxy submodules are included in the configuration
> > using LoadModule.
> > 134.86.62.69 - - [24/Feb/2022:19:39:47 +0530] "GET /ws/ HTTP/1.1" 500
> > 804 "-" "lava-worker 2021.10"
> > [Thu Feb 24 19:39:48.202295 2022] [proxy:warn] [pid 9151:tid
> > 140199661131520] [client 134.86.62.139:42970] AH01144: No protocol
> > handler was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO
> > version of mod_proxy, make sure the proxy submodules are included in the
> > configuration using LoadModule.
> > 134.86.62.139 - - [24/Feb/2022:19:39:48 +0530] "GET /ws/ HTTP/1.1" 500
> > 804 "-" "lava-worker 2021.10"
> > [Thu Feb 24 19:39:48.515377 2022] [proxy:warn] [pid 9108:tid
> > 140200655480576] [client 134.86.61.20:43202] AH01144: No protocol handler
> > was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of
> > mod_proxy, make sure the proxy submodules are included in the configuration
> > using LoadModule.
> > 134.86.61.20 - - [24/Feb/2022:19:39:48 +0530] "GET /ws/ HTTP/1.1" 500
> > 804 "-" "lava-worker 2021.10"
>
> Your Apache is not configured correctly; you have probably not enabled
> mod_proxy and/or mod_proxy_http. See
>
> https://master.lavasoftware.org/static/docs/v2/installing_on_debian.html#pr…
>
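The AH01144 entries quoted in this thread name the scheme ('ws') that Apache had no handler for. A minimal sketch in Python for pulling that out of an Apache error log, assuming each entry sits on a single line as it would in the original log file. The function name `missing_proxy_modules` and the scheme-to-module mapping are assumptions for illustration and are not taken from the LAVA documentation.

```python
import re

# Matches Apache error-log entries such as:
#   ... AH01144: No protocol handler was valid for the URL /ws/ (scheme 'ws'). ...
AH01144_RE = re.compile(
    r"AH01144: No protocol handler was valid for the URL (?P<url>\S+) "
    r"\(scheme '(?P<scheme>[^']+)'\)"
)

# Assumed mapping from proxy scheme to the Apache module that handles it.
SCHEME_TO_MODULE = {
    "http": "proxy_http",
    "ws": "proxy_wstunnel",
    "wss": "proxy_wstunnel",
}


def missing_proxy_modules(log_text):
    """Return the Apache modules suggested by AH01144 entries in log_text."""
    modules = set()
    for match in AH01144_RE.finditer(log_text):
        module = SCHEME_TO_MODULE.get(match["scheme"])
        if module:
            modules.add(module)
    return modules


sample_log = (
    "[Thu Feb 24 19:39:46.711251 2022] [proxy:warn] [pid 9108:tid "
    "140199652738816] [client 134.86.62.139:42968] AH01144: No protocol "
    "handler was valid for the URL /ws/ (scheme 'ws'). If you are using a "
    "DSO version of mod_proxy, make sure the proxy submodules are included "
    "in the configuration using LoadModule.\n"
)

for module in sorted(missing_proxy_modules(sample_log)):
    print(f"candidate module to enable: {module}")
```

On Debian, a module reported this way would typically be enabled with `a2enmod <module>` followed by an Apache restart; whether that resolves the 500 responses on /ws/ here would still need to be confirmed against the server.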
In fact, I haven't seen any performance drop, because I'm still trying this out in a small, initial phase. I just want to eliminate any risk when we move to Debian 11; if an issue happens in the production environment, it may be really hard to debug.
Back to the question: in LAVA, if we use Debian 11, which has cgroup v2 enabled by default, then when docker-test-shell is used, LAVA attaches a custom BPF device program to the container to replace Docker's default one.
Everything looks fine, except that when I run "adb devices" in the container, trace_pipe is flooded with the following:
```
device poll-7289 [001] d... 103054.767620: bpf_trace_printk: Device access: major = 189, minor = 261
device poll-7289 [001] d... 103055.767851: bpf_trace_printk: Device access: major = 189, minor = 261
device poll-7289 [001] d... 103056.768117: bpf_trace_printk: Device access: major = 189, minor = 261
device poll-7289 [001] d... 103057.768354: bpf_trace_printk: Device access: major = 189, minor = 261
device poll-7289 [001] d... 103058.768590: bpf_trace_printk: Device access: major = 189, minor = 261
device poll-7289 [001] d... 103059.768819: bpf_trace_printk: Device access: major = 189, minor = 261
device poll-7289 [001] d... 103060.769053: bpf_trace_printk: Device access: major = 189, minor = 261
```
This means the BPF function is called frequently (roughly once per second in the trace above).
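To put a number on how often the device program fires, the timestamps in the trace_pipe output can be diffed. A minimal sketch in Python, assuming the line format shown in the trace above; the function name `access_intervals` is hypothetical.

```python
import re

# Matches trace_pipe lines such as:
#   device poll-7289 [001] d... 103054.767620: bpf_trace_printk: Device access: major = 189, minor = 261
TRACE_RE = re.compile(
    r"\s(?P<ts>\d+\.\d+): bpf_trace_printk: Device access: "
    r"major = (?P<major>\d+), minor = (?P<minor>\d+)"
)


def access_intervals(lines):
    """Return {(major, minor): [seconds between consecutive accesses]}."""
    last_seen = {}
    intervals = {}
    for line in lines:
        match = TRACE_RE.search(line)
        if match is None:
            continue
        key = (int(match["major"]), int(match["minor"]))
        timestamp = float(match["ts"])
        if key in last_seen:
            intervals.setdefault(key, []).append(timestamp - last_seen[key])
        last_seen[key] = timestamp
    return intervals


sample = [
    "device poll-7289 [001] d... 103054.767620: bpf_trace_printk: Device access: major = 189, minor = 261",
    "device poll-7289 [001] d... 103055.767851: bpf_trace_printk: Device access: major = 189, minor = 261",
    "device poll-7289 [001] d... 103056.768117: bpf_trace_printk: Device access: major = 189, minor = 261",
]

for device, gaps in access_intervals(sample).items():
    print(device, [round(gap, 3) for gap in gaps])
```

Feeding it a longer capture would show whether the gap stays near one second per device or shrinks when more devices are attached.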
On the other hand, if I run the following, the BPF program is detached from the cgroup, yet it looks like every adb device still works.
```
/sys/fs/cgroup/system.slice/docker-a9354f54a8c6a56932e15b4d577432abf86c897630d5e94da442474e938bf875.scope
78 device multi lava_docker_dev
$ bpftool cgroup detach /sys/fs/cgroup/system.slice/docker-a9354f54a8c6a56932e15b4d577432abf86c897630d5e94da442474e938bf875.scope device id 78
```
So I just want to confirm: have you noticed this behavior, and can you confirm that it is OK?
(To be honest, I'm not sure how BPF performs when it is called this frequently, so this is just an enquiry.)
Or is there a better way to handle it in LAVA?
I need your confirmation to decide whether I need to downgrade to cgroup v1 when I migrate. Thanks!
Regards,
Larry
Hi,
I'm facing an issue after updating the base-uboot file on the server.
*Configuration Error: missing or invalid template.*
*Jobs requesting this device type will not be able to start until a
template is available on the master.*
I have restarted the server and dispatcher, but the error persists. All the
devices went offline automatically.
Any advice to resolve this issue would be appreciated.
Thank you
Hello Team
After hitting "Infrastructure Error: bootloader interrupt", I made
changes in the base-uboot.jinja file and restarted the server.
But now it says "*Configuration Error: missing or invalid template.*
Jobs requesting this device type will not be able to start until a template
is available on the master."
Please advise how to fix this.
Thank you