Hi Team,
I could able to find the root cause of the issue just giving my observation :
1. I deleted a `cancelling` job with the ID 3834 from the GUI. 2. And for the next test run its giving an error log under worker like this .
2022-02-24 01:18:57,502 ERROR [3834] -> server error: code 503 2022-02-24 01:18:57,502 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) 2022-02-24 01:19:16,795 INFO PING => server 2022-02-24 01:19:17,268 INFO [3834] FINISHED => server 2022-02-24 01:19:18,666 ERROR [3834] -> server error: code 404 2022-02-24 01:19:18,666 DEBUG [3834] --> {"error": "Unknown job '3834'"} 2022-02-24 01:19:36,797 INFO PING => server 2022-02-24 01:19:37,274 INFO [3834] FINISHED => server 2022-02-24 01:19:37,509 ERROR [3834] -> server error: code 503 2022-02-24 01:19:37,509 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
So can you people help me what this behaviour is for and a solution too ? Your response will be a great help for me.
Regards Sarath P T From: P T, Sarath Sent: 23 February 2022 14:26 To: lava-users@lists.lavasoftware.org Subject: Job is not exiting after the timeout
Hi Team,
Facing an issue related to the LAVA job timeout. Job is not exiting even after the timeout , its continuously loading after the job completion. Below I am giving my observations:
1. A LAVA job has executed with the job id : 3959
2. And under the worker I am getting the log as shown below : 2022-02-22 23:19:09,534 DEBUG [3959] dispatch: None
2022-02-22 23:19:09,535 DEBUG [3959] env : {'purge': True, 'overrides': {'LC_ALL': 'C.UTF-8', 'LANG': 'C', 'PATH': '/usr/local/bin:/usr/local/sbin:/bin:/usr/bin:/usr/sbin:/sbin'}}
2022-02-22 23:19:09,535 DEBUG [3959] env-dut : None
2022-02-22 23:19:09,584 INFO [3959] RUNNING => server
2022-02-22 23:19:09,850 INFO [3834] FINISHED => server
2022-02-22 23:19:10,087 ERROR [3834] -> server error: code 503
2022-02-22 23:19:10,088 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
2022-02-22 23:19:28,651 INFO PING => server
2022-02-22 23:19:29,142 INFO [3834] FINISHED => server
2022-02-22 23:19:29,775 ERROR [3834] -> server error: code 503
2022-02-22 23:19:29,776 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
2022-02-22 23:19:48,652 INFO PING => server
2022-02-22 23:19:49,146 INFO [3834] FINISHED => server
2022-02-22 23:19:49,391 ERROR [3834] -> server error: code 503
2022-02-22 23:19:49,391 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
2022-02-22 23:20:08,653 INFO PING => server
2022-02-22 23:20:09,171 INFO [3959] running -> finished
2022-02-22 23:20:09,224 INFO [3834] FINISHED => server
2022-02-22 23:20:09,525 ERROR [3834] -> server error: code 503
2022-02-22 23:20:09,525 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
2022-02-22 23:20:28,654 INFO PING => server
2022-02-22 23:20:29,147 INFO [3834] FINISHED => server
2022-02-22 23:20:29,756 ERROR [3834] -> server error: code 503
2022-02-22 23:20:29,756 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Seems after the job execution somewhere the communication between server and worker is getting interrupted. It will be a great help if you people are looking into the issue and hope you will find the solution for the same.
Regards Sarath P T
On Thu, Feb 24, 2022 at 09:40:22AM +0000, P T, Sarath wrote:
Hi Team,
I could able to find the root cause of the issue just giving my observation :
- I deleted a `cancelling` job with the ID 3834 from the GUI.
- And for the next test run its giving an error log under worker like this .
2022-02-24 01:18:57,502 ERROR [3834] -> server error: code 503 2022-02-24 01:18:57,502 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) 2022-02-24 01:19:16,795 INFO PING => server 2022-02-24 01:19:17,268 INFO [3834] FINISHED => server 2022-02-24 01:19:18,666 ERROR [3834] -> server error: code 404 2022-02-24 01:19:18,666 DEBUG [3834] --> {"error": "Unknown job '3834'"} 2022-02-24 01:19:36,797 INFO PING => server 2022-02-24 01:19:37,274 INFO [3834] FINISHED => server 2022-02-24 01:19:37,509 ERROR [3834] -> server error: code 503 2022-02-24 01:19:37,509 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Is the server receiving the connections normally? If you look at the server logs (apache and/or gunicorn) there should be corresponding error messages in there telling you what went wrong.
Hi Antonio,
These are the logs for the server connection:
Worker side log ( /var/log/lava-dispatcher/lava-worker.log ) ------------------------------------------------------------
2022-02-24 05:56:58,718 INFO [3834] FINISHED => server 2022-02-24 05:57:01,233 ERROR [3834] -> server error: code 404 2022-02-24 05:57:01,233 DEBUG [3834] --> {"error": "Unknown job '3834'"} 2022-02-24 05:57:18,246 INFO PING => server 2022-02-24 05:57:18,729 INFO [3834] FINISHED => server 2022-02-24 05:57:18,965 ERROR [3834] -> server error: code 503 2022-02-24 05:57:18,965 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) 2022-02-24 05:57:38,248 INFO PING => server 2022-02-24 05:57:38,737 INFO [3834] FINISHED => server 2022-02-24 05:57:38,977 ERROR [3834] -> server error: code 503 2022-02-24 05:57:38,977 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) 2022-02-24 05:57:58,250 INFO PING => server 2022-02-24 05:57:58,731 INFO [3834] FINISHED => server 2022-02-24 05:57:58,968 ERROR [3834] -> server error: code 503 2022-02-24 05:57:58,969 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) 2022-02-24 05:58:18,252 INFO PING => server 2022-02-24 05:58:18,745 INFO [3834] FINISHED => server 2022-02-24 05:58:21,739 ERROR [3834] -> server error: code 502 2022-02-24 05:58:21,740 DEBUG [3834] --> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> <html><head> <title>502 Bad Gateway</title> </head><body> <h1>Bad Gateway</h1> <p>The proxy server received an invalid response from an upstream server.<br /> </p> <hr> <address>Apache/2.4.38 (Debian) Server at 132.186.71.148 Port 80</address> </body></html>
2022-02-24 05:58:38,253 INFO PING => server 2022-02-24 05:58:38,735 INFO [3834] FINISHED => server 2022-02-24 05:58:38,971 ERROR [3834] -> server error: code 503 2022-02-24 05:58:38,971 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) 2022-02-24 05:58:58,254 INFO PING => server 2022-02-24 05:58:58,738 INFO [3834] FINISHED => server 2022-02-24 05:58:58,973 ERROR [3834] -> server error: code 503 2022-02-24 05:58:58,973 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) 2022-02-24 05:59:18,256 INFO PING => server
Server side log ( /var/log/apache2/lava-server.log ) ------------------------------------------------------
134.86.62.69 - - [24/Feb/2022:19:39:46 +0530] "GET /ws/ HTTP/1.1" 500 804 "-" "lava-worker 2021.10" ::1 - - [24/Feb/2022:19:39:46 +0530] "POST /scheduler/internal/v1/workers/ HTTP/1.1" 400 68338 "-" "lava-worker 2021.10" [Thu Feb 24 19:39:46.711251 2022] [proxy:warn] [pid 9108:tid 140199652738816] [client 134.86.62.139:42968] AH01144: No protocol handler was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule. 134.86.62.139 - - [24/Feb/2022:19:39:46 +0530] "GET /ws/ HTTP/1.1" 500 804 "-" "lava-worker 2021.10" [Thu Feb 24 19:39:47.054716 2022] [proxy:warn] [pid 9151:tid 140199132653312] [client 134.86.61.20:43200] AH01144: No protocol handler was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule. 134.86.61.20 - - [24/Feb/2022:19:39:47 +0530] "GET /ws/ HTTP/1.1" 500 804 "-" "lava-worker 2021.10" [Thu Feb 24 19:39:47.919417 2022] [proxy:warn] [pid 9108:tid 140200256718592] [client 134.86.62.69:45566] AH01144: No protocol handler was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule. 134.86.62.69 - - [24/Feb/2022:19:39:47 +0530] "GET /ws/ HTTP/1.1" 500 804 "-" "lava-worker 2021.10" [Thu Feb 24 19:39:48.202295 2022] [proxy:warn] [pid 9151:tid 140199661131520] [client 134.86.62.139:42970] AH01144: No protocol handler was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule. 134.86.62.139 - - [24/Feb/2022:19:39:48 +0530] "GET /ws/ HTTP/1.1" 500 804 "-" "lava-worker 2021.10" [Thu Feb 24 19:39:48.515377 2022] [proxy:warn] [pid 9108:tid 140200655480576] [client 134.86.61.20:43202] AH01144: No protocol handler was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule. 134.86.61.20 - - [24/Feb/2022:19:39:48 +0530] "GET /ws/ HTTP/1.1" 500 804 "-" "lava-worker 2021.10"
Server side log ( /var/log/lava-server/gunicorn.log ) --------------------------------------------------------
[2022-02-24 14:02:17 +0000] [704] [DEBUG] GET /scheduler/internal/v1/workers/slll-worker-testing/ [2022-02-24 14:02:18 +0000] [704] [DEBUG] POST /scheduler/internal/v1/workers/ [2022-02-24 14:02:19 +0000] [704] [DEBUG] GET /scheduler/internal/v1/workers/bng-test-worker/ [2022-02-24 14:02:20 +0000] [722] [DEBUG] GET /scheduler/internal/v1/workers/Test-worker/ [2022-02-24 14:02:20 +0000] [704] [DEBUG] POST /scheduler/internal/v1/jobs/3879/ [2022-02-24 14:02:23 +0000] [722] [DEBUG] POST /scheduler/internal/v1/workers/ [2022-02-24 14:02:28 +0000] [721] [DEBUG] POST /scheduler/internal/v1/workers/ [2022-02-24 14:02:29 +0000] [704] [DEBUG] GET /scheduler/job/3966/job_status [2022-02-24 14:02:29 +0000] [721] [DEBUG] GET /scheduler/job/3966/log_pipeline_incremental [2022-02-24 14:02:33 +0000] [704] [DEBUG] POST /scheduler/internal/v1/workers/ [2022-02-24 14:02:37 +0000] [704] [DEBUG] GET /scheduler/internal/v1/workers/slll-worker-testing/ [2022-02-24 14:02:38 +0000] [720] [DEBUG] POST /scheduler/internal/v1/workers/ [2022-02-24 14:02:38 +0000] [704] [DEBUG] POST /scheduler/internal/v1/jobs/3834/ [2022-02-24 14:02:39 +0000] [704] [DEBUG] GET /scheduler/internal/v1/workers/bng-test-worker/ [2022-02-24 14:02:40 +0000] [704] [DEBUG] GET /scheduler/internal/v1/workers/Test-worker/ [2022-02-24 14:02:43 +0000] [722] [DEBUG] POST /scheduler/internal/v1/workers/ [2022-02-24 14:02:48 +0000] [722] [DEBUG] POST /scheduler/internal/v1/workers/
Regards Sarath P T
-----Original Message----- From: Antonio Terceiro [mailto:antonio.terceiro@linaro.org] Sent: 24 February 2022 18:37 To: P T, Sarath Sarath_PT@mentor.com Cc: lava-users@lists.lavasoftware.org Subject: Re: [Lava-users] Re: Job is not exiting after the timeout
On Thu, Feb 24, 2022 at 09:40:22AM +0000, P T, Sarath wrote:
Hi Team,
I could able to find the root cause of the issue just giving my observation :
- I deleted a `cancelling` job with the ID 3834 from the GUI.
- And for the next test run its giving an error log under worker like this .
2022-02-24 01:18:57,502 ERROR [3834] -> server error: code 503 2022-02-24 01:18:57,502 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) 2022-02-24 01:19:16,795 INFO PING => server 2022-02-24 01:19:17,268 INFO [3834] FINISHED => server 2022-02-24 01:19:18,666 ERROR [3834] -> server error: code 404 2022-02-24 01:19:18,666 DEBUG [3834] --> {"error": "Unknown job '3834'"} 2022-02-24 01:19:36,797 INFO PING => server 2022-02-24 01:19:37,274 INFO [3834] FINISHED => server 2022-02-24 01:19:37,509 ERROR [3834] -> server error: code 503 2022-02-24 01:19:37,509 DEBUG [3834] --> ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Is the server receiving the connections normally? If you look at the server logs (apache and/or gunicorn) there should be corresponding error messages in there telling you what went wrong.
On Fri, Feb 25, 2022 at 05:10:58AM +0000, P T, Sarath wrote:
Server side log ( /var/log/apache2/lava-server.log )
134.86.62.69 - - [24/Feb/2022:19:39:46 +0530] "GET /ws/ HTTP/1.1" 500 804 "-" "lava-worker 2021.10" ::1 - - [24/Feb/2022:19:39:46 +0530] "POST /scheduler/internal/v1/workers/ HTTP/1.1" 400 68338 "-" "lava-worker 2021.10" [Thu Feb 24 19:39:46.711251 2022] [proxy:warn] [pid 9108:tid 140199652738816] [client 134.86.62.139:42968] AH01144: No protocol handler was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule. 134.86.62.139 - - [24/Feb/2022:19:39:46 +0530] "GET /ws/ HTTP/1.1" 500 804 "-" "lava-worker 2021.10" [Thu Feb 24 19:39:47.054716 2022] [proxy:warn] [pid 9151:tid 140199132653312] [client 134.86.61.20:43200] AH01144: No protocol handler was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule. 134.86.61.20 - - [24/Feb/2022:19:39:47 +0530] "GET /ws/ HTTP/1.1" 500 804 "-" "lava-worker 2021.10" [Thu Feb 24 19:39:47.919417 2022] [proxy:warn] [pid 9108:tid 140200256718592] [client 134.86.62.69:45566] AH01144: No protocol handler was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule. 134.86.62.69 - - [24/Feb/2022:19:39:47 +0530] "GET /ws/ HTTP/1.1" 500 804 "-" "lava-worker 2021.10" [Thu Feb 24 19:39:48.202295 2022] [proxy:warn] [pid 9151:tid 140199661131520] [client 134.86.62.139:42970] AH01144: No protocol handler was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule. 134.86.62.139 - - [24/Feb/2022:19:39:48 +0530] "GET /ws/ HTTP/1.1" 500 804 "-" "lava-worker 2021.10" [Thu Feb 24 19:39:48.515377 2022] [proxy:warn] [pid 9108:tid 140200655480576] [client 134.86.61.20:43202] AH01144: No protocol handler was valid for the URL /ws/ (scheme 'ws'). If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule. 134.86.61.20 - - [24/Feb/2022:19:39:48 +0530] "GET /ws/ HTTP/1.1" 500 804 "-" "lava-worker 2021.10"
Your apache is not configured correctly, you are probably missing enabling mod_proxy and/or mod_proxy_http. See https://master.lavasoftware.org/static/docs/v2/installing_on_debian.html#pro...
lava-users@lists.lavasoftware.org