Hi all,
I am using a policy client/server setup that works perfectly for anywhere from 1 to 10 hours. Eventually, however, all of the clients fail at the same time with the following error:
Traceback (most recent call last):
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connection.py", line 169, in _new_conn
conn = connection.create_connection(
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\util\connection.py", line 96, in create_connection
raise err
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\util\connection.py", line 86, in create_connection
sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 394, in _make_request
conn.request(method, url, **httplib_request_kw)
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connection.py", line 234, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 1252, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 1298, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 1247, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 1007, in _send_output
self.send(msg)
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 947, in send
self.connect()
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connection.py", line 200, in connect
conn = self._new_conn()
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connection.py", line 181, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x00000000312110D0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\adapters.py", line 439, in send
resp = conn.urlopen(
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\util\retry.py", line 574, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='192.168.0.18', port=55556): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000000312110D0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "policy_client.py", line 138, in <module>
action = client.get_action(episode_id=episode_id, observation=gameObservation)
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\env\policy_client.py", line 129, in get_action
return self._send({
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\rllib\env\policy_client.py", line 222, in _send
response = requests.post(self.address, data=payload)
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\api.py", line 117, in post
return request('post', url, data=data, json=json, **kwargs)
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "C:\Users\Denys\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='192.168.0.18', port=55556): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000000312110D0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
As far as I can tell, this happens with different models and after different numbers of successful iterations. Moreover, I doubt it is because the policy server (ppo_trainer) is blocked doing an iteration, since the reported learning time is only about 5 seconds.
Any ideas what might be causing this?
The policy server is running on Windows 10 with Python 3.8 and the latest dev RLlib wheel.
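
As a stopgap while the root cause is unknown, the call at line 138 of my policy_client.py could be wrapped in a retry with exponential backoff. This is only a sketch: `call_with_retries` and its parameters are hypothetical names of my own, not part of RLlib or requests, and it would only mask the timeouts rather than explain them. It catches `OSError` by default, which should cover both the `TimeoutError` and the `requests.exceptions.ConnectionError` in the traceback, since requests exceptions derive from `IOError`.

```python
import time


def call_with_retries(fn, attempts=5, base_delay=1.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call fn(); on a retryable error, wait and try again.

    The delay doubles after every failed attempt (base_delay, 2x, 4x, ...).
    The last failure is re-raised unchanged so the real error stays visible.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * (2 ** attempt)
            print(f"attempt {attempt + 1} failed ({exc!r}); "
                  f"retrying in {delay:.1f}s")
            sleep(delay)


# Hypothetical usage in the client loop (client, episode_id and
# gameObservation are the names from the traceback above):
# action = call_with_retries(
#     lambda: client.get_action(episode_id=episode_id,
#                               observation=gameObservation)
# )
```

Logging each failed attempt with a timestamp on every client would also show whether the server becomes unreachable for all of them at exactly the same moment.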