RLlib evaluation rollout: socket.gaierror [Errno -2] Name or service not known

  • High: It blocks me from completing my task.


(raylet) Traceback (most recent call last):
(raylet)   File "/usr/local/lib/python3.6/dist-packages/ray/dashboard/agent.py", line 391, in <module>
(raylet)     loop.run_until_complete(agent.run())
(raylet)   File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
(raylet)     return future.result()
(raylet)   File "/usr/local/lib/python3.6/dist-packages/ray/dashboard/agent.py", line 178, in run
(raylet)     modules = self._load_modules()
(raylet)   File "/usr/local/lib/python3.6/dist-packages/ray/dashboard/agent.py", line 120, in _load_modules
(raylet)     c = cls(self)
(raylet)   File "/usr/local/lib/python3.6/dist-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 163, in __init__
(raylet)     dashboard_agent.metrics_export_port)
(raylet)   File "/usr/local/lib/python3.6/dist-packages/ray/_private/metrics_agent.py", line 79, in __init__
(raylet)     address=metrics_export_address)))
(raylet)   File "/usr/local/lib/python3.6/dist-packages/ray/_private/prometheus_exporter.py", line 333, in new_stats_exporter
(raylet)     options=option, gatherer=option.registry, collector=collector)
(raylet)   File "/usr/local/lib/python3.6/dist-packages/ray/_private/prometheus_exporter.py", line 265, in __init__
(raylet)     self.serve_http()
(raylet)   File "/usr/local/lib/python3.6/dist-packages/ray/_private/prometheus_exporter.py", line 320, in serve_http
(raylet)     port=self.options.port, addr=str(self.options.address))
(raylet)   File "/usr/local/lib/python3.6/dist-packages/prometheus_client/exposition.py", line 168, in start_wsgi_server
(raylet)     TmpServer.address_family, addr = _get_best_family(addr, port)
(raylet)   File "/usr/local/lib/python3.6/dist-packages/prometheus_client/exposition.py", line 157, in _get_best_family
(raylet)     infos = socket.getaddrinfo(address, port)
(raylet)   File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
(raylet)     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
(raylet) socket.gaierror: [Errno -2] Name or service not known
(raylet) During handling of the above exception, another exception occurred:
(raylet) Traceback (most recent call last):
(raylet)   File "/usr/local/lib/python3.6/dist-packages/ray/dashboard/agent.py", line 407, in <module>
(raylet)     gcs_publisher = GcsPublisher(args.gcs_address)
(raylet) TypeError: __init__() takes 1 positional argument but 2 were given

This happens when I use RLlib’s evaluate.py script (from the master branch of ray-project/ray on GitHub) to perform rollouts and save their results.

Let test.py be a modified version of evaluate.py that registers a custom environment. When I run ./test.py --run PPO --env "pheromone_env" --episodes 10 --out rollout.pkl, my rollout reader script:

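import ray
import ray.cloudpickle as cloudpickle

objects = []
with open("rollout.pkl", "rb") as openfile:
    while True:
        try:
            # Load pickled objects one at a time until the file is exhausted.
            objects.append(cloudpickle.load(openfile))
        except EOFError:
            break

print(objects)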


returns [[[]]] (empty). Do I have to modify the script for it to actually save something? Is the error above related to it?
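
To separate a reader problem from a writer problem, here is a minimal round-trip sketch of the read-until-EOFError pattern the script above is built around. It assumes ray.cloudpickle.load reads one object per call, like the standard pickle.load, and uses the standard pickle module so it runs without Ray. The file name demo_rollout.pkl and the sample data are made up for illustration and are not RLlib's actual rollout format.

```python
import os
import pickle
import tempfile


def read_all_pickled(path):
    """Load every object that was pickled back-to-back into the file at `path`."""
    objects = []
    with open(path, "rb") as f:
        while True:
            try:
                objects.append(pickle.load(f))
            except EOFError:
                # No more pickled objects left in the file.
                break
    return objects


def demo():
    # Hypothetical data standing in for rollouts; not RLlib's real format.
    samples = [[["obs0", "action0"]], [[]]]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_rollout.pkl")
        with open(path, "wb") as f:
            for item in samples:
                pickle.dump(item, f)
        return read_all_pickled(path)


if __name__ == "__main__":
    print(demo())
```

With this pattern, a printed result of [[[]]] means that exactly one object was read from the file and that the object itself is [[]]. So the reading loop appears to work, and the question is more likely why nothing was recorded when the file was written.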

Thank you for any help in advance.


We’ve just started seeing this too. It is not specific to RLlib; we see it when launching Ray in general on Ubuntu.

Was there any feedback or responses from Ray committers?

I should add: we see this exception trace too when we try to use memory placement groups. First saw it with Ray 1.11 on Ubuntu, then we tried upgrading to Ray 1.13 – but still get this same error.

PS: does this have any relation to the recent build errors that require a protobuf rollback to v3.20? That’s the only dependency in our build that appeared to change around the time this error showed up.
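
In case it helps anyone narrow this down: the first exception in the trace is raised by socket.getaddrinfo, so one thing to check is whether plain name resolution works on the affected machine outside of Ray. Below is a minimal sketch using only the standard library. The helper name can_resolve is made up here, and the trace does not show which address Ray actually passes, so trying localhost and the machine's own host name is only a guess.

```python
import socket


def can_resolve(host, port=0):
    """Return (True, addresses) if `host` resolves, else (False, error message)."""
    try:
        infos = socket.getaddrinfo(host, port)
    except socket.gaierror as exc:
        # Same exception type as in the trace, e.g. [Errno -2] Name or service not known.
        return False, str(exc)
    # Each entry ends with a sockaddr tuple whose first element is the IP address.
    addresses = sorted({info[4][0] for info in infos})
    return True, addresses


if __name__ == "__main__":
    # Guessed candidates; replace with whatever address your Ray node is configured to use.
    for name in ("localhost", socket.gethostname()):
        print(name, can_resolve(name))
```

If the machine's own host name fails to resolve here, that would point at the host's name-resolution setup (for example /etc/hosts) rather than at RLlib or protobuf, though this sketch alone cannot confirm the cause.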