socket.gaierror: [Errno -2] Name or service not known

How severely does this issue affect your experience of using Ray?

  • High: It blocks our organization from running any Ray-based infrastructure.

This week, without any other recent changes to our Ray-based infrastructure (as far as we can tell), we started seeing this warning repeatedly:

(raylet) /app/venv/lib/python3.8/site-packages/ray/dashboard/agent.py:163: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
(raylet)   if LooseVersion(aiohttp.__version__) < LooseVersion("4.0.0"):

At the same time, we got an error saying protobuf needs to be version 3.20.x or lower:

venv/lib/python3.8/site-packages/ray/__init__.py:91: in <module>
    import ray._raylet  # noqa: E402
python/ray/_raylet.pyx:115: in init ray._raylet
    ???
venv/lib/python3.8/site-packages/ray/exceptions.py:7: in <module>
    from ray.core.generated.common_pb2 import RayException, Language, PYTHON
venv/lib/python3.8/site-packages/ray/core/generated/common_pb2.py:15: in <module>
    from . import runtime_env_common_pb2 as src_dot_ray_dot_protobuf_dot_runtime__env__common__pb2
venv/lib/python3.8/site-packages/ray/core/generated/runtime_env_common_pb2.py:36: in <module>
    _descriptor.FieldDescriptor(
venv/lib/python3.8/site-packages/google/protobuf/descriptor.py:560: in __new__
    _message.Message._CheckCalledFromGeneratedFile()

E   TypeError: Descriptors cannot not be created directly.
E   If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
E   If you cannot immediately regenerate your protos, some other possible workarounds are:
E    1. Downgrade the protobuf package to 3.20.x or lower.
E    2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
E
E   More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

So we pinned protobuf to 3.20. That made no difference to the main problem, other than the protobuf error going away.
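The pin only helps if it lands in the same virtualenv the raylet and its agent run from (here /app/venv). A small check along these lines, run with that venv's interpreter, can confirm it. This is a sketch using only the standard library; the 3.20 cutoff is taken from the protobuf error message above, and the function names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Cutoff from the error above: generated _pb2 files older than protoc 3.19
# need protobuf 3.20.x or lower (unless the pure-Python backend is used).
MAX_COMPATIBLE = (3, 20)


def major_minor(ver: str) -> tuple:
    """Extract (major, minor) from a version string such as '3.20.1' or '4.21.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return int(match.group(1)), int(match.group(2))


def protobuf_is_compatible(ver: str) -> bool:
    """True if this protobuf version is 3.20.x or lower."""
    return major_minor(ver) <= MAX_COMPATIBLE


def check_installed_protobuf() -> str:
    """Report the protobuf version installed in the current environment."""
    try:
        ver = version("protobuf")
    except PackageNotFoundError:
        return "protobuf is not installed"
    status = "OK" if protobuf_is_compatible(ver) else "too new for these generated files"
    return f"protobuf {ver}: {status}"


if __name__ == "__main__":
    print(check_installed_protobuf())
```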

Then an exception started occurring whenever we used actors:

(raylet) Traceback (most recent call last):
(raylet)   File "/app/venv/lib/python3.8/site-packages/ray/dashboard/agent.py", line 391, in <module>
(raylet)     loop.run_until_complete(agent.run())
(raylet)   File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
(raylet)     return future.result()
(raylet)   File "/app/venv/lib/python3.8/site-packages/ray/dashboard/agent.py", line 178, in run
(raylet)     modules = self._load_modules()
(raylet)   File "/app/venv/lib/python3.8/site-packages/ray/dashboard/agent.py", line 120, in _load_modules
(raylet)     c = cls(self)
(raylet)   File "/app/venv/lib/python3.8/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 161, in __init__
(raylet)     self._metrics_agent = MetricsAgent(
(raylet)   File "/app/venv/lib/python3.8/site-packages/ray/_private/metrics_agent.py", line 75, in __init__
(raylet)     prometheus_exporter.new_stats_exporter(
(raylet)   File "/app/venv/lib/python3.8/site-packages/ray/_private/prometheus_exporter.py", line 332, in new_stats_exporter
(raylet)     exporter = PrometheusStatsExporter(
(raylet)   File "/app/venv/lib/python3.8/site-packages/ray/_private/prometheus_exporter.py", line 265, in __init__
(raylet)     self.serve_http()
(raylet)   File "/app/venv/lib/python3.8/site-packages/ray/_private/prometheus_exporter.py", line 319, in serve_http
(raylet)     start_http_server(
(raylet)   File "/app/venv/lib/python3.8/site-packages/prometheus_client/exposition.py", line 168, in start_wsgi_server
(raylet)     TmpServer.address_family, addr = _get_best_family(addr, port)
(raylet)   File "/app/venv/lib/python3.8/site-packages/prometheus_client/exposition.py", line 157, in _get_best_family
(raylet)     infos = socket.getaddrinfo(address, port)
(raylet)   File "/usr/lib/python3.8/socket.py", line 918, in getaddrinfo
(raylet)     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
(raylet) socket.gaierror: [Errno -2] Name or service not known
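The last frames show that the failure is in name resolution: prometheus_client's _get_best_family calls socket.getaddrinfo(address, port) on the metrics exporter's bind address before starting its HTTP server. The traceback doesn't show which address Ray passed in, so one way to narrow this down is to test candidate addresses (for example the node IP or hostname) from inside the same container or VM. The sketch below approximates that lookup; it is not the library's code, the function name is hypothetical, and the port is arbitrary since resolution doesn't bind anything.

```python
import socket


def check_bind_address(address: str, port: int) -> str:
    """Resolve (address, port) the way a server would before binding.

    Approximates the socket.getaddrinfo() call seen in the traceback:
    on success, report the first address family and IP returned;
    on failure, report the resolver error instead of raising.
    """
    try:
        infos = socket.getaddrinfo(address, port)
    except socket.gaierror as exc:
        return f"{address!r}: cannot resolve ({exc})"
    family, _, _, _, sockaddr = infos[0]
    return f"{address!r}: {family.name} {sockaddr[0]}"


if __name__ == "__main__":
    import sys

    # Pass the addresses to test on the command line; by default, try the
    # loopback address and this machine's own hostname.
    for addr in sys.argv[1:] or ["127.0.0.1", socket.gethostname()]:
        print(check_bind_address(addr, 8080))
```

If the hostname fails with the same "[Errno -2] Name or service not known" while 127.0.0.1 succeeds, that would point at hostname resolution in the container or VM rather than at Ray itself.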

Then on exit:

(raylet) Traceback (most recent call last):
(raylet)   File "/app/venv/lib/python3.8/site-packages/ray/dashboard/agent.py", line 407, in <module>
(raylet)     gcs_publisher = GcsPublisher(args.gcs_address)
(raylet) TypeError: __init__() takes 1 positional argument but 2 were given

Digging into the code, perhaps this is the constructor being referenced?

  GcsPublisher(const std::shared_ptr<RedisClient> &redis_client,
               std::unique_ptr<pubsub::Publisher> publisher)
      : pubsub_(std::make_unique<GcsPubSub>(redis_client)),
        publisher_(std::move(publisher)) {}
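One thing worth noting: the TypeError at agent.py line 407 is raised by the Python interpreter, so the signature that matters is the __init__ of the Python GcsPublisher class that agent.py imports, not the C++ constructor. The message says that __init__ accepts only self positionally, while agent.py passes args.gcs_address positionally. The self-contained reproduction below uses a hypothetical stand-in class; making its arguments keyword-only is an assumption about how such a mismatch could arise (one possibility is agent.py and the publisher module coming from different Ray installs), and the address is a placeholder.

```python
class PublisherStub:
    """Hypothetical stand-in whose constructor only accepts keyword arguments."""

    def __init__(self, *, address=None):
        self.address = address


def construct(gcs_address: str) -> str:
    """Try the positional call style first, then the keyword style."""
    try:
        PublisherStub(gcs_address)  # positional, like the call at agent.py line 407
        return "positional call worked"
    except TypeError as exc:
        positional_error = str(exc)
    PublisherStub(address=gcs_address)  # the keyword form succeeds
    return f"positional call failed: {positional_error}"


print(construct("127.0.0.1:6379"))
```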

This happens for us on:

  • Ray 1.11
  • Python 3.8
  • Ubuntu 20.04 LTS on Intel CPU (no GPU)
  • Ray running standalone
  • the application calling Ray is in FastAPI running under Uvicorn
  • reproduced on GCP instances, Azure instances, and locally in Docker

We tried upgrading to Ray 1.13, and the same errors occur with or without protobuf pinned to 3.20.

Since the traceback mentions the dashboard, we tried disabling the Ray dashboard, but that didn't change the error.

It seems closely related to the thread "RLlib evaluation rollout: socket.gaierror [Errno -2] Name or service not known".

We’re working on troubleshooting this. Any ideas or suggestions?

Many thanks -
Paco


I'm getting the sense this could be a prometheus-client version that's out of sync with Ray in the pip installs?


Yeah, we have seen similar issues, and pinning protobuf to 3.20.1 solved them. Does downgrading prometheus-client solve your problem? Also, to help us debug, can you share your current Python packages by running pip freeze?
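As a lighter-weight complement to a full pip freeze, the versions of the packages discussed in this thread can be reported directly from the interpreter that runs Ray. A minimal sketch using the standard library's importlib.metadata (available from Python 3.8); the package list is just the ones mentioned here and the function name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Packages discussed in this thread; adjust as needed.
PACKAGES = ("ray", "protobuf", "prometheus-client", "aiohttp", "grpcio")


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in package_versions(PACKAGES).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```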


Great, thank you!

When we upgraded to Ray 1.13, prometheus-client appears to have been downgraded to 0.13.1. Here is our pip freeze:

aiohttp==3.7.0
aiohttp-cors==0.7.0
aiosignal==1.2.0
anyio==3.6.1
appnope==0.1.3
astroid==2.11.6
asttokens==2.0.5
async-timeout==3.0.1
attrs==21.4.0
azure-common==1.1.28
azure-core==1.24.1
azure-identity==1.10.0
azure-keyvault-secrets==4.4.0
azure-storage-blob==12.12.0
backcall==0.2.0
bandit==1.7.4
bleach==5.0.1
blessed==1.19.1
boto3==1.24.18
botocore==1.27.18
cachetools==5.2.0
certifi==2022.6.15
cffi==1.15.0
cfgv==3.3.1
chardet==3.0.4
charset-normalizer==2.0.12
chocolate==0.0.2
click==8.0.4
cloudpathlib==0.9.0
codespell==2.1.0
colorama==0.4.5
colorful==0.5.4
commonmark==0.9.1
cryptography==37.0.2
curlify==2.2.1
cycler==0.11.0
Cython==0.29.30
debugpy==1.6.0
decorator==5.1.1
dill==0.3.5.1
distlib==0.3.4
docutils==0.18.1
entrypoints==0.4
exceptiongroup==1.0.0rc8
executing==0.8.3
fastapi==0.78.0
fastapi-jwt-auth==0.5.0
fastapi-versioning==0.10.0
fastjsonschema==2.15.3
filelock==3.7.1
fonttools==4.33.3
frozenlist==1.3.0
gitdb==4.0.9
GitPython==3.1.27
google-api-core==2.8.2
google-auth==2.8.0
google-cloud-core==2.3.1
google-cloud-storage==2.4.0
google-crc32c==1.3.0
google-resumable-media==2.3.3
googleapis-common-protos==1.56.3
gpustat==1.0.0b1
graphql-core==3.2.1
grpcio==1.43.0
h11==0.13.0
httptools==0.4.0
hypothesis==6.48.1
hypothesis-graphql==0.9.0
hypothesis-jsonschema==0.22.0
icecream==2.1.2
identify==2.5.1
idna==3.3
importlib-metadata==4.12.0
importlib-resources==5.8.0
iniconfig==1.1.1
ipykernel==6.15.0
ipython==8.4.0
iso8601==1.0.2
isodate==0.6.1
isort==5.10.1
jedi==0.18.1
Jinja2==3.1.2
jmespath==1.0.1
jsonpickle==2.2.0
jsonschema==4.6.0
junit-xml==1.9
jupyter-client==7.3.4
jupyter-core==4.10.0
keyring==23.6.0
kiwisolver==1.4.3
kopf==1.35.5
kubernetes==24.2.0
lazy-object-proxy==1.7.1
MarkupSafe==2.1.1
matplotlib==3.5.2
matplotlib-inline==0.1.3
mccabe==0.7.0
msal==1.18.0
msal-extensions==1.0.0
msgpack==1.0.4
msrest==0.7.1
multidict==6.0.2
mypy==0.961
mypy-extensions==0.4.3
nbclient==0.5.13
nbformat==5.4.0
nbmake==1.3.0
nest-asyncio==1.5.5
networkx==2.8.4
nodeenv==1.7.0
numpy==1.23.0
nvidia-ml-py3==7.352.0
oauthlib==3.2.0
opencensus==0.9.0
opencensus-context==0.1.2
owlrl==6.0.2
packaging==21.3
pandas==1.4.3
parso==0.8.3
pbr==5.9.0
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.1.1
pkginfo==1.8.3
platformdirs==2.5.2
pluggy==1.0.0
portalocker==2.4.0
pre-commit==2.19.0
prettytable==2.5.0
prometheus-client==0.13.1
prompt-toolkit==3.0.30
protobuf==3.20.0
pslpython==2.2.2
psutil==5.9.1
ptyprocess==0.7.0
pulsar-client==2.10.0
pure-eval==0.2.2
py==1.11.0
py-spy==0.3.12
pyarrow==8.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pydantic==1.9.1
Pygments==2.12.0
PyJWT==1.7.1
pylint==2.14.3
pyparsing==3.0.9
pyrsistent==0.18.1
pyshacl==0.19.0
pytest==7.1.2
pytest-subtests==0.7.0
python-dateutil==2.8.2
python-dotenv==0.20.0
python-json-logger==2.0.2
pytz==2022.1
pyvis==0.2.1
PyYAML==6.0
pyzmq==23.2.0
ray==1.13.0
rdflib==6.1.1
readme-renderer==35.0
requests==2.27.1
requests-oauthlib==1.3.1
requests-toolbelt==0.9.1
rfc3986==2.0.0
rich==12.4.4
rsa==4.8
s3transfer==0.6.0
schemathesis==3.15.6
six==1.16.0
smart-open==6.0.0
smmap==5.0.0
sniffio==1.2.0
sortedcontainers==2.4.0
stack-data==0.3.0
starlette==0.19.1
stevedore==3.5.0
toml==0.10.2
tomli==2.0.1
tomli_w==1.0.0
tomlkit==0.11.0
tornado==6.1
tqdm==4.64.0
traitlets==5.3.0
twine==4.0.1
types-PyYAML==6.0.9
typing_extensions==4.2.0
urllib3==1.26.9
uvicorn==0.18.2
uvloop==0.16.0
virtualenv==20.15.0
watchfiles==0.15.0
watermark==2.3.1
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.3.3
websockets==10.3
Werkzeug==2.1.2
wrapt==1.14.1
xmltodict==0.13.0
yarl==1.7.2
zipp==3.8.0

Hey @ceteri, sorry, I dropped the ball here. The last time I tried, it worked for me. Here are my pip freeze and a simple Ray script.

test.py

import ray

@ray.remote
class Actor:
    def foo(self):
        return 1

a = Actor.remote()
print(ray.get(a.foo.remote()))

output

ubuntu@ip-172-31-12-154:~$ python3 ./test.py
2022-07-11 20:13:05,895	INFO services.py:1470 -- View the Ray dashboard at http://127.0.0.1:8265
1

pip3 freeze:

aiohttp==3.7.0
aiohttp-cors==0.7.0
aiosignal==1.2.0
anyio==3.6.1
appnope==0.1.3
astroid==2.11.6
asttokens==2.0.5
async-timeout==3.0.1
attrs==21.4.0
Automat==20.2.0
azure-common==1.1.28
azure-core==1.24.1
azure-identity==1.10.0
azure-keyvault-secrets==4.4.0
azure-storage-blob==12.12.0
Babel==2.8.0
backcall==0.2.0
bandit==1.7.4
bcrypt==3.2.0
bleach==5.0.1
blessed==1.19.1
blinker==1.4
boto3==1.24.18
botocore==1.27.18
cachetools==5.2.0
certifi==2022.6.15
cffi==1.15.0
cfgv==3.3.1
chardet==3.0.4
charset-normalizer==2.0.12
chocolate==0.0.2
click==8.0.4
cloud-init==22.2
cloudpathlib==0.9.0
codespell==2.1.0
colorama==0.4.5
colorful==0.5.4
command-not-found==0.3
commonmark==0.9.1
configobj==5.0.6
constantly==15.1.0
cryptography==37.0.2
curlify==2.2.1
cycler==0.11.0
Cython==0.29.30
dbus-python==1.2.18
debugpy==1.6.0
decorator==5.1.1
dill==0.3.5.1
distlib==0.3.4
distro==1.7.0
distro-info===1.1build1
docutils==0.18.1
ec2-hibinit-agent==1.0.0
entrypoints==0.4
exceptiongroup==1.0.0rc8
executing==0.8.3
fastapi==0.78.0
fastapi-jwt-auth==0.5.0
fastapi-versioning==0.10.0
fastjsonschema==2.15.3
filelock==3.7.1
fonttools==4.33.3
frozenlist==1.3.0
gitdb==4.0.9
GitPython==3.1.27
google-api-core==2.8.2
google-auth==2.8.0
google-cloud-core==2.3.1
google-cloud-storage==2.4.0
google-crc32c==1.3.0
google-resumable-media==2.3.3
googleapis-common-protos==1.56.3
gpustat==1.0.0b1
graphql-core==3.2.1
grpcio==1.43.0
h11==0.13.0
httplib2==0.20.2
httptools==0.4.0
hyperlink==21.0.0
hypothesis==6.48.1
hypothesis-graphql==0.9.0
hypothesis-jsonschema==0.22.0
icecream==2.1.2
identify==2.5.1
idna==3.3
importlib-metadata==4.12.0
importlib-resources==5.8.0
incremental==21.3.0
iniconfig==1.1.1
ipykernel==6.15.0
ipython==8.4.0
iso8601==1.0.2
isodate==0.6.1
isort==5.10.1
jedi==0.18.1
jeepney==0.7.1
Jinja2==3.1.2
jmespath==1.0.1
jsonpatch==1.32
jsonpickle==2.2.0
jsonpointer==2.0
jsonschema==4.6.0
junit-xml==1.9
jupyter-client==7.3.4
jupyter-core==4.10.0
keyring==23.6.0
kiwisolver==1.4.3
kopf==1.35.5
kubernetes==24.2.0
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
lazy-object-proxy==1.7.1
MarkupSafe==2.1.1
matplotlib==3.5.2
matplotlib-inline==0.1.3
mccabe==0.7.0
more-itertools==8.10.0
msal==1.18.0
msal-extensions==1.0.0
msgpack==1.0.4
msrest==0.7.1
multidict==6.0.2
mypy==0.961
mypy-extensions==0.4.3
nbclient==0.5.13
nbformat==5.4.0
nbmake==1.3.0
nest-asyncio==1.5.5
netifaces==0.11.0
networkx==2.8.4
nodeenv==1.7.0
numpy==1.23.0
nvidia-ml-py3==7.352.0
oauthlib==3.2.0
opencensus==0.9.0
opencensus-context==0.1.2
owlrl==6.0.2
packaging==21.3
pandas==1.4.3
parso==0.8.3
pbr==5.9.0
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.1.1
pkginfo==1.8.3
platformdirs==2.5.2
pluggy==1.0.0
portalocker==2.4.0
pre-commit==2.19.0
prettytable==2.5.0
prometheus-client==0.13.1
prompt-toolkit==3.0.30
protobuf==3.20.0
pslpython==2.2.2
psutil==5.9.1
ptyprocess==0.7.0
pulsar-client==2.10.0
pure-eval==0.2.2
py==1.11.0
py-spy==0.3.12
pyarrow==8.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pydantic==1.9.1
Pygments==2.12.0
PyGObject==3.42.0
PyHamcrest==2.0.2
PyJWT==1.7.1
pylint==2.14.3
pyOpenSSL==21.0.0
pyparsing==3.0.9
pyrsistent==0.18.1
pyserial==3.5
pyshacl==0.19.0
pytest==7.1.2
pytest-subtests==0.7.0
python-apt==2.3.0+ubuntu2
python-dateutil==2.8.2
python-debian===0.1.43ubuntu1
python-dotenv==0.20.0
python-json-logger==2.0.2
pytz==2022.1
pyvis==0.2.1
PyYAML==6.0
pyzmq==23.2.0
ray==1.13.0
rdflib==6.1.1
readme-renderer==35.0
requests==2.27.1
requests-oauthlib==1.3.1
requests-toolbelt==0.9.1
rfc3986==2.0.0
rich==12.4.4
rsa==4.8
s3transfer==0.6.0
schemathesis==3.15.6
SecretStorage==3.3.1
service-identity==18.1.0
six==1.16.0
smart-open==6.0.0
smmap==5.0.0
sniffio==1.2.0
sortedcontainers==2.4.0
sos==4.3
ssh-import-id==5.11
stack-data==0.3.0
starlette==0.19.1
stevedore==3.5.0
systemd-python==234
toml==0.10.2
tomli==2.0.1
tomli_w==1.0.0
tomlkit==0.11.0
tornado==6.1
tqdm==4.64.0
traitlets==5.3.0
twine==4.0.1
Twisted==22.1.0
types-PyYAML==6.0.9
typing_extensions==4.2.0
ubuntu-advantage-tools==27.8
ufw==0.36.1
unattended-upgrades==0.1
urllib3==1.26.9
uvicorn==0.18.2
uvloop==0.16.0
virtualenv==20.15.0
wadllib==1.3.6
watchfiles==0.15.0
watermark==2.3.1
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.3.3
websockets==10.3
Werkzeug==2.1.2
wrapt==1.14.1
xmltodict==0.13.0
yarl==1.7.2
zipp==3.8.0
zope.interface==5.4.0

Thank you kindly.
Yes, with the upgrade and library pinning, this has been resolved.
