How severely does this issue affect your experience of using Ray?
- High: It blocks our organization from running any Ray-based infrastructure.
This week, without any other recent changes to our Ray-based infrastructure (as far as we can tell), we started seeing this warning repeatedly:
(raylet) /app/venv/lib/python3.8/site-packages/ray/dashboard/agent.py:163: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
(raylet) if LooseVersion(aiohttp.__version__) < LooseVersion("4.0.0"):
At the same time there was a warning about protobuf needing to be version 3.20 or earlier:
venv/lib/python3.8/site-packages/ray/__init__.py:91: in <module>
import ray._raylet # noqa: E402
python/ray/_raylet.pyx:115: in init ray._raylet
???
venv/lib/python3.8/site-packages/ray/exceptions.py:7: in <module>
from ray.core.generated.common_pb2 import RayException, Language, PYTHON
venv/lib/python3.8/site-packages/ray/core/generated/common_pb2.py:15: in <module>
from . import runtime_env_common_pb2 as src_dot_ray_dot_protobuf_dot_runtime__env__common__pb2
venv/lib/python3.8/site-packages/ray/core/generated/runtime_env_common_pb2.py:36: in <module>
_descriptor.FieldDescriptor(
venv/lib/python3.8/site-packages/google/protobuf/descriptor.py:560: in __new__
_message.Message._CheckCalledFromGeneratedFile()
E TypeError: Descriptors cannot not be created directly.
E If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
E If you cannot immediately regenerate your protos, some other possible workarounds are:
E 1. Downgrade the protobuf package to 3.20.x or lower.
E 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
E
E More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
So we pinned that dependency. Nothing else changed, except that the protobuf-related warnings stopped.
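To confirm that a pin like this actually took effect in the running environment, a small check along the following lines could help. This is only a sketch: the function name `protobuf_pin_satisfied` and its default bounds are assumptions made for illustration, based on the "3.20.x or lower" wording in the error message above.

```python
import re


def protobuf_pin_satisfied(version: str, max_major: int = 3, max_minor: int = 20) -> bool:
    """Return True if `version` is at or below max_major.max_minor.x.

    Hypothetical helper: the 3.20 default comes from the workaround text
    in the protobuf error message, not from any Ray API.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders by major first, then minor.
    return (major, minor) <= (max_major, max_minor)
```

On Python 3.8 and later, the installed version string can be obtained with `importlib.metadata.version("protobuf")` and passed to this function from inside the same virtualenv that Ray runs in.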
Then an exception began occurring whenever we use actors:
Traceback (most recent call last):
(raylet) File "/app/venv/lib/python3.8/site-packages/ray/dashboard/agent.py", line 391, in <module>
(raylet) loop.run_until_complete(agent.run())
(raylet) File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
(raylet) return future.result()
(raylet) File "/app/venv/lib/python3.8/site-packages/ray/dashboard/agent.py", line 178, in run
(raylet) modules = self._load_modules()
(raylet) File "/app/venv/lib/python3.8/site-packages/ray/dashboard/agent.py", line 120, in _load_modules
(raylet) c = cls(self)
(raylet) File "/app/venv/lib/python3.8/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 161, in __init__
(raylet) self._metrics_agent = MetricsAgent(
(raylet) File "/app/venv/lib/python3.8/site-packages/ray/_private/metrics_agent.py", line 75, in __init__
(raylet) prometheus_exporter.new_stats_exporter(
(raylet) File "/app/venv/lib/python3.8/site-packages/ray/_private/prometheus_exporter.py", line 332, in new_stats_exporter
(raylet) exporter = PrometheusStatsExporter(
(raylet) File "/app/venv/lib/python3.8/site-packages/ray/_private/prometheus_exporter.py", line 265, in __init__
(raylet) self.serve_http()
(raylet) File "/app/venv/lib/python3.8/site-packages/ray/_private/prometheus_exporter.py", line 319, in serve_http
(raylet) start_http_server(
(raylet) File "/app/venv/lib/python3.8/site-packages/prometheus_client/exposition.py", line 168, in start_wsgi_server
(raylet) TmpServer.address_family, addr = _get_best_family(addr, port)
(raylet) File "/app/venv/lib/python3.8/site-packages/prometheus_client/exposition.py", line 157, in _get_best_family
(raylet) infos = socket.getaddrinfo(address, port)
(raylet) File "/usr/lib/python3.8/socket.py", line 918, in getaddrinfo
(raylet) for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
(raylet) socket.gaierror: [Errno -2] Name or service not known
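The last frames of this traceback show `socket.getaddrinfo(address, port)` failing inside prometheus_client, so one way to narrow it down is to make the same call directly with candidate address values on the affected machine. The sketch below assumes nothing about which address Ray actually passes; the helper name `can_resolve` is hypothetical.

```python
import socket


def can_resolve(address: str, port: int) -> bool:
    """Return True if socket.getaddrinfo accepts this address/port pair.

    Mirrors the call at the bottom of the traceback; a gaierror here
    corresponds to "Name or service not known" and similar failures.
    """
    try:
        socket.getaddrinfo(address, port)
    except socket.gaierror:
        return False
    return True
```

Trying it with values such as `"127.0.0.1"`, `"localhost"`, the output of `socket.gethostname()`, and an empty string may show which kind of address triggers the failure in a given container or VM.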
Then on exit:
(raylet) Traceback (most recent call last):
(raylet) File "/app/venv/lib/python3.8/site-packages/ray/dashboard/agent.py", line 407, in <module>
(raylet) gcs_publisher = GcsPublisher(args.gcs_address)
(raylet) TypeError: __init__() takes 1 positional argument but 2 were given
Digging into the code, perhaps this is the constructor being referenced?
GcsPublisher(const std::shared_ptr<RedisClient> &redis_client,
             std::unique_ptr<pubsub::Publisher> publisher)
    : pubsub_(std::make_unique<GcsPubSub>(redis_client)),
      publisher_(std::move(publisher)) {}
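Since the message "takes 1 positional argument but 2 were given" is produced by Python's own argument checking, another possibility is that the class being called is defined on the Python side and accepts only keyword arguments. The sketch below reproduces the same message with a hypothetical stand-in class; it is not Ray's actual `GcsPublisher`, and the keyword name `address` is an assumption.

```python
class KeywordOnlyPublisher:
    """Hypothetical stand-in, NOT Ray's GcsPublisher."""

    def __init__(self, *, address=None):
        # The bare * makes `address` keyword-only, so the only
        # positional parameter the constructor accepts is `self`.
        self.address = address


def try_positional(address):
    """Call the constructor positionally and return the error text, if any."""
    try:
        KeywordOnlyPublisher(address)
    except TypeError as exc:
        return str(exc)
    return None
```

Calling `try_positional("host:port")` returns a message of the same shape as the one in the traceback, while `KeywordOnlyPublisher(address="host:port")` succeeds. If the Python class in the installed Ray version has a signature like this, the positional call at agent.py line 407 would fail in the same way.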
This happens for us on:
- Ray 1.11
- Python 3.8
- Ubuntu 20.04 LTS on Intel CPU (no GPU)
- this happens running Ray standalone
- the application calling Ray is in FastAPI running under Uvicorn
- reproduced on GCS instances, Azure instances, and also running in Docker locally
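Because the traces above involve several packages besides Ray (aiohttp, protobuf, prometheus_client), it may be useful to record their exact installed versions alongside the environment details. A minimal sketch using the standard library follows; the helper name `installed_versions` and the package list in the usage note are assumptions.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version string.

    Packages that are not installed are reported as "not installed"
    instead of raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions
```

For example, `installed_versions(["ray", "protobuf", "prometheus-client", "aiohttp"])` run inside the application's virtualenv gives a snapshot that can be compared between a working and a failing deployment.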
We tried upgrading to Ray 1.13, and the same errors occur with or without protobuf pinned to 3.20.
Since the traceback mentions the dashboard, we’ve tried disabling the Ray dashboard, but that didn’t change the error.
It seems closely related to the thread "RLlib evaluation rollout: socket.gaierror [Errno -2] Name or service not known".
We’re working on troubleshooting this. Any ideas or suggestions?
Many thanks -
Paco