Ray 1.12.0 Windows Issues

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Hello everyone,

We attempted to update Ray to 1.12.x, but it seems to have a lot of issues on Windows, and some issues on Linux.

Unlike v1.11.x, installing with pip in a fresh environment fails on both WSL and Windows.

In WSL, we get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/[redacted]/projects/oss/test-ray/venv/lib/python3.9/site-packages/ray/__init__.py", line 108, in <module>
    import ray._raylet  # noqa: E402
  File "python/ray/_raylet.pyx", line 115, in init ray._raylet
  File "/home/[redacted]/projects/oss/test-ray/venv/lib/python3.9/site-packages/ray/exceptions.py", line 7, in <module>
    from ray.core.generated.common_pb2 import RayException, Language, PYTHON
  File "/home/[redacted]/projects/oss/test-ray/venv/lib/python3.9/site-packages/ray/core/generated/common_pb2.py", line 15, in <module>
    from . import runtime_env_common_pb2 as src_dot_ray_dot_protobuf_dot_runtime__env__common__pb2
  File "/home/[redacted]/projects/oss/test-ray/venv/lib/python3.9/site-packages/ray/core/generated/runtime_env_common_pb2.py", line 36, in <module>
  File "/home/[redacted]/projects/oss/test-ray/venv/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 560, in __new__
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
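For reference, workaround 2 from the message above could be applied from Python roughly like this. It is only a sketch: the helper name `use_pure_python_protobuf` is made up for illustration, and whether this alone is enough for Ray (including its worker processes) is not confirmed in this thread. The variable has to be set before protobuf is first imported, so it goes before `import ray`.

```python
import os

# Name of the variable quoted in the protobuf error message above.
ENV_VAR = "PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"


def use_pure_python_protobuf(environ=os.environ):
    """Hypothetical helper: select protobuf's pure-Python implementation.

    As the error message notes, this uses pure-Python parsing and
    will be much slower.
    """
    environ[ENV_VAR] = "python"
    return environ[ENV_VAR]


# Set the variable for this process before anything imports protobuf.
use_pure_python_protobuf()

# import ray  # import Ray only after the variable is set
```

Setting the variable in the shell before starting Python would have the same effect for that process.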

On Windows, it’s a bit worse:

  1. With a plain pip install ray, it fails with a redis import error.
  2. When installing with poetry, we get all the dependencies; however, the raylet dies 20-30 seconds after ray.init() is called, without any hints that I can find in the log files.

Here I try to use Ray in a REPL on Windows. It raises an exception which, for some reason, doesn’t persist in the terminal:

[2022-05-29 09:45:03,975 E 11980 16468] (raylet.exe) agent_manager.cc:107: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. See `dashboard_agent.log` for the root cause.
>>> @ray.remote
... def foo():
...  print("hi")
>>> foo.Windows fatal exception: access violation
[REPL dies here]

The output reads foo.Windows because I was in the middle of typing foo.remote() when the crash message was printed.

This blocks us from upgrading, but we can easily downgrade as we don’t use anything fancy, just Ray core for now.


@Mazyod this issue is caused by a protobuf upgrade in the dependencies. We’ve limited protobuf to <3.20.x in the nightly version.
In your case, could you uninstall protobuf and install a version <3.20.x?
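To see whether a given environment is affected, a quick check along these lines might help. This is a rough sketch, not part of Ray: the helper names and the `(3, 20)` bound are assumptions taken from the "<3.20.x" advice above, and the version parsing is deliberately simplistic.

```python
from importlib import metadata

# Assumed upper bound, based on the "<3.20.x" advice in this thread.
PROTOBUF_UPPER_BOUND = (3, 20)


def parse_version(version):
    """Turn '3.19.4' into (3, 19, 4); trailing non-numeric text is ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def protobuf_is_compatible(version, upper_bound=PROTOBUF_UPPER_BOUND):
    """Return True if the version is strictly below the assumed bound."""
    return parse_version(version) < upper_bound


def installed_protobuf_version():
    """Return the installed protobuf version string, or None if absent."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_protobuf_version()
    if found is None:
        print("protobuf is not installed")
    elif protobuf_is_compatible(found):
        print("protobuf", found, "is below the assumed bound")
    else:
        print("protobuf", found, "is too new; consider downgrading")
```

Because it reads package metadata rather than importing protobuf, the check works even when `import ray` itself fails.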

Thanks for the reply, yic.

I’m happy to help if you need me to investigate in order to identify a root cause. However, for us, downgrading is a perfectly fine resolution, so we don’t need a workaround at the moment. In any case, any workaround would have to work on Windows, too, as we deploy on both environments.

@Mazyod Thank you for being willing to help, but we have already identified the root cause: the Ray core layer is compiled against a protobuf (C++) that is not compatible with the newest protobuf (Python). So until Ray core upgrades its protobuf (C++), we have to stick to a version < 3.20.x. It’s actually broken on macOS and Linux as well.

We never pinned or restricted the protobuf version in ray/python/setup.py before, which is why installing Ray always tries to fetch the newest version of protobuf, and unfortunately that version is not backward compatible right now. The version restriction has been added in this PR.

Are you suggesting that the crash on Windows is also due to the protobuf issue?

Yes, it could be. Have you given it a try? Let me know if any new error shows up and we can see what might be the cause. We have CI tests for Windows as well, and most of them are running well.