The Ray agent couldn't be started due to the port conflict. To solve the problem, start Ray with a hard-coded agent port

raytune_kuberay_user · October 28, 2022, 10:01pm

We are running some ray.tune unit tests on Linux, and we launch only one Ray cluster. The test setup looks like this:


ray.init()
tune.run(xxxx)
ray.shutdown()

We occasionally see this error:


E ray.exceptions.RuntimeEnvSetupError: Failed to setup runtime environment.
E Could not create the actor because its associated runtime env failed to be created.
E Failed to create runtime environment {"envVars": {"TUNE_ORIG_WORKING_DIR": "xxxx"}} because the Ray agent couldn't be started due to the port conflict. See `dashboard_agent.log` for more details. To solve the problem, start Ray with a hard-coded agent port. `ray start --dashboard-agent-grpc-port [port]` and make sure the port is not used by other processes.

However, I think Ray assigns this port randomly internally. I saw a similar issue in modin-project/modin, Issue #4905 (TEST: CI: flaky ray windows failure: "System error: Unknown error"). Also, ray.init() does not have an argument corresponding to dashboard-agent-grpc-port…
How can we properly work around this flakiness? Should we have a subprocess running ray start --dashboard-agent-grpc-port 0? Any suggestions?

cade · November 1, 2022, 2:17am

cc @architkulkarni. I think we should retry if the randomly generated port number turns out to be in use. Is that work planned for runtime env creation, and is my diagnosis correct?
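
The retry idea described above can be illustrated outside of Ray. This is only a sketch of the general technique (pick a random port, check whether something already holds it, try again if so), not Ray's internal implementation. The helper names `port_is_free` and `pick_random_free_port` and the port range are hypothetical.

```python
import random
import socket
from typing import Optional


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another socket already holds the port.
            return False
    return True


def pick_random_free_port(
    low: int = 20000,
    high: int = 40000,
    max_attempts: int = 10,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw random ports from [low, high] until one is free or attempts run out."""
    rng = rng or random.Random()
    for _ in range(max_attempts):
        candidate = rng.randint(low, high)
        if port_is_free(candidate):
            return candidate
    raise RuntimeError(f"no free port found after {max_attempts} attempts")
```

Note that a check like this only narrows the race window: another process can still grab the port between the check and the moment the real server binds to it.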

architkulkarni · November 1, 2022, 9:12pm

Sorry you’re running into this. I think the best thing would be to specify the port in ray start, as you suggest. We plan to fix this in the future.

raytune_kuberay_user · November 4, 2022, 9:15pm

Doing the following:

+        sock = socket.socket()
+        sock.bind(('', 0))
+        free_port = sock.getsockname()[1]
+        subprocess.check_call(["ray", "start" , "--head", f"--dashboard-agent-grpc-port={free_port}", "--include-dashboard=False"])
+        ray.init(address="auto")

still doesn’t solve the issue. @architkulkarni Do you have some suggestions?
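
One detail that may be worth checking in the snippet above: the probing socket is never closed, so it could still be holding the port when `ray start` tries to bind it. Whether that is the actual cause here is not confirmed in this thread. A minimal sketch that releases the port before building the command, where `find_free_port` and `build_ray_start_command` are hypothetical helper names:

```python
import socket
from typing import List


def find_free_port() -> int:
    """Ask the OS for an unused TCP port and release it before returning."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))  # port 0 lets the OS choose a free port
        port = sock.getsockname()[1]
    # The socket is closed at this point, so the port is no longer held.
    return port


def build_ray_start_command(grpc_port: int) -> List[str]:
    """Build the same `ray start` invocation used in the post above."""
    return [
        "ray",
        "start",
        "--head",
        f"--dashboard-agent-grpc-port={grpc_port}",
        "--include-dashboard=False",
    ]
```

The resulting list can be passed to `subprocess.check_call(...)` and followed by `ray.init(address="auto")`, as in the original snippet. There is still a short window between releasing the port and Ray binding it in which another process could take it.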

sangcho · November 9, 2022, 11:06pm

Can you share the log from dashboard_agent.log?

sangcho · November 10, 2022, 5:56am

Synced offline. @raytune_kuberay_user will try setting both agent ports specified in the doc "Configuring Ray — Ray 3.0.0.dev0".
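
For readers following along, here is a sketch of what setting both agent ports might look like. It assumes the two ports meant are the ones exposed by the `ray start` flags `--dashboard-agent-grpc-port` and `--dashboard-agent-listen-port`; the thread itself does not name the second flag. The helper names are hypothetical.

```python
import socket
from typing import List


def find_free_ports(count: int) -> List[int]:
    """Reserve `count` distinct free TCP ports, then release them all."""
    socks = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            socks.append(sock)
            sock.bind(("", 0))  # port 0 lets the OS choose a free port
        # All sockets are still open here, so the ports are guaranteed distinct.
        return [sock.getsockname()[1] for sock in socks]
    finally:
        for sock in socks:
            sock.close()


def build_ray_start_command(grpc_port: int, listen_port: int) -> List[str]:
    """Build a `ray start` command that pins both agent ports (assumed flags)."""
    return [
        "ray",
        "start",
        "--head",
        f"--dashboard-agent-grpc-port={grpc_port}",
        f"--dashboard-agent-listen-port={listen_port}",
        "--include-dashboard=False",
    ]
```

Holding every probing socket open until all ports are chosen keeps the OS from handing out the same port twice. The command list is meant to be passed to `subprocess.check_call(...)` before calling `ray.init(address="auto")`.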

raytune_kuberay_user · November 10, 2022, 5:59pm

Thanks. For local testing, do you recommend that we use ray.init(local_mode=True)? Will this help resolve the port conflict? @sangcho

sangcho · November 10, 2022, 11:35pm

We deprecated that feature as of 2.2 (it had been unmaintained for a while…). We recommend running ray.init() and then running your workloads. If you think local_mode=True is important for unit tests, please file a feature request to re-enable it!

nateyoder · August 30, 2023, 11:12pm

I’m running into the same issue ("Failed to create runtime environment for job 01000000 because the Ray agent couldn't be started due to the port conflict. See dashboard_agent.log for more details."). Is there a planned way to do this from ray.init, or do I still need to call out to a separate subprocess even for unit testing? Is there now a better way to do this? Furthermore, I have set include_dashboard=False in my ray.init call but am still getting this error. Is this expected?
