1. Severity of the issue: (select one)
Low: Annoying but doesn’t hinder my work.
2. Environment:
Ray version: 2.44.1
Python version: 3.11.18
OS: Ubuntu 24.04.
3. What happened vs. what you expected:
Expected: Call ray.shutdown() and ray exits gracefully
Actual: Ray initializes a new cluster
I have a standard ray.rllib workflow with a try ... finally block like this:
import ray
from ray.rllib.algorithms.ppo import PPOConfig

try:
    ray.init(**params)
    algo = PPOConfig().build_algo()
    for i in range(num_iters):
        res = algo.train()
        # other things here
    algo.save()
except Exception:
    pass  # handle here
finally:
    if ray.is_initialized():
        ray.shutdown()
But every time the program exits, whether it finishes successfully, raises an exception, or is stopped by a keyboard interrupt, a new Ray cluster is initialized.
I checked the worker and driver logs. The driver and half (3/6) of the workers appear to be shutting down correctly a few seconds before the new cluster is spun up. The other three don’t log anything regarding a shutdown.
I have tested all combinations of the initialization config (the args passed to ray.init()) and it doesn’t change anything.
I also checked whether ray.is_initialized() is true in the finally clause, and it is. The problem also occurs without the if clause.
Am I not shutting down everything properly here? I’d appreciate any pointers.
Your usage of ray.shutdown() in the finally block is correct, and it should terminate the Ray runtime and all associated processes. However, if a new Ray cluster is being initialized immediately after shutdown, this is not expected behavior. According to the Ray documentation and the Ray startup guide, ray.shutdown() should clean up the Ray state, and Ray should not automatically restart unless something in your code or environment is triggering a new ray.init().
If you are not explicitly calling ray.init() again after shutdown, check for:
Any background threads or callbacks that might be using Ray APIs after shutdown, which can implicitly trigger a new ray.init() (see docs).
Any code outside your shown snippet that could be causing Ray to reinitialize.
If using Ray RLlib, ensure no background workers or environments are still running after shutdown.
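To check the first point, one option is to list the threads that are still alive just before calling ray.shutdown(). The sketch below uses only Python's standard threading module; the helper name live_background_threads is made up for illustration and is not part of Ray. It only shows which threads exist, not whether they actually call Ray APIs.

```python
import threading


def live_background_threads():
    """Return the names of all live threads other than the main thread."""
    main = threading.main_thread()
    return [
        t.name
        for t in threading.enumerate()
        if t is not main and t.is_alive()
    ]
```

Printing the result of live_background_threads() right before ray.shutdown() in the finally block shows whether anything is still running that could touch Ray afterwards.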
Would you like a step-by-step checklist to debug this further?
You can programmatically check for running Ray worker processes after ray.shutdown() by using the Ray State API: call ray.util.state.list_workers() to list all workers and verify their status. This approach is shown in Ray’s test suite, where after shutdown, the test waits until all workers are gone by checking the worker list (test_ray_shutdown.py).
Example:
import time

import ray
from ray.util.state import list_workers

ray.shutdown()
# Wait until all workers are gone
while any(w['is_alive'] for w in list_workers(detail=True)):
    time.sleep(1)
Would you like more detail on interpreting the worker list or automating this check?
@RunLLM this raises a ConnectionError because no running ray instance can be found. Then straight after raising the error, the new local ray instance starts up.
This happens because calling a Ray State API such as list_workers after ray.shutdown(), when no Ray instance is running, raises a ConnectionError. If you then call another Ray API, Ray may auto-initialize a new local instance (Ray startup guide). This is expected: Ray auto-starts when its APIs are used without an active cluster.
To avoid this, do not call Ray APIs after shutdown, or catch the ConnectionError and prevent further Ray API calls. There is no supported way to query Ray state after shutdown without risking auto-initialization.
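A minimal sketch of the second option, assuming the error is Python's built-in ConnectionError as reported above. The wrapper name query_or_none is hypothetical, and the query is passed in as a callable so the snippet does not depend on any particular Ray API.

```python
from typing import Any, Callable, Iterable, List, Optional


def query_or_none(query: Callable[[], Iterable[Any]]) -> Optional[List[Any]]:
    """Run a state query; return None if no Ray instance is reachable."""
    try:
        return list(query())
    except ConnectionError:
        # No running instance: signal the caller to stop using Ray APIs.
        return None


# Hypothetical usage with the State API from the example above:
#   workers = query_or_none(lambda: list_workers(detail=True))
#   if workers is None:
#       pass  # Ray is down; make no further Ray calls from here on
```

A None result is the cue to skip any remaining Ray calls, which is what keeps a new local instance from being started.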
The solution for me was to run ray.shutdown() in the global scope, at module level. With that change, no new Ray instance starts after shutdown.
Previously, I had the try ... finally block inside a function scope.