How to stop a second node on the same host?

This is the same question as the unanswered thread "Two ray servers in the same node: how to stop?", but maybe @RunLLM can answer it.

I accidentally started two nodes on the same host, duplicating the logical resources there. How can I shut down this duplicated node without affecting the first node?

Currently, Ray does not provide a built-in CLI option to stop a specific Ray cluster or node when multiple Ray clusters are running on the same host; running ray stop will terminate all Ray processes on that node, affecting all clusters indiscriminately. This limitation is well-documented and has been discussed in several issues and forums, with users requesting features like ray stop --address or ray stop --port, but as of now, these options are not implemented in Ray’s CLI (GitHub Issue #54989, GitHub Issue #11509, Discourse thread, FAQ).

The only available workaround is to manually identify and kill the processes associated with the specific Ray cluster you want to stop, for example by grepping for the port or unique command-line arguments used when starting that cluster and killing those processes (e.g., pkill -f "ray.*6379"). However, this is not officially supported and can be error-prone (GitHub Issue #11509). Would you like more detail on how to do this safely?
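For the manual workaround described above, here is a minimal Python sketch (stdlib only, assuming a Linux/macOS `ps`) that lists processes whose command line matches a regex and, by default, only prints what it would kill. The function names are hypothetical, and the pattern is an assumption you have to choose yourself: something unique to the second node, such as a port you passed to `ray start` for it. Like `pkill -f`, it sends SIGTERM, and it is just as unsupported, so check the dry-run output carefully before touching anything that belongs to the first node.

```python
import os
import re
import signal
import subprocess


def list_processes():
    """Return (pid, command line) pairs for all processes, using `ps` (Linux/macOS)."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,args="], capture_output=True, text=True, check=True
    ).stdout
    procs = []
    for line in out.splitlines():
        pid, _, args = line.strip().partition(" ")
        if pid.isdigit():
            procs.append((int(pid), args.strip()))
    return procs


def match_processes(procs, pattern):
    """Keep processes whose command line matches the regex `pattern` (never this script)."""
    rx = re.compile(pattern)
    own_pid = os.getpid()
    return [(pid, args) for pid, args in procs if pid != own_pid and rx.search(args)]


def terminate(matches, dry_run=True):
    """Send SIGTERM to each matched PID; with dry_run=True only print what would be killed."""
    for pid, args in matches:
        print(f"{'[dry run] ' if dry_run else ''}SIGTERM {pid}: {args}")
        if not dry_run:
            os.kill(pid, signal.SIGTERM)
```

Usage, with a hypothetical port for the second node: `matches = match_processes(list_processes(), r"6380")`, inspect the output of `terminate(matches)`, and only then run `terminate(matches, dry_run=False)`.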



Btw, in case anyone else needs to keep expensive jobs from being scheduled onto the duplicated resources (and risking slowdowns or OOM), I blocked the duplicated node with dummy actors:


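```python
import time

import ray

ray.init(address="auto")  # connect to the existing cluster instead of starting a new one

NODE_ID = "id_of_your_node"  # node ID of the duplicated node, e.g. taken from ray.nodes()


@ray.remote(label_selector={"ray.io/node-id": NODE_ID}, num_cpus=1)
class BlockingActor:
    def __init__(self):
        time.sleep(52 * 60 * 60)  # hold the CPU until you know you can shut down the whole node


# One blocker per logical CPU on the duplicated node.
num_cpus_on_node = int(next(n for n in ray.nodes() if n["NodeID"] == NODE_ID)["Resources"]["CPU"])
# Keep this driver alive (e.g. an interactive session):
# non-detached actors are destroyed when their driver exits.
blockers = [BlockingActor.remote() for _ in range(num_cpus_on_node)]
```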
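To find the value for `id_of_your_node`, one option is `ray.nodes()`, which returns one dict per node with keys such as `NodeID`, `NodeManagerAddress`, `Alive` and `Resources`. The sketch below (the helper name is hypothetical) groups alive nodes by address and reports only the addresses that host more than one node. It assumes both nodes on the same host report the same `NodeManagerAddress`, which is worth double-checking on your setup.

```python
from collections import defaultdict


def nodes_sharing_a_host(nodes):
    """Group alive nodes by NodeManagerAddress and keep hosts running more than one node.

    `nodes` is the list returned by ray.nodes(); it is passed in so this stays testable.
    Returns {address: [(node_id, logical_cpus), ...]}.
    """
    by_host = defaultdict(list)
    for node in nodes:
        if node.get("Alive"):
            by_host[node["NodeManagerAddress"]].append(
                (node["NodeID"], node.get("Resources", {}).get("CPU", 0))
            )
    return {host: ids for host, ids in by_host.items() if len(ids) > 1}
```

After `ray.init(address="auto")`, call `print(nodes_sharing_a_host(ray.nodes()))`. Both nodes on the affected host are listed; deciding which one you treat as the duplicate is up to you.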

For a full, self-contained example, see: