When are named actors shared and to whom?

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hi all, I’ve been reading the docs but I am not getting this concept, let me explain/ask by example.

I am running ten repetitions of one experiment. Each one of them calls a Python script, and so each calls

MyActor.options(name="my_actor", namespace="my_namespace", lifetime="detached").remote(...)
# then, inside code that runs on the workers
ray.get_actor("my_actor", namespace="my_namespace")

Since I execute the same code ten times for the ten repetitions, which run in parallel on a SLURM cluster, does this create ten different actors, or just one actor?

My intended behavior is to have the ten repetitions be completely independent of each other. I want each run to create its own actor, have its workers connect to it, and do their thing. I do not want one actor to be created that is somehow shared between the ten repetitions.

I suppose it comes down to a gap in my understanding of what a Ray cluster is, and of which resources are shared and how.
I am wondering if I’m doing it wrong especially because sometimes my repetitions crash with the following error:

ValueError: Failed to look up actor with name 'my_actor'. This could because 1. You are trying to look up a named actor you didn't create. 2. The named actor died. 3. You did not use a namespace matching the namespace of the actor.

and sometimes they don’t. I see no reason why the identical code should sometimes work and sometimes fail if the runs were properly independent. I would love some clarifications on this!

Further, if I do in fact create a shared, named actor between my parallel runs, would I work around this issue by either naming each actor of each repetition differently, i.e. my_actor_rep1, my_actor_rep2, or by using different namespaces? Which method is better?


After digging around on my own, I have discovered that actors are not usable from other Python processes. So instantiating an actor in one process and trying to retrieve it from another does not work in the way I do it above. Which is what I want.

I have further discovered that when my actor couldn’t be looked up, the error message was valid, because there was an error spinning the actor up. I had simply overlooked this in the heap of error messages.

So, for anyone doing something similar: actors are properly independent between Python processes (at least if you do it as in my original post).

Oh and of course above it should not be git.init() but ray.init(); can’t edit the above post anymore.

Hmm, actually that is not true if lifetime="detached". If you specify lifetime="detached", the actor can be reused from other jobs. Detached means the lifetime of the actor no longer depends on the reference count. By default, actor handles are distributed reference counted, and when the count goes to 0, the actor is garbage collected.

Huh, interesting. I’ve changed my code now so that the actors of the different repetitions have unique names and namespaces, e.g. actor_rep1 in namespace_rep1 and so on.
Would you do it in a different way or is this fine?
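
For anyone following along, the per-repetition naming described above could be sketched roughly as below. This is only a minimal sketch under some assumptions: the helper names (actor_identity, create_actor, lookup_actor) are made up for illustration, and using the SLURM_JOB_ID environment variable as the repetition id is my assumption about the setup, not something Ray requires. Ray is imported lazily so the naming helper can be tried without a running cluster.

```python
import os
import uuid


def actor_identity(base_name="my_actor", base_namespace="my_namespace", run_id=None):
    """Return a (name, namespace) pair that is unique to one repetition.

    Hypothetical helper: run_id defaults to SLURM_JOB_ID (assumed to be set
    inside each SLURM job) and falls back to a random id otherwise.
    """
    if run_id is None:
        run_id = os.environ.get("SLURM_JOB_ID") or uuid.uuid4().hex[:8]
    return f"{base_name}_{run_id}", f"{base_namespace}_{run_id}"


def create_actor(actor_cls, *args, **kwargs):
    """Create a detached, named actor from a @ray.remote class for this repetition."""
    name, namespace = actor_identity()
    handle = actor_cls.options(
        name=name, namespace=namespace, lifetime="detached"
    ).remote(*args, **kwargs)
    # Pass name and namespace on to the workers so they look up the right actor.
    return handle, name, namespace


def lookup_actor(name, namespace):
    """Called inside worker code: fetch the actor belonging to this repetition."""
    import ray  # lazy import so the helpers above work without Ray installed

    return ray.get_actor(name, namespace=namespace)
```

Each repetition would call create_actor(MyActor, ...) once and hand the returned name and namespace to its workers, which then call lookup_actor(name, namespace). Making either the name or the namespace unique should already keep repetitions apart; the sketch varies both only to mirror the actor_rep1 / namespace_rep1 scheme.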

Also, I couldn’t reproduce this with a dummy test where I had script1.py instantiate a detached actor, kept it running, and had script2.py try to get the named, detached actor. It couldn’t find it.

Can you show me the code for ^?