When are named actors shared and to whom?

kunterbunt · April 13, 2023, 7:20am

How severe does this issue affect your experience of using Ray?

Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hi all, I’ve been reading the docs, but I’m not getting this concept, so let me explain/ask by example.

I am running ten repetitions of one experiment. Each one of them calls a Python script, and so each calls

import ray

ray.init()
...
MyActor.options(name="my_actor", namespace="my_namespace", lifetime="detached").remote(...)
...
# then, inside code that runs on the workers
ray.get_actor("my_actor", namespace="my_namespace")

ray.init() creates a Ray cluster, or connects to one if it is already running.
I have implemented my own actor class; the actor is named, detached, and created with .remote().
In the workers, I look up the named actor.
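For context, the `ray.init()` documentation describes an order in which the `address` argument is resolved, and that order decides whether a run joins an existing cluster or starts its own. The function below is a hypothetical, simplified model of that documented order, not Ray's implementation: `resolve_init_target` is not a Ray API, `ray://` client addresses are just treated as concrete addresses, and the "most recently started cluster" is passed in as a plain argument.

```python
from typing import Mapping, Optional


def resolve_init_target(
    address: Optional[str],
    env: Mapping[str, str],
    latest_cluster_address: Optional[str],
) -> str:
    """Hypothetical model of how ray.init(address=...) picks a cluster.

    Returns "connect:<addr>" or "start-local", or raises ConnectionError.
    """
    # A concrete address (e.g. "10.0.0.1:6379" or "ray://...") is used as-is.
    if address not in (None, "auto", "local"):
        return f"connect:{address}"
    # "local" always starts a fresh instance, even if one is already running.
    if address == "local":
        return "start-local"
    # No address or "auto": look for an existing instance first,
    # checking RAY_ADDRESS before the most recently started cluster.
    found = env.get("RAY_ADDRESS") or latest_cluster_address
    if found:
        return f"connect:{found}"
    # Nothing found: "auto" fails, while a bare ray.init() starts a new instance.
    if address == "auto":
        raise ConnectionError("no existing Ray instance found")
    return "start-local"
```

Under this model, a run that calls `ray.init()` where another Ray instance is discoverable (through `RAY_ADDRESS` or a recently started cluster) joins that instance instead of starting its own, and from then on a fixed actor name in a fixed namespace is shared between the runs.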

Since I execute the same code ten times for the ten repetitions, which run in parallel on a SLURM cluster, does this create ten different actors, or just one actor?

My intended behavior is for the ten repetitions to be completely independent of each other. I want each run to create its own actor, have its workers connect to it, and do their thing. I do not want a single actor to be created and somehow shared between the ten repetitions.

I suppose it comes down to a gap in my understanding of what a Ray cluster is, and which resources are shared, and how.
I am wondering if I’m doing it wrong especially because sometimes my repetitions crash with the following error:

ValueError: Failed to look up actor with name 'my_actor'. This could because 1. You are trying to look up a named actor you didn't create. 2. The named actor died. 3. You did not use a namespace matching the namespace of the actor.

and sometimes they don’t. I see no reason why the identical code should sometimes work and sometimes fail if the runs were properly independent. I would love some clarifications on this!

Further, if I do in fact create a named actor that is shared between my parallel runs, would I work around this issue by naming each repetition’s actor differently, e.g. my_actor_rep1, my_actor_rep2, or by using different namespaces? Which method is better?

Thanks!

kunterbunt · April 14, 2023, 7:32am

After digging around on my own, I have discovered that actors are not usable from other Python processes. So instantiating an actor in one process and trying to retrieve it from another does not work the way I do it above, which is what I want.

I have further discovered that when my actor couldn’t be looked up, this was a legitimate error, because there was an error spinning the actor up. I had simply overlooked this in the heap of error messages.

So, for anyone doing something similar: actors are properly independent between Python processes (at least if you do it as in my original post).

Oh and of course above it should not be git.init() but ray.init(); can’t edit the above post anymore.

sangcho · April 14, 2023, 3:06pm

Hmm, actually that is not true if lifetime="detached". If you specify lifetime="detached", the actor can be reused from other jobs. Detached means the actor’s lifetime no longer depends on the reference count: by default, actor handles are reference-counted across the cluster (distributed reference counting), and when the count drops to 0, the actor is garbage-collected.
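The difference can be illustrated with a toy, in-memory model of a single cluster’s named-actor table. `ToyActorRegistry` is hypothetical and simplifies Ray considerably: it keys actors by `(namespace, name)`, treats a job’s exit as the moment its non-detached actors are collected (real Ray can collect them earlier, once all handles go out of scope), and keeps detached actors until they are killed explicitly.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ToyActorRegistry:
    """Hypothetical model of one cluster's named-actor table.

    Maps (namespace, name) -> (owning job, detached flag).
    """
    actors: Dict[Tuple[str, str], Tuple[str, bool]] = field(default_factory=dict)

    def create(self, job: str, namespace: str, name: str, detached: bool) -> None:
        key = (namespace, name)
        if key in self.actors:
            # Ray likewise raises ValueError when a name is already taken
            # in a namespace (unless the caller asks to reuse the existing actor).
            raise ValueError(f"name {name!r} already taken in namespace {namespace!r}")
        self.actors[key] = (job, detached)

    def get(self, namespace: str, name: str) -> str:
        # Lookups only see actors in the requested namespace.
        key = (namespace, name)
        if key not in self.actors:
            raise ValueError(f"Failed to look up actor with name {name!r}")
        owner, _ = self.actors[key]
        return owner

    def job_exits(self, job: str) -> None:
        # Non-detached actors owned by the exiting job are cleaned up;
        # detached actors survive the job that created them.
        self.actors = {
            key: (owner, detached)
            for key, (owner, detached) in self.actors.items()
            if detached or owner != job
        }

    def kill(self, namespace: str, name: str) -> None:
        # Detached actors only go away when killed explicitly (ray.kill in Ray).
        self.actors.pop((namespace, name), None)
```

In this model, a second job in the same cluster that tries to `create(...)` the same namespace and name fails, while its `get(...)` returns the first job’s detached actor, even after the first job has exited. Lookups fail once the actor has been removed.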

kunterbunt · April 14, 2023, 3:23pm

Huh, interesting. I’ve changed my code now so that the actors of the different repetitions have unique names and namespaces, e.g. actor_rep1 in namespace_rep1 and so on.
Would you do it in a different way or is this fine?
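One way to implement the per-repetition naming described above is a small helper that derives both the actor name and the namespace from a repetition identifier. `repetition_identity` is hypothetical; it assumes the runs are SLURM jobs, where `SLURM_JOB_ID` is set for each job, and falls back to the process id, which is only unique on a single machine.

```python
import os
from typing import Mapping, Optional, Tuple


def repetition_identity(
    base_name: str,
    env: Mapping[str, str],
    rep: Optional[int] = None,
) -> Tuple[str, str]:
    """Return an (actor_name, namespace) pair unique to one repetition.

    Prefers an explicit repetition index, then the SLURM job id,
    then the local process id as a last resort.
    """
    if rep is not None:
        tag = f"rep{rep}"
    elif "SLURM_JOB_ID" in env:
        tag = f"job{env['SLURM_JOB_ID']}"
    else:
        # Only unique per machine; fine for a quick local test.
        tag = f"pid{os.getpid()}"
    return f"{base_name}_{tag}", f"namespace_{tag}"
```

The namespace would then go to `ray.init(namespace=...)` and the name to `MyActor.options(name=..., lifetime="detached")`. Because Ray scopes actor names by namespace, a per-run namespace alone already keeps the runs apart; setting it in `ray.init` also lets worker code call `ray.get_actor(name)` without repeating the namespace, since the lookup defaults to the current namespace. Detached actors outlive the job, so each run would also need to remove its actor with `ray.kill(...)` when it finishes.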

kunterbunt · April 14, 2023, 3:24pm

Also, I couldn’t reproduce this with a dummy test in which script1.py instantiated a detached actor and kept running, while script2.py tried to get the named, detached actor. script2.py couldn’t find it.

sangcho · April 17, 2023, 1:40pm

Can you show me the code for ^?
