How severe does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
Hi all, I’ve been reading the docs but I am not getting this concept, so let me explain and ask by example.
I am running ten repetitions of one experiment. Each one of them calls a Python script, and so each calls
ray.init()
...
MyActor.options(name="my_actor", namespace="my_namespace", lifetime="detached").remote(...)
...
# then, inside code that runs on the workers
ray.get_actor("my_actor", namespace="my_namespace")
- The init call creates a Ray cluster or connects to one if already present.
- I have implemented my own actor; it is named, detached, and remote.
- In the workers I get the named actor.
Since I execute the same code ten times for the ten repetitions, which run in parallel on a SLURM cluster, does this create ten different actors, or just one actor?
My intended behavior is for the ten repetitions to be completely independent of each other. I want each run to create its own actor, have its workers connect to it, and do their thing. I do not want a single actor to be created that is somehow shared between the ten repetitions.
I suppose it comes down to a gap in my understanding of what a Ray cluster is, and of which resources are shared and how.
I am wondering if I’m doing it wrong especially because sometimes my repetitions crash with the following error:
ValueError: Failed to look up actor with name 'my_actor'. This could because 1. You are trying to look up a named actor you didn't create. 2. The named actor died. 3. You did not use a namespace matching the namespace of the actor.
and sometimes they don’t. I see no reason why the identical code should sometimes work and sometimes fail if the runs were properly independent. I would love some clarifications on this!
Further, if I am in fact creating a named actor that is shared between my parallel runs, could I work around this by either naming the actor of each repetition differently, e.g. my_actor_rep1, my_actor_rep2, or by using different namespaces? Which method is better?
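To make the two options concrete, here is a rough sketch of what I have in mind. It assumes each repetition runs as its own SLURM job or array task, so that SLURM_JOB_ID (and SLURM_ARRAY_TASK_ID, if set) differs between runs. The helper names run_suffix, actor_name_for, namespace_for, and start_actor are hypothetical and only for illustration; whether either option actually isolates the runs is exactly what I am asking.

```python
import os
import uuid


def run_suffix() -> str:
    """Return an identifier that should differ between repetitions.

    Assumption: each repetition is a separate SLURM job or array task.
    Outside SLURM this falls back to a random hex string, which would then
    have to be passed on to the workers explicitly.
    """
    job_id = os.environ.get("SLURM_JOB_ID")
    if job_id is None:
        return uuid.uuid4().hex
    task_id = os.environ.get("SLURM_ARRAY_TASK_ID")
    return job_id if task_id is None else f"{job_id}_{task_id}"


def actor_name_for(suffix: str) -> str:
    # Option 1: same namespace, different actor name per repetition.
    return f"my_actor_{suffix}"


def namespace_for(suffix: str) -> str:
    # Option 2: same actor name, different namespace per repetition.
    return f"my_namespace_{suffix}"


def start_actor(actor_cls, suffix: str, *args, **kwargs):
    """Create the detached actor in a namespace private to this repetition."""
    import ray  # imported here so the helpers above work without Ray installed

    namespace = namespace_for(suffix)
    ray.init(namespace=namespace)
    return actor_cls.options(
        name="my_actor", namespace=namespace, lifetime="detached"
    ).remote(*args, **kwargs)
```

The workers would then look the actor up with ray.get_actor("my_actor", namespace=namespace_for(suffix)), using the same suffix as the run that created it.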
Thanks!