I use multiple actors for different jobs.
I defined different functions for each job, and their computation times differ.
Therefore, I want to assign a whole GPU to the heavy-load actor and let the light actors share the other GPU.
However, I cannot find how to assign a specific actor to a specific GPU.
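To make it concrete, this is roughly what I have in mind (just a sketch; HeavyActor, LightActor, and the fractions are made-up placeholders):

import ray

ray.init()

# Heavy-load actor: reserve a whole GPU for it.
@ray.remote(num_gpus=1)
class HeavyActor:
    def run(self, batch):
        ...  # expensive computation

# Light actors: each requests a fraction of a GPU so several can share one.
@ray.remote(num_gpus=0.25)
class LightActor:
    def run(self, item):
        ...  # cheap computation

heavy = HeavyActor.remote()
lights = [LightActor.remote() for _ in range(4)]

The fractions control how many actors fit on a GPU, but I still don't see how to say "the heavy actor must go on GPU 0 and the light actors on GPU 1".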
@jsuarez5341
It seems like a reasonable workaround.
However, what if you have several machines with different GPUs,
for example, one with a 1080TI and the other with an RTX3090?
I want to assign a heavy actor to the RTX3090,
but I cannot find how to do it.
Technically, you might be able to specify custom resources per machine depending on how you spin up your Ray cluster… can’t think of anything simpler though
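Roughly something like this (a sketch; the resource name "rtx3090" is one I made up, and the exact flags depend on how you launch the cluster):

# On the RTX3090 machine, advertise a custom resource when starting Ray:
#   ray start --address=<head-node-address>:6379 --resources='{"rtx3090": 1}'

import ray

ray.init(address="auto")

@ray.remote(num_gpus=1)
class HeavyActor:
    pass

# Only nodes that expose the "rtx3090" resource can satisfy this request,
# so the scheduler will place the actor on that machine.
heavy = HeavyActor.options(resources={"rtx3090": 0.01}).remote()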
@kyunghyun.lee does this describe what you’re trying to do?
I don’t think we have a constant for those GPUs specifically (contributions welcome though), but you should be able to find the right string in the available resources (it will probably be called accelerator_type:1080TI or something like that).
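Something like this, assuming the string shows up the way I described (check ray.cluster_resources() to confirm the exact name on your nodes):

import ray

ray.init(address="auto")

# See which accelerator-type resources the cluster actually advertises.
print(ray.cluster_resources())

@ray.remote(num_gpus=1)
class HeavyActor:
    pass

# Requesting a tiny amount of the accelerator-type resource restricts the
# actor to nodes with that GPU model; num_gpus still reserves the GPU itself.
heavy = HeavyActor.options(
    resources={"accelerator_type:1080TI": 0.001}  # or the analogous string for the RTX3090
).remote()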
I tried to follow the answer, but it is still not clear how to use the desired GPUs. Here’s my code, taken from Hugging Face.
# Create Ray actors only for rank 0.
if ("LOCAL_RANK" not in os.environ or int(os.environ["LOCAL_RANK"]) == 0) and (
    "NODE_RANK" not in os.environ or int(os.environ["NODE_RANK"]) == 0
):
    remote_cls = ray.remote(RayRetriever)
    named_actors = [
        remote_cls.options(name="retrieval_worker_{}".format(i)).remote()
        for i in range(args.num_retrieval_workers)
    ]
else:
    # On the other ranks, look up the named actors that rank 0 created.
    logger.info(
        "Getting named actors for NODE_RANK {}, LOCAL_RANK {}".format(
            os.environ["NODE_RANK"], os.environ["LOCAL_RANK"]
        )
    )
    named_actors = [
        ray.get_actor("retrieval_worker_{}".format(i))
        for i in range(args.num_retrieval_workers)
    ]
This is how I initialize the Ray cluster, following this Hugging Face code.
As you can see, the retriever gets created on GPU device 0 and, in a sense, gets replicated on the other devices. Now I want to execute this worker on a preferred GPU, say the GPU ranked 4th. How can I do this?
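My guess was that I need to pass a GPU request through .options(), something like the sketch below (0.25 is just an example fraction), but I don't see where the "which GPU" part would go:

# Give each retrieval worker a fraction of a GPU...
remote_cls = ray.remote(RayRetriever)
named_actors = [
    remote_cls.options(
        name="retrieval_worker_{}".format(i),
        num_gpus=0.25,
    ).remote()
    for i in range(args.num_retrieval_workers)
]
# ...but how do I say "use the GPU ranked 4th" instead of whatever Ray picks?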
Hey @Shamane_Siriwardhana, Ray makes the assumption that all GPUs on a single node are identical. Can you provide some more context about why you want the actor to run on the 5th GPU specifically?
It is due to a memory constraint. While the training process runs, I want to calculate some embeddings on the inputs from a Ray worker. The work that happens inside this worker has no effect on the gradient graph. During the computation, I want to make sure the Ray worker only has access to a certain percentage of GPU memory on a given GPU.
In a two-GPU setting I set num_gpus to 0.25 and then pointed the Ray worker at the GPU ranked 1 (the secondary one). The program throws an error saying the Ray worker only has one GPU.
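This is roughly the pattern I tried (a sketch; EmbeddingWorker is a made-up name and the torch memory cap is my own attempt at limiting memory). The error shows up as soon as I try to address the second GPU from inside the worker:

import os
import ray
import torch

@ray.remote(num_gpus=0.25)
class EmbeddingWorker:
    def __init__(self):
        # Ray sets CUDA_VISIBLE_DEVICES for this process, so the worker
        # sees exactly one GPU, and from its point of view it is cuda:0.
        print("GPU ids from Ray:", ray.get_gpu_ids())
        print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
        self.device = torch.device("cuda:0")
        # Cap how much of that GPU's memory this process may allocate.
        torch.cuda.set_per_process_memory_fraction(0.25, device=self.device)

    def embed(self, inputs):
        with torch.no_grad():  # this work has no effect on the gradient graph
            return ...  # compute embeddings on self.device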