How to assign a specific actor to a specific GPU

I use multiple actors for different jobs.
I defined a different function for each job, and their computation times differ.
Therefore, I want to dedicate a whole GPU to the heavy-load actor and share the other GPU among the light actors.
However, I cannot find out how to assign a specific actor to a specific GPU.

I also opened a GitHub issue: https://github.com/ray-project/ray/issues/12247
Thanks!

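import ray

# Reserve a whole GPU for the heavy actor.
@ray.remote(num_gpus=1)
class HeavyActor:
    pass

# Two light actors can share one GPU. Ray does not support subclassing
# an actor class, so LightActor is defined on its own.
@ray.remote(num_gpus=0.5)
class LightActor:
    pass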

Does that do it?

@jsuarez5341
It seems like a reasonable workaround.
However, suppose you have several machines with different GPUs,
for example, one with a 1080 Ti and another with an RTX 3090.
I want to assign the heavy actor to the RTX 3090,
and I cannot find out how to do that.

Oof yeah that’s rough. @devs, add some form of hardware ID to https://docs.ray.io/en/latest/tune/api_docs/trainable.html?highlight=resource#advanced-resource-allocation?

Technically, you might be able to specify custom resources per machine depending on how you spin up your Ray cluster… can’t think of anything simpler though
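
A minimal sketch of the custom-resource idea above, assuming each node is started with a hand-picked label such as `gpu_rtx3090` (a hypothetical name, not something Ray defines), for example via `ray start --resources='{"gpu_rtx3090": 1}'` on the RTX 3090 machine. The first helper only builds the options dictionary; `launch_heavy_actor` is where Ray is actually used, and it assumes a cluster is already running.

```python
def gpu_label_options(label, num_gpus=1.0, label_amount=1.0):
    """Build actor options that request GPUs plus a custom per-node label.

    `label` is a hypothetical custom resource name chosen when the node
    was started; Ray only places the actor on nodes that advertise it.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return {"num_gpus": num_gpus, "resources": {label: label_amount}}


def launch_heavy_actor(actor_cls, label="gpu_rtx3090"):
    """Start `actor_cls` on a node advertising `label` (needs a Ray cluster)."""
    import ray  # imported lazily so the helper above works without Ray

    remote_cls = ray.remote(actor_cls)
    return remote_cls.options(**gpu_label_options(label)).remote()
```

Light actors could use the same helper with a different label and a fractional `num_gpus`, for instance `gpu_label_options("gpu_1080ti", num_gpus=0.5)`.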

@kyunghyun.lee does this describe what you’re trying to do?

I don’t think we have a constant for those GPUs specifically (contributions welcome, though), but you should be able to find the right string in the available resources (it will probably be called accelerator_type:1080TI or something like that).
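
A small sketch of looking up that string, assuming the resource names follow the `accelerator_type:<name>` pattern mentioned above. The function names are made up for illustration; on a live cluster the input dictionary would come from `ray.cluster_resources()`.

```python
ACCELERATOR_PREFIX = "accelerator_type:"


def accelerator_types(resources):
    """Return the accelerator names found in a Ray resource dictionary."""
    return sorted(
        key[len(ACCELERATOR_PREFIX):]
        for key in resources
        if key.startswith(ACCELERATOR_PREFIX)
    )


def cluster_accelerator_types():
    """Query a running Ray cluster (requires `ray.init()` beforehand)."""
    import ray  # lazy import: only needed when talking to a real cluster

    return accelerator_types(ray.cluster_resources())
```

Once the name is known, it can be requested together with `num_gpus`, for example through the `accelerator_type` argument of `ray.remote`, so the heavy actor only lands on nodes that report that accelerator.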


… I didn’t even know that was a thing lol. Would be good to have that linked in the Actor doc section

I didn’t know about the accelerator type.
I should try this.
Thanks!


Thanks! This type of feedback is actually super useful (especially since we’re trying to actively make our documentation easier to navigate).


Any update on this? I also want to assign a certain GPU at the beginning.

It seems like the issue was addressed by Alex. Is his answer not sufficient for your issue?

I tried to follow the answer, but it is still not clear how to use the desired GPUs. Here’s my code, taken from Hugging Face.

# Create Ray actors only for rank 0.
# Environment variables are strings, so compare against "0", not the integer 0.
if ("LOCAL_RANK" not in os.environ or os.environ["LOCAL_RANK"] == "0") and (
    "NODE_RANK" not in os.environ or os.environ["NODE_RANK"] == "0"
):
    remote_cls = ray.remote(RayRetriever)
    named_actors = [
        remote_cls.options(name="retrieval_worker_{}".format(i)).remote()
        for i in range(args.num_retrieval_workers)
    ]
else:
    # Other ranks look up the actors that rank 0 already created.
    # .get() avoids a KeyError when only one of the two variables is set.
    logger.info(
        "Getting named actors for NODE_RANK {}, LOCAL_RANK {}".format(
            os.environ.get("NODE_RANK"), os.environ.get("LOCAL_RANK")
        )
    )
    named_actors = [ray.get_actor("retrieval_worker_{}".format(i)) for i in range(args.num_retrieval_workers)]

This is how I initialized the Ray cluster, following this Hugging Face code.

As you can see, the worker gets created on GPU device 0 and more or less gets copied to the other devices. Now I want to run this worker on a preferred GPU, let’s say the GPU ranked 4th. How do I do this?

Hey @Alex, can you answer his question?

Hey @Shamane_Siriwardhana, Ray makes the assumption that all GPUs on a single node are identical. Can you provide some more context about why you want the actor to run on the 5th GPU specifically?

It is due to a memory constraint. While the training process runs, I want to compute some embeddings of the inputs in a Ray worker. What happens inside this worker has no effect on the gradient graph. During the computation, I want to make sure the Ray worker has access to only a certain percentage of the memory of a given GPU.

In a two-GPU setting, I set num_gpus to 0.25 and then called the Ray worker on the GPU ranked 1 (the secondary one). The program throws an error saying the Ray worker has only one GPU.
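
One possible explanation for that error, assuming Ray's usual behavior of setting `CUDA_VISIBLE_DEVICES` for workers that request GPUs: an actor created with `num_gpus=0.25` is given a single GPU, so inside the actor that GPU appears as device index 0 regardless of its physical index. The helper below (a hypothetical name) illustrates the mapping with plain string handling and does not call Ray or CUDA.

```python
def local_device_index(cuda_visible_devices, physical_id):
    """Map a physical GPU id to the index a process sees.

    `cuda_visible_devices` is the value of the CUDA_VISIBLE_DEVICES
    environment variable, e.g. "1" or "0,1". Returns None if the
    physical GPU is not visible to the process.
    """
    visible = [d.strip() for d in cuda_visible_devices.split(",") if d.strip()]
    try:
        return visible.index(str(physical_id))
    except ValueError:
        return None
```

Under that assumption, a worker whose `CUDA_VISIBLE_DEVICES` is `"1"` should address its GPU as `cuda:0`; asking for `cuda:1` inside the actor would fail because only one device is visible there.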

Do you mind making a separate post/github issue about what you’re trying to do? In particular, an example may be useful here.

@Alex I already did; please refer to this issue.