Creating N workers across P GPUs means each GPU worker uses P/N fractional GPUs if we provide num_gpus_per_worker. By default, the learner is placed on GPU 0. Since the learner occupies GPU 0 and workers are also placed on GPU 0, the number of workers that fit on each GPU is capped by the memory left on GPU 0 after the learner's allocation.
Is there a way to disentangle the gpu usage for learners and workers?
You can disentangle it via create_env_on_driver=False. You do need rollout workers in that case, though.
Have you tried setting num_gpus_per_worker=P/N and create_env_on_driver=False to see what happens?
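For concreteness, here is a minimal sketch of that suggestion as an old-style RLlib config dict (the keys `num_workers`, `num_gpus`, `num_gpus_per_worker`, and `create_env_on_driver` are standard RLlib config keys; the worker and GPU counts are made-up example values):

```python
# Example values (assumptions): N = 4 rollout workers, P = 2 GPUs total,
# with 1 GPU reserved for the learner.
num_workers = 4
num_gpus_total = 2
learner_gpus = 1

# Split the remaining GPUs evenly across the rollout workers.
gpus_per_worker = (num_gpus_total - learner_gpus) / num_workers

config = {
    "num_workers": num_workers,
    "num_gpus": learner_gpus,                # GPUs for the learner/driver
    "num_gpus_per_worker": gpus_per_worker,  # fractional GPU per worker
    "create_env_on_driver": False,           # no env on the driver process
}
print(config["num_gpus_per_worker"])  # 0.25
```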
Resource allocation is automatic. I experimented with CUDA_VISIBLE_DEVICES a while ago, but there is no easy way to tell RLlib “put the learner load on GPU 0 and the rest on GPU 1”, or something similar, if that is what you are looking for.
Thanks for the answer. The default value of “create_env_on_driver” is already False, and I believe rollout workers are also created automatically in the case of Apex DQN.
Rather than separating the use of GPUs between the learner and the workers, my goal is to make sure every GPU is used to its full capacity. Currently, if the learner uses a GPU, the memory remaining on that GPU is all that can be used on every other GPU in the cluster. Is there a way to remove this equal-allocation strategy?
Long answer:
In Ray, actors hold on to the resources they are created with for their entire lifetime. This means that if you create a learner that holds a GPU and sampling workers that hold GPUs, they will keep those resources for the duration of the RLlib experiment.
The reason, for the time being, is that if actors aren’t guaranteed the resources they need to run, they will sit in a pending state, and it’s possible that the resources never become available or that there is a race condition on them. I think in future versions of RLlib we could implement some smart logic to transfer GPUs from the samplers to the learner, but not in the near future.
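To illustrate that lifetime-reservation behavior, here is a toy sketch (not Ray's actual scheduler, just the accounting idea): each actor reserves its fractional GPU share for its whole lifetime, and a request that cannot be satisfied stays pending indefinitely:

```python
# Toy resource accounting, illustrating lifetime reservations and pending
# actors. All names and numbers here are invented for the example.
class ToyScheduler:
    def __init__(self, total_gpus):
        self.free = total_gpus
        self.pending = []

    def create_actor(self, name, gpus):
        """Grant the request if capacity exists; otherwise it stays pending."""
        if gpus <= self.free:
            self.free -= gpus    # actor holds this share until it dies
            return name
        self.pending.append(name)  # may wait forever if nothing is released
        return None

sched = ToyScheduler(total_gpus=2.0)
sched.create_actor("learner", 1.0)           # learner takes a full GPU
for i in range(5):
    sched.create_actor(f"worker_{i}", 0.25)  # workers share the remainder

print(sched.free)     # 0.0 -> the first 4 workers exhaust the capacity
print(sched.pending)  # ['worker_4'] -> the 5th worker is stuck pending
```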