Colocating workers and environments on the same GPU

smorad · January 19, 2021, 8:30pm

I am trying to use the Habitat simulator with RLlib. Ray has a really nice resource abstraction (e.g. num_gpus_per_worker=0.5).

With the Habitat environment, the GPU ID must be specified at environment initialization. As far as I can tell, each environment/simulator is run in a single actor. I would like to colocate the environment and the corresponding actor on the same GPU, e.g.:

GPU0: (env_instance0, actor0)
GPU1: (env_instance1, actor1)

The reasoning for this is to ensure I never OOM a GPU. I know that env_instance + actor is roughly 0.5 GPU’s worth of memory. If I allow the envs to select a random GPU, something like the following occurs:

t=0
GPU0 (4GB): (env_instance0:1GB, env_instance1:1GB, actor0:1GB)
...
t=1
GPU0 (4GB): (env_instance0:2GB, env_instance1:2GB, actor0:1GB) -- OOM!
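
The arithmetic in the timeline above can be checked with a small sketch. The sizes are the illustrative ones from the diagram. The function name, the dictionary layout, and the colocated placement at the end are made up for this example and are not part of Ray or Habitat.

```python
# Illustrative only: sizes come from the diagram above, names are hypothetical.
GPU_CAPACITY_GB = 4

def total_usage_gb(processes):
    """Sum the memory (in GB) of every process placed on one GPU."""
    return sum(processes.values())

# t=0: two environments and one actor happen to land on GPU0.
gpu0_t0 = {"env_instance0": 1, "env_instance1": 1, "actor0": 1}
# t=1: each environment has grown to 2 GB.
gpu0_t1 = {"env_instance0": 2, "env_instance1": 2, "actor0": 1}
# Desired layout: one (env, actor) pair per GPU, same sizes as at t=1.
gpu0_colocated = {"env_instance0": 2, "actor0": 1}

for label, placement in (
    ("random, t=0", gpu0_t0),
    ("random, t=1", gpu0_t1),
    ("colocated, t=1", gpu0_colocated),
):
    used = total_usage_gb(placement)
    status = "OOM" if used > GPU_CAPACITY_GB else "ok"
    print(f"{label}: {used} GB of {GPU_CAPACITY_GB} GB -> {status}")
```

With random placement the second environment pushes GPU0 to 5 GB at t=1, while the colocated pair stays at 3 GB.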

smorad · January 19, 2021, 9:58pm

It seems Ray exposes this functionality automatically through ray.get_gpu_ids(), which returns the GPU IDs allocated to the current worker/actor. Wrapping the environment in a class seems to do the trick.

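import ray


class HabitatEnv:
    ...  # elided in the original post


# HabitatEnv must be defined (or imported) before Wrapper subclasses it.
class Wrapper(HabitatEnv):
    def __init__(self, config):
        # GPU IDs that Ray assigned to this worker/actor; the unpacking
        # expects exactly one entry.
        [gpu_id] = ray.get_gpu_ids()
        # Point the simulator at the same GPU before it is constructed.
        config.SIMULATOR.HABITAT_SIM_V0.GPU_DEVICE_ID = gpu_id
        super().__init__(config)
        ...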
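
One caveat with the unpacking `[gpu_id] = ray.get_gpu_ids()`: it raises a ValueError unless the list holds exactly one ID, which could happen if, for example, a worker was started without a GPU. Below is a minimal sketch of a more forgiving selection step. The helper name `select_gpu_id` and its `fallback` parameter are hypothetical, and it assumes the IDs are numeric.

```python
def select_gpu_id(assigned_gpu_ids, fallback=0):
    """Pick the device ID to hand to the simulator.

    assigned_gpu_ids: the list returned by ray.get_gpu_ids() in the worker.
    fallback: hypothetical default used when no GPU was assigned.
    """
    if not assigned_gpu_ids:
        return fallback
    # Assumption: IDs are numeric; int() also covers IDs given as digit strings.
    return int(assigned_gpu_ids[0])


# Stand-in lists are used here so the sketch runs without a Ray cluster.
print(select_gpu_id([1]))   # one GPU assigned
print(select_gpu_id([]))    # no GPU assigned, falls back to 0
```

Inside the wrapper this would be called as `select_gpu_id(ray.get_gpu_ids())`. Whether falling back to device 0 is acceptable depends on the setup; raising a clear error instead may be the safer choice when OOM is the concern.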
