Colocating workers and environments on the same GPU

I am trying to use the Habitat simulator with RLlib. Ray has a really nice resource abstraction (e.g. num_gpus_per_worker=0.5).

With the Habitat environment, the GPU ID must be specified at environment initialization. As far as I can tell, each environment/simulator is run in a single actor. I would like to colocate the environment and the corresponding actor on the same GPU, e.g.:

GPU0: (env_instance0, actor0)
GPU1: (env_instance1, actor1)

The reasoning for this is to ensure I never OOM a GPU. I know that env_instance + actor is roughly 0.5 GPU’s worth of memory. If I allow the envs to select a random GPU, something like the following occurs:

t=0
GPU0 (4GB): (env_instance0:1GB, env_instance1:1GB, actor0:1GB)
...
t=1
GPU0 (4GB): (env_instance0:2GB, env_instance1:2GB, actor0:1GB) -- OOM!
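To make the arithmetic explicit, here is a small sketch that sums the footprints from the timeline above and checks them against the 4GB capacity. The helper names (total_gb, fits) are made up for illustration, and the colocated case assumes a single env/actor pair on GPU0 with the same grown sizes as at t=1.

```python
GPU_CAPACITY_GB = 4.0  # capacity of GPU0 in the example above


def total_gb(footprints):
    """Sum the per-process memory footprints (in GB) placed on one GPU."""
    return sum(footprints.values())


def fits(footprints, capacity_gb=GPU_CAPACITY_GB):
    """Return True if everything placed on the GPU fits within its capacity."""
    return total_gb(footprints) <= capacity_gb


# Random placement: two environments happen to land on GPU0 next to actor0.
random_t0 = {"env_instance0": 1.0, "env_instance1": 1.0, "actor0": 1.0}
random_t1 = {"env_instance0": 2.0, "env_instance1": 2.0, "actor0": 1.0}

# Colocated placement (assumption: only one env/actor pair on GPU0).
colocated_t1 = {"env_instance0": 2.0, "actor0": 1.0}

for name, layout in [("random t=0", random_t0),
                     ("random t=1", random_t1),
                     ("colocated t=1", colocated_t1)]:
    status = "ok" if fits(layout) else "OOM"
    print(f"{name}: {total_gb(layout):.1f} GB of {GPU_CAPACITY_GB:.1f} GB -> {status}")
```

With these numbers, only the random placement at t=1 exceeds the 4GB budget.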

It seems Ray exposes this functionality automatically through ray.get_gpu_ids(), which returns the GPU IDs allocated to the current worker/actor. Wrapping the environment in a class seems to do the trick:

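import ray


class HabitatEnv:
    ...  # existing Habitat environment, elided here


class Wrapper(HabitatEnv):
    def __init__(self, config):
        # Ray reports the GPU IDs assigned to this worker/actor. The
        # single-element unpacking raises unless exactly one ID is returned.
        [gpu_id] = ray.get_gpu_ids()
        # Point the simulator at the same GPU that Ray allocated.
        config.SIMULATOR.HABITAT_SIM_V0.GPU_DEVICE_ID = gpu_id
        super().__init__(config)
        ...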
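One caveat with the [gpu_id] = ray.get_gpu_ids() unpacking is that it raises a fairly cryptic ValueError when the worker has zero or several GPUs assigned. Below is a minimal sketch of a more explicit check. The helper name select_single_gpu is hypothetical, and it takes the ID list as an argument so it can be tried without a running Ray cluster. It also assumes the IDs may arrive as either ints or numeric strings, so it coerces to int.

```python
from typing import Sequence, Union


def select_single_gpu(gpu_ids: Sequence[Union[int, str]]) -> int:
    """Return the only GPU ID in gpu_ids as an int, or raise a clear error.

    Hypothetical helper: intended to be called with the list returned by
    ray.get_gpu_ids() inside the worker that constructs the environment.
    """
    if len(gpu_ids) != 1:
        raise ValueError(
            f"expected exactly one GPU for this worker, got {list(gpu_ids)!r}"
        )
    # Coerce to int in case the ID is provided as a numeric string.
    return int(gpu_ids[0])


# Stand-in lists instead of a live ray.get_gpu_ids() call.
print(select_single_gpu([1]))
print(select_single_gpu(["0"]))
```

Inside the wrapper this would be used as gpu_id = select_single_gpu(ray.get_gpu_ids()).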