RLlib workers ignoring GPU restrictions

I recently opened an issue on GitHub before seeing that questions/discussions have been moved here, so I am reposting it here.

I am having issues with RLlib workers using GPU resources for inference when I do not want them to, and this unwanted usage is causing GPU out-of-memory errors. To debug this, I tried setting both num_gpus and num_gpus_per_worker to 0, which does indeed solve the problem. With any other setting, however, the workers seem to consume as many GPU resources as they want, regardless of the config parameters. The issue is described in more detail below.

In both the IMPALA and PPO setups using the PyTorch framework, setting

"num_gpus": 0,
"num_gpus_per_worker": 0,

stops Ray from utilizing the GPU at all (as expected), but setting

"num_gpus": 0.01,
"num_gpus_per_worker": 0,

or any other fractional setting utilizes the entire GPU (I expected it to use only a tiny portion of the GPU). I have tried using a custom PyTorch model and explicitly putting it onto the CPU, but some part of Ray seems to move it back onto the GPU anyway, completely ignoring any fractional restrictions. In this case it seems like the workers are ignoring the "num_gpus_per_worker" setting when num_gpus > 0. The full config is below:

config["IMPALA"] = {
    "env": "dmlab",
    "num_workers": 7,
    "num_gpus": 0,
    "num_gpus_per_worker": 0,
    "num_data_loader_buffers": 1,
    "lr": 0.0002,
    "entropy_coeff": 0.00025,

    "rollout_fragment_length": 100,
    "replay_proportion": 1.0,
    "replay_buffer_num_slots": 1500,

    "model": {
        "custom_model": conv_lstm_model,
        "custom_model_config": {
            "device": "cpu",
            "cnn_shape": resolution,
        },
        "max_seq_len": 100,
    },
    "framework": "torch",
}

Yeah, I don't think fractional GPUs are supported by RLlib (I have never tried it myself, though).
Are you using tune.run or directly RLlib’s Trainer.train()?
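For reference, here is a minimal sketch of the two launch modes in question, using the config dict from the post above (the stop criterion and iteration count are my own illustrative assumptions, and the "dmlab" env plus the custom model are assumed to be registered already):

import ray
from ray import tune
from ray.rllib.agents.impala import ImpalaTrainer

ray.init()

# Option 1: let Tune manage the trainer lifecycle via tune.run.
tune.run(
    "IMPALA",                          # registered trainer name
    config=config["IMPALA"],           # config dict from the post above
    stop={"training_iteration": 10},   # illustrative stop criterion
)

# Option 2: build the Trainer directly and call train() in a loop.
trainer = ImpalaTrainer(config=config["IMPALA"])
for _ in range(10):
    result = trainer.train()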

When RLlib creates its RolloutWorkers as @ray.remote actors, it passes an int as the num_gpus argument.
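Roughly, the worker creation boils down to something like the following sketch (the class and method names here are illustrative only, not RLlib's actual worker code; the num_gpus value corresponds to the num_gpus_per_worker setting):

import ray

ray.init(num_gpus=1)

# Sketch of the kind of remote-actor declaration behind RLlib's rollout
# workers; num_gpus here is the per-worker GPU request.
@ray.remote(num_gpus=0)
class SketchRolloutWorker:
    def sample(self):
        return "fake rollout batch"

worker = SketchRolloutWorker.remote()
print(ray.get(worker.sample.remote()))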

Also, our policies copy their models to the GPU depending on whether one is detectable when the Policy is created. So, for example, if you run without tune, RLlib will place your model on the GPU even if you specify num_gpus(_per_worker)=0, because torch detects that a GPU is there.
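One possible workaround (my assumption, not official RLlib guidance) is to hide the GPUs from the process before torch initializes CUDA, so that the detection check comes back negative and the model stays on the CPU:

import os

# Hide all GPUs from this process; must run before CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch
print(torch.cuda.is_available())  # prints False when no GPU is visible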

I agree, this behavior is far from optimal or intuitive. I'll create a git milestone and put this on our tech-debt list for 2021 (~Q2). In the meantime, you could look at the TorchPolicy class (in the constructor there is an if block that makes that decision) and maybe find a way to hard-code something more useful for you.
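For context, the decision in question amounts to roughly the following (a minimal sketch with illustrative names, not RLlib's actual code); hard-coding the CPU branch is the kind of temporary change meant here:

import torch
import torch.nn as nn

def pick_policy_device(force_cpu: bool = False) -> torch.device:
    # RLlib-style behavior: place the policy's model on the GPU whenever
    # torch can see one, regardless of the num_gpus(_per_worker) settings.
    if not force_cpu and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_policy_device(force_cpu=True)   # the hard-coded CPU workaround
model = nn.Linear(4, 2).to(device)            # toy stand-in for the policy model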

Thanks for the reply - I am indeed using tune.run. I will definitely look into the TorchPolicy class to hard-code a temporary solution.