RLlib workers ignoring GPU restrictions

mdeib · December 15, 2020, 7:13am

I recently opened an issue on GitHub before seeing that questions/discussions have been moved here, so I am reposting it here.

I am having issues with RLlib workers using the GPU for inference when I do not want them to; this unwanted usage is causing GPU memory errors. To debug this, I tried setting num_gpus to 0, which does solve the problem. With any other setting, however, the workers seem to consume as much of the GPU as they want regardless of the config parameters. The issue is described further below:

In both IMPALA and PPO with the PyTorch framework, setting

"num_gpus": 0,
"num_gpus_per_worker": 0,

stops Ray from using the GPU at all (as expected), but setting

"num_gpus": 0.01,
"num_gpus_per_worker": 0,

or any other fractional value uses the entire GPU (I expected it to use a very small portion of the GPU). I have tried using a custom PyTorch model and explicitly putting it on the CPU, but some part of Ray seems to put it back on the GPU anyway, completely ignoring any fractional restrictions. In this case it seems like the workers are ignoring the "num_gpus_per_worker" setting when num_gpus > 0. The full config is below:

config['IMPALA'] = {
    "env": 'dmlab',
    "num_workers": 7,
    "num_gpus": 0,
    "num_gpus_per_worker": 0,
    "num_data_loader_buffers": 1,
    "lr": 0.0002,
    "entropy_coeff": 0.00025,

    "rollout_fragment_length": 100,
    "replay_proportion": 1.0,
    "replay_buffer_num_slots": 1500,

    "model": {
        "custom_model": conv_lstm_model,
        "custom_model_config": {
            "device": "cpu",
            "cnn_shape": resolution,
        },
        "max_seq_len": 100,
    },
    "framework": "torch",
}
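
For reference, a rough sketch of how the two GPU settings in a config like this add up. It assumes RLlib requests num_gpus for the trainer process plus num_gpus_per_worker for each rollout worker; the helper total_gpu_request is hypothetical and not part of RLlib. In Ray, these amounts are logical scheduling quantities: a fractional value lets several processes share one device, but it does not cap how much GPU memory a process allocates.

```python
def total_gpu_request(config):
    """Sum the logical GPUs a trainer config asks the scheduler for.

    Hypothetical helper for illustration only; assumes the request is
    num_gpus (trainer) + num_workers * num_gpus_per_worker (workers).
    """
    return (
        config.get("num_gpus", 0)
        + config.get("num_workers", 0) * config.get("num_gpus_per_worker", 0)
    )


# The two settings described in the post, reduced to the relevant keys.
debug_config = {"num_workers": 7, "num_gpus": 0, "num_gpus_per_worker": 0}
fractional_config = {"num_workers": 7, "num_gpus": 0.01, "num_gpus_per_worker": 0}

print(total_gpu_request(debug_config))
print(total_gpu_request(fractional_config))
```

Under this assumption, the second config reserves only a sliver of one GPU for scheduling purposes, yet nothing in that reservation stops a process that can see the device from filling its memory.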

sven1977 · December 22, 2020, 12:20pm

Yeah, I don’t think fractional GPUs are supported by RLlib (I never tried it myself, though).
Are you using tune.run or directly RLlib’s Trainer.train()?

When RolloutWorkers are created as @ray.remote actors, the num_gpus arg takes an int.

Also, our policies copy their models to the GPU depending on whether one is detectable when the Policy is created. So, for example, if you run without tune, RLlib will place your model on the GPU even if you specify num_gpus(_per_worker)=0 (because torch detects that it's there).
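
Since the placement depends on whether torch can detect a GPU, one general way to keep a plain Python process off the GPU is to hide the devices through the CUDA_VISIBLE_DEVICES environment variable before CUDA is initialized. The sketch below uses only the standard library; the helper name is hypothetical, and how this interacts with the values Ray itself sets for its workers is an assumption that has not been verified in this thread.

```python
import os


def hide_gpus_from_this_process():
    """Make CUDA devices invisible to libraries loaded afterwards.

    Hypothetical helper. It must run before torch initializes CUDA,
    so call it before the first `import torch` in the process.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


hide_gpus_from_this_process()

# Import torch only after this point; with no visible devices,
# CUDA-availability checks in this process should report no GPU.
```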

I agree, this behavior is far from optimal or intuitive. I'll create a git milestone and put this on our Tech Debt list for 2021 (~Q2). In the meantime, you could look at the TorchPolicy class (in the constructor, there is an if block that makes that decision) and maybe find a way to hard-code something more useful for you.
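
As an illustration of the kind of check one might hard-code there, here is a minimal sketch of a device choice that follows the configured GPU count instead of detection alone. This is not the actual TorchPolicy code; pick_device and its parameters are hypothetical, and in practice cuda_available would come from torch.cuda.is_available().

```python
def pick_device(num_gpus, cuda_available):
    """Choose a device name from the configured GPU count.

    Hypothetical helper: use the GPU only if the config asks for one
    AND a device is actually present; otherwise stay on the CPU.
    """
    if num_gpus > 0 and cuda_available:
        return "cuda"
    return "cpu"


# A GPU is present but the config asks for none: stay on the CPU.
print(pick_device(num_gpus=0, cuda_available=True))
# The config asks for a fraction of a GPU and one is present.
print(pick_device(num_gpus=0.01, cuda_available=True))
```

The point of the design is that the config, not the hardware, has the final say: a detected GPU is ignored unless the user asked for one.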

mdeib · December 22, 2020, 7:09pm

Thanks for the reply - I am indeed using tune.run. I will definitely look into using the TorchPolicy class to hard-code a temporary solution.
