I recently opened an issue on GitHub before seeing that questions/discussions have been moved here, so I am reposting it here.
I am having issues with RLlib workers using the GPU for inference when I do not want them to, and this unwanted usage is causing GPU memory errors. To debug the issue, I tried setting both num_gpus and num_gpus_per_worker to 0, which does indeed solve the problem. With any other setting, however, the workers seem to consume as much GPU memory as they want, regardless of the config parameters. The issue is described in more detail below:
In both IMPALA and PPO, using the PyTorch framework, setting
"num_gpus": 0,
"num_gpus_per_worker": 0,
stops Ray from utilizing the GPU at all (as expected), but setting
"num_gpus": 0.01,
"num_gpus_per_worker": 0,
or any other fractional value utilizes the entire GPU, whereas I expected it to use only a tiny portion of it. I have also tried using a custom PyTorch model and explicitly putting it on the CPU (a rough sketch of what I tried is included after the config), but some part of Ray seems to move it back onto the GPU anyway, completely ignoring any fractional restrictions. In this case it looks like the workers ignore the "num_gpus_per_worker" setting whenever num_gpus > 0. The full config is below:
config['IMPALA'] = {
    "env": "dmlab",
    "num_workers": 7,
    "num_gpus": 0,              # with 0/0 the GPU is untouched; any fraction > 0 grabs the whole GPU
    "num_gpus_per_worker": 0,
    "num_data_loader_buffers": 1,
    "lr": 0.0002,
    "entropy_coeff": 0.00025,
    "rollout_fragment_length": 100,
    "replay_proportion": 1.0,
    "replay_buffer_num_slots": 1500,
    "model": {
        "custom_model": conv_lstm_model,  # custom torch model, defined elsewhere in my script
        "custom_model_config": {
            "device": "cpu",
            "cnn_shape": resolution,
        },
        "max_seq_len": 100,
    },
    "framework": "torch",
}