Hello everybody,
I was installing the CUDA drivers on my laptop to run RLlib on the GPU and noticed some strange behavior. I have not found an existing topic on this, so let me know if there is one. I am using Ray 1.1.0. The issue can be reproduced with the following example:
import ray
from ray.rllib.agents.registry import get_agent_class
from ray.tune.logger import pretty_print

ray.init()

training_iterations = 10
evaluation_steps = 300
method = 'DQN'

config = {
    "log_level": "WARN",
    "num_workers": 3,
    "num_envs_per_worker": 8,
    "dueling": True,
    "double_q": True,
    "train_batch_size": 128,
    "model": {"fcnet_hiddens": [128, 64]},
    "env": "CartPole-v0",
    "num_gpus_per_worker": 0,
    "num_gpus": 0,
}

cls = get_agent_class(method)
trainer = cls(config=config)

for i in range(1000):
    result = trainer.train()
    print(pretty_print(result))
With the CUDA drivers installed, around 986 samples/s can be achieved, as shown in the screenshot below:
If I delete the cudnn64_8.dll file, which TensorFlow needs in order to enable GPU support, throughput rises to around 2175 samples/s:
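For reference, instead of deleting the DLL, the same effect can be obtained by hiding the GPU from TensorFlow before anything CUDA-related is loaded. This is only a sketch of what I tried (assuming TensorFlow 2.x, where tf.config.list_physical_devices is available); setting CUDA_VISIBLE_DEVICES to "-1" is a standard way to force TensorFlow to fall back to the CPU:

import os

# Hide all CUDA devices; this must run before TensorFlow
# (or anything else that initializes CUDA) is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf

# Should print an empty list when the GPU is hidden.
print(tf.config.list_physical_devices("GPU"))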
The weirdest part is that the configuration above explicitly tells RLlib not to use the GPU (num_gpus and num_gpus_per_worker are both 0). Is there any other parameter that I am missing? Is this considered normal behavior?
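In case it helps with debugging, here is a sketch of how I checked what the Ray worker processes actually see (again assuming TensorFlow 2.x); importing TensorFlow inside a remote task shows whether the GPU is visible in a worker despite num_gpus being 0:

import ray

ray.init()

@ray.remote
def visible_gpus():
    # Import inside the task so TensorFlow is initialized
    # in the worker process, not the driver.
    import tensorflow as tf
    return tf.config.list_physical_devices("GPU")

# An empty list would mean the worker cannot see the GPU.
print(ray.get(visible_gpus.remote()))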