I’m trying to use IMPALA on multiple GPUs with PyTorch, with V-trace enabled.
But when I run my code, I get the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
After peeking into the source code, it seems that everything in
vtrace_torch.py is moved to the CPU. But judging by the error, at least one tensor must still be left on the GPU.
Also, if I change the source code here, adding .to("cpu") to the stacked target log-probs:
    def get_log_rhos(target_action_log_probs, behaviour_action_log_probs):
        """With the selected log_probs for multi-discrete actions of behavior
        and target policies we compute the log_rhos for calculating the vtrace."""
        t = torch.stack(target_action_log_probs).to("cpu")
        b = torch.stack(behaviour_action_log_probs)
        log_rhos = torch.sum(t - b, dim=0)
        return log_rhos
Then with this change it runs fine.
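For comparison, the same change can be written so that both inputs are moved explicitly, instead of relying on the behaviour log-probs already being on the CPU. This is only a minimal sketch: the `device` parameter is hypothetical (it is not part of the library function), and the default of "cpu" assumes that the rest of vtrace_torch.py computes on the CPU, as it appears to.

```python
import torch


def get_log_rhos(target_action_log_probs, behaviour_action_log_probs, device="cpu"):
    """Sketch of get_log_rhos with both inputs moved to one device.

    `device` is a hypothetical parameter added for illustration only.
    """
    # Stack the per-action log-probs, then move both stacks to the same device
    # so that the subtraction below never mixes cuda and cpu tensors.
    t = torch.stack(target_action_log_probs).to(device)
    b = torch.stack(behaviour_action_log_probs).to(device)
    # Sum the per-component log ratios over the action-component dimension.
    return torch.sum(t - b, dim=0)
```

Calling `.to(device)` on a tensor that is already on that device returns it unchanged, so the extra call should not cost anything when the behaviour log-probs are already on the CPU.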
I’m using a custom Distribution class; could the problem come from there? I had to modify it as well, because the actions it received (when computing log-probs) were on the CPU while my model was on the GPU.
I didn’t need this modification while working with PPO.
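To make the distribution change described above concrete, here is a minimal sketch of the idea: move the incoming actions to the device of the model outputs before computing log-probs. The class `DeviceAwareCategorical` is a hypothetical stand-in, not my actual Distribution class and not a library class; it only assumes a categorical action space and uses `torch.distributions.Categorical`.

```python
import torch


class DeviceAwareCategorical:
    """Hypothetical stand-in for a custom action distribution."""

    def __init__(self, logits):
        # The logits come from the model, so they live on the model's device.
        self.device = logits.device
        self.dist = torch.distributions.Categorical(logits=logits)

    def logp(self, actions):
        # Actions may arrive on the CPU; move them next to the logits
        # before evaluating their log-probabilities.
        actions = torch.as_tensor(actions).to(self.device)
        return self.dist.log_prob(actions)
```

With this pattern the returned log-probs stay on the model's device, which would be consistent with the target log-probs reaching get_log_rhos on cuda:0 while the other tensors are on the CPU.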