IMPALA with VTrace on multi-GPU with Pytorch

I’m trying to use IMPALA on multi-GPU with PyTorch, with V-trace enabled.

But when I run my code, I get the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

After peeking into the source code, it seems that everything is converted to CPU there. But something must be missing, given the error I receive!

Also, if I change the source code from here:

into:

def get_log_rhos(target_action_log_probs, behaviour_action_log_probs):
    """With the selected log_probs for multi-discrete actions of behavior
    and target policies we compute the log_rhos for calculating the vtrace."""
    t = torch.stack(target_action_log_probs).to("cpu")  # <- the change: move to CPU
    b = torch.stack(behaviour_action_log_probs)
    log_rhos = torch.sum(t - b, dim=0)
    return log_rhos

With this change, it runs fine.

I’m using a custom Distribution class; could the problem come from there? I had to modify it as well, because the actions received (when computing log-probs) were on the CPU while my model was on the GPU.

I didn’t need this modification when working with PPO.
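The device-alignment workaround in the distribution looks roughly like this (a minimal sketch; the class name and internals are illustrative, not the actual RLlib distribution API):

```python
import torch

class MyTorchCategorical:
    """Illustrative custom action distribution whose logp() moves
    incoming actions onto the same device as the model's logits
    before computing log-probs (names here are hypothetical)."""

    def __init__(self, logits):
        self.dist = torch.distributions.Categorical(logits=logits)
        self.device = logits.device  # remember where the model lives

    def logp(self, actions):
        # Actions may arrive on CPU while the model is on GPU;
        # aligning devices avoids the "two devices" RuntimeError.
        return self.dist.log_prob(actions.to(self.device))
```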

Hey @Astariul, strange. Yeah, it could have to do with your custom action distribution, which moves things back onto the GPU in the dist.logp call inside multi_log_probs_from_logits_and_actions. It’s probably better to keep your change in, then.

As background: the v-trace calculations - as per the original IMPALA paper - should be done on the CPU, as they are all sequential. That’s why, inside the IMPALA loss, we do this seemingly sudden move from “device” to “cpu” (no matter what “device” is), and then back to “device” after the v-trace computations.
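Sketched out, that round-trip pattern looks like this (a simplified single-trajectory v-trace target computation with assumed shapes and clipping defaults; this is not RLlib’s actual loss code):

```python
import torch

def vtrace_cpu_roundtrip(log_rhos, values, rewards, gamma=0.99,
                         clip_rho=1.0, clip_c=1.0):
    """Simplified v-trace targets for one trajectory, written to show
    the CPU round-trip: inputs are moved to CPU for the sequential
    backward recursion, and the result is moved back afterwards.

    Assumed shapes: rewards/log_rhos of length T, values of length T+1
    (the last entry being the bootstrap value)."""
    device = values.device  # remember the model's device
    log_rhos, values, rewards = (t.to("cpu") for t in (log_rhos, values, rewards))

    # Clipped importance weights, as in the IMPALA paper.
    rhos = torch.clamp(log_rhos.exp(), max=clip_rho)
    cs = torch.clamp(log_rhos.exp(), max=clip_c)

    # TD errors: delta_t = rho_t * (r_t + gamma * V(x_{t+1}) - V(x_t)).
    deltas = rhos * (rewards + gamma * values[1:] - values[:-1])

    T = rewards.shape[0]
    vs = values.clone()
    acc = torch.zeros(())
    # This backward recursion is inherently serial -- the reason the
    # computation is done on the CPU in the first place.
    for t in reversed(range(T)):
        acc = deltas[t] + gamma * cs[t] * acc
        vs[t] = values[t] + acc

    return vs.to(device)  # move the result back to the model's device
```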
