IMPALA with VTrace on multi-GPU with PyTorch

Astariul · June 17, 2021, 8:33am

I’m trying to use IMPALA on multi-GPU with PyTorch, with VTrace activated.

But when I run my code, I get the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

After peeking into the source code, it seems that everything in vtrace_torch.py is moved to the CPU. But something must be missing, given the error I receive!

Also, if I change the source code here:

github.com/ray-project/ray/blob/1df19a04fe2b9960ba364cbcc8da996cb214bfa8/rllib/agents/impala/vtrace_torch.py#L342-L348

def get_log_rhos(target_action_log_probs, behaviour_action_log_probs):
    """With the selected log_probs for multi-discrete actions of behavior
    and target policies we compute the log_rhos for calculating the vtrace."""
    t = torch.stack(target_action_log_probs)
    b = torch.stack(behaviour_action_log_probs)
    log_rhos = torch.sum(t - b, dim=0)
    return log_rhos

to:

def get_log_rhos(target_action_log_probs, behaviour_action_log_probs):
    """With the selected log_probs for multi-discrete actions of behavior
    and target policies we compute the log_rhos for calculating the vtrace."""
    t = torch.stack(target_action_log_probs).to("cpu")
    b = torch.stack(behaviour_action_log_probs)
    log_rhos = torch.sum(t - b, dim=0)
    return log_rhos

With this change, it runs fine.
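
A slightly more general variant of that change, as a sketch only and not RLlib’s actual code: move every component of both inputs to the CPU before stacking, so the subtraction never mixes devices regardless of which side arrives on the GPU. The name `get_log_rhos_on_cpu` is hypothetical, and the inputs are assumed to be lists of tensors (one per action component), as in the original function.

```python
import torch


def get_log_rhos_on_cpu(target_action_log_probs, behaviour_action_log_probs):
    """Hypothetical variant of get_log_rhos that tolerates mixed devices.

    Both arguments are assumed to be lists of tensors, one per action component.
    """
    # Move each component to the CPU first; .to("cpu") is a no-op for tensors
    # that are already there, so this is safe for either input.
    t = torch.stack([x.to("cpu") for x in target_action_log_probs])
    b = torch.stack([x.to("cpu") for x in behaviour_action_log_probs])
    # Sum the per-component log-ratios over the component axis.
    return torch.sum(t - b, dim=0)
```

The result always lives on the CPU, which matches where the rest of the v-trace computation in vtrace_torch.py is meant to run.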

I’m using a custom Distribution class; could the problem come from there? I had to modify it as well, because the actions received (when computing log-probs) were on the CPU while my model was on the GPU.

I didn’t need this modification while working with PPO.
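
For illustration, the kind of device alignment described above might look like the following sketch. `logp_on_logits_device` is a hypothetical helper, not part of RLlib or of the custom class in question, and a categorical distribution is assumed purely for the example.

```python
import torch


def logp_on_logits_device(logits, actions):
    """Hypothetical helper: log-probs of `actions` under categorical `logits`."""
    # Bring the actions to wherever the logits live (a no-op if they match).
    actions = actions.to(logits.device)
    dist = torch.distributions.Categorical(logits=logits)
    # The result is on logits.device, i.e. on the GPU if the logits are there.
    return dist.log_prob(actions)
```

Note that the returned tensor ends up on the device of the logits, so a caller that expects CPU tensors still has to move it back.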

sven1977 · June 29, 2021, 2:14pm

Hey @Astariul, strange. Yeah, it could have to do with your custom action distribution, which moves things back onto the GPU in the dist.logp call inside multi_log_probs_from_logits_and_actions. It’s probably better to keep your change in, then.

As background: v-trace calculations, as per the original IMPALA paper, should be done on the CPU because they are all sequential. That’s why the IMPALA loss moves the tensors, seemingly all of a sudden, from “device” to the “cpu” (no matter what “device” is), and then back to “device” after the v-trace computations.
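
To make the “sequential” point concrete, here is a minimal sketch of the backward recursion behind the v-trace targets for a single trajectory, written in plain Python. The function name, argument layout and default clipping thresholds are assumptions for illustration and do not mirror the vtrace_torch.py API.

```python
import math


def vtrace_targets(log_rhos, rewards, values, bootstrap_value,
                   discount=0.99, clip_rho=1.0, clip_c=1.0):
    """V-trace value targets v_s for one trajectory, given as plain lists."""
    T = len(rewards)
    # Truncated importance weights: rho_s and c_s.
    rhos = [min(clip_rho, math.exp(lr)) for lr in log_rhos]
    cs = [min(clip_c, math.exp(lr)) for lr in log_rhos]
    # V(x_{s+1}) for every step; the last step uses the bootstrap value.
    next_values = list(values[1:]) + [bootstrap_value]
    # Temporal differences weighted by rho_s.
    deltas = [rhos[t] * (rewards[t] + discount * next_values[t] - values[t])
              for t in range(T)]
    # Backward recursion: each step depends on the result of the next one,
    # which is why this loop cannot be parallelised across time steps.
    acc = 0.0
    vs = [0.0] * T
    for t in reversed(range(T)):
        acc = deltas[t] + discount * cs[t] * acc
        vs[t] = values[t] + acc
    return vs
```

In the on-policy case (all log_rhos equal to 0) with zero value estimates, the targets reduce to discounted returns.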
