[RLlib] Does Centralized Critic support Multi-GPU?

Hey everyone, it seems to me that the centralized critic example provided in RLlib does not support multiple GPUs. I copy-pasted the central_value_function and got the following error, which says that the input to central_value_function is not on the same CUDA device as the weight tensors:

  File "/data/USER/Projects/mate/nips_rllib/policy/ccppo_imrl_policy.py", line 141, in loss_with_central_critic_and_ImRL
    policy._central_value_out = model.value_function()
  File "/data/USER/Projects/mate/nips_rllib/policy/ccppo_imrl_policy.py", line 139, in <lambda>
    train_batch[TEAM_ACTION])
  File "/data/USER/Projects/mate/nips_rllib/policy/ccppo_imrl_policy.py", line 264, in central_value_function
    return torch.reshape(self.central_vf(input_), [-1])
  File "/home/USER/Miniconda3/envs/mate/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/USER/Miniconda3/envs/mate/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/home/USER/Miniconda3/envs/mate/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/USER/Miniconda3/envs/mate/lib/python3.7/site-packages/ray/rllib/models/torch/misc.py", line 160, in forward
    return self._model(x)
  File "/home/USER/Miniconda3/envs/mate/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/USER/Miniconda3/envs/mate/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/home/USER/Miniconda3/envs/mate/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/USER/Miniconda3/envs/mate/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/USER/Miniconda3/envs/mate/lib/python3.7/site-packages/torch/nn/functional.py", line 1755, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: Expected tensor for 'out' to have the same device as tensor for argument #2 'mat1'; but device 0 does not equal 1 (while checking arguments for addmm)

The program runs without error if I set num_gpus to 1, so my guess is that the central value network lives on cuda:0 while the sample batches land on the other devices.

To confirm, I patched a debug print into torch.nn.functional.linear right before the failing call:

def linear(input: Tensor, weight: Tensor, bias: Optional[Tensor] = None) -> Tensor:
    ...
    # Log the mismatching devices just before addmm raises:
    if input.device != weight.device:
        print(input.device, weight.device)
    ...

>>> (pid=103090) cuda:1 cuda:0
>>> (pid=103090) cuda:2 cuda:0
>>> (pid=103090) cuda:3 cuda:0
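
I haven’t actually tried this, but one possible band-aid (my own guess, not an official RLlib fix) would be to move the critic input onto whatever device holds central_vf’s weights before the forward pass, at the cost of funneling every central-VF call through one GPU:

import torch

def central_value_function(self, obs, opponent_obs, opponent_actions):
    # Hypothetical patch: batches can arrive on cuda:1..3 while the
    # central_vf weights sit on cuda:0, so align devices before the matmul.
    input_ = torch.cat([obs, opponent_obs, opponent_actions], dim=1)
    vf_device = next(self.central_vf.parameters()).device
    return torch.reshape(self.central_vf(input_.to(vf_device)), [-1])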

I don’t think the centralized critic is supposed to use the value function, based on this line of code: ray/centralized_critic_models.py at 35ec91c4e04c67adc7123aa8461cf50923a316b4 · ray-project/ray · GitHub.

It uses central_vf instead.
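
For reference, the forward pass there looks roughly like this (paraphrasing the linked commit; the exact input construction is simplified):

import torch

def central_value_function(self, obs, opponent_obs, opponent_actions):
    # The centralized critic concatenates own obs with opponent obs/actions
    # and runs the result through self.central_vf, not self.value_function.
    input_ = torch.cat([obs, opponent_obs, opponent_actions], dim=1)
    return torch.reshape(self.central_vf(input_), [-1])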

Hi @michaelzhiluo, perhaps it’s my bad that I did not show the full context. In the CC example provided in ray/rllib/examples/centralized_critic.py, CentralizedValueMixin is invoked every time loss_with_central_critic is called, and this mixin initializes the attribute self.compute_central_vf, which computes the centralized value via self.model.central_value_function.
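
Roughly, the wiring in the example looks like this (paraphrased and simplified to the torch path; names are from the example itself):

class CentralizedValueMixin:
    def __init__(self):
        # Bind the model's central VF so the loss can evaluate it later.
        self.compute_central_vf = self.model.central_value_function

def loss_with_central_critic(policy, model, dist_class, train_batch):
    CentralizedValueMixin.__init__(policy)
    # The stock value_function is temporarily swapped out, so the call to
    # model.value_function() below actually runs central_value_function:
    model.value_function = lambda: policy.model.central_value_function(
        train_batch[SampleBatch.CUR_OBS],
        train_batch[OPPONENT_OBS],
        train_batch[OPPONENT_ACTION])
    policy._central_value_out = model.value_function()
    ...

Note that the lambda closes over policy.model rather than the model argument passed into the loss, so under multi-GPU I suspect it always hits the cuda:0 copy while train_batch lives on another tower’s device, which would match the traceback above.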

@mickelliu I’m running into the same issue. Has there been any progress on it?

Unfortunately, I don’t have any progress on my end. The environment I am using right now is pretty lightweight (similar to MPE), so GPU utilization and memory usage are quite low, and I sometimes launch four trials on a single 2080 Ti to fill it up. Personally, I don’t see the need to give multiple GPUs to a single trial; it isn’t efficient in my case.
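
For reference, this is roughly how I pack the trials (a sketch; the env name is a placeholder):

from ray import tune

tune.run(
    "PPO",
    num_samples=4,  # launch four concurrent trials
    config={
        "env": "my_mpe_like_env",  # placeholder for my lightweight env
        "framework": "torch",
        "num_gpus": 0.25,  # fractional allocation: four trials share one GPU
    },
)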