How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi,

I have a custom environment and model that work fine with PPO and IMPALA. I am now trying to use the same setup with ApexDQN, but I am running into trouble: when I instantiate the `ApexTrainer` class, I get an error that appears to come from within `model.get_q_value_distributions()`, which consumes the output of `model()`.
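For context, here is roughly how I am wiring the custom env and model into the trainer. This is a minimal sketch, not my exact code: `MyCustomEnv`, `MyCustomModel`, the module paths, and the config values are placeholders, and I am using the `ApexDQN` class from `ray.rllib.algorithms.apex_dqn` in my Ray 2.x install.

```python
import ray
from ray.rllib.algorithms.apex_dqn import ApexDQN
from ray.rllib.models import ModelCatalog
from ray.tune.registry import register_env

# Placeholders for my actual custom environment and model classes.
from my_project.envs import MyCustomEnv
from my_project.models import MyCustomModel

# Register the env and the custom model so RLlib can look them up by name.
register_env("my_env", lambda env_config: MyCustomEnv(env_config))
ModelCatalog.register_custom_model("my_custom_model", MyCustomModel)

config = {
    "env": "my_env",
    "framework": "torch",
    "num_workers": 2,
    "model": {
        "custom_model": "my_custom_model",
        "custom_model_config": {},  # my model's kwargs go here
    },
}

ray.init()
trainer = ApexDQN(config=config)  # fails during rollout-worker / policy construction
```

Instantiating the trainer then fails with the following traceback: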
```
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 613, in __init__
    self._build_policy_map(
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1803, in _build_policy_map
    self.policy_map.create_policy(
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/policy/policy_map.py", line 123, in create_policy
    self[policy_id] = create_policy_for_framework(
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 80, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/policy/policy_template.py", line 330, in __init__
    self._initialize_loss_from_dummy_batch(
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/policy/policy.py", line 1053, in _initialize_loss_from_dummy_batch
    actions, state_outs, extra_outs = self.compute_actions_from_input_dict(
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/policy/torch_policy.py", line 320, in compute_actions_from_input_dict
    return self._compute_action_helper(
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/utils/threading.py", line 24, in wrapper
    return func(self, *a, **k)
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/policy/torch_policy.py", line 953, in _compute_action_helper
    dist_inputs, dist_class, state_out = self.action_distribution_fn(
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/algorithms/dqn/dqn_torch_policy.py", line 234, in get_distribution_inputs_and_class
    q_vals = compute_q_values(
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/algorithms/dqn/dqn_torch_policy.py", line 424, in compute_q_values
    (action_scores, logits, probs_or_logits) = model.get_q_value_distributions(
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/algorithms/dqn/dqn_torch_model.py", line 146, in get_q_value_distributions
    action_scores = self.advantage_module(model_out)
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/ray/rllib/models/torch/misc.py", line 169, in forward
    return self._model(x)
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/zciccwf/py36/envs/ddls/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x17 and 16x256)
```
The problem is that I am not sure which model's `get_q_value_distributions()` is being called inside `compute_q_values`. It does not seem to be the custom model I built for PPO and IMPALA and am now passing into the `ApexTrainer` config: when I add a print inside my custom model's `get_q_value_distributions()`, nothing is printed, so I assume it is never called.
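For reference, this is the kind of override I added to check whether my method is being hit. It is a stripped-down illustration (flat `Box` observations, toy encoder), not my actual model:

```python
import torch.nn as nn

from ray.rllib.algorithms.dqn.dqn_torch_model import DQNTorchModel


class DebugQModel(DQNTorchModel):
    """Minimal stand-in for my custom model, with a debug print added."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name, **kwargs):
        super().__init__(obs_space, action_space, num_outputs, model_config, name, **kwargs)
        # Toy encoder; my real model is considerably more involved.
        self._encoder = nn.Sequential(
            nn.Linear(int(obs_space.shape[0]), 64),
            nn.ReLU(),
            nn.Linear(64, num_outputs),
        )

    def forward(self, input_dict, state, seq_lens):
        return self._encoder(input_dict["obs"].float()), state

    def get_q_value_distributions(self, model_out):
        # This print never shows up in the worker logs, which is why I
        # believe my override is not the one being called.
        print("custom get_q_value_distributions(), model_out shape:", model_out.shape)
        return super().get_q_value_distributions(model_out)
```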
Does anyone know what might be causing this 32x17 vs. 16x256 mismatch? `model()` outputs 17 dimensions because my environment's `action_space.n` is 17, but `model.get_q_value_distributions()` apparently expects 16. If I am reading the error correctly, 32 is the dummy-batch size, 17 is my model's output dimension, and the 16x256 matrix is the (transposed) weight of the first `Linear` layer in `advantage_module`, which was evidently built with `in_features=16`. I am not sure how to change what `model.get_q_value_distributions()` expects, or even which model is being used when this method is called.
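Just to sanity-check my reading of the shapes, the following standalone snippet reproduces the same mismatch, assuming the 16x256 matrix really is the transposed weight of a `Linear(16, 256)` layer:

```python
import torch
import torch.nn as nn

model_out = torch.randn(32, 17)    # dummy batch of 32, my model's 17-dim output
advantage_in = nn.Linear(16, 256)  # first layer that advantage_module apparently contains

# RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x17 and 16x256)
advantage_in(model_out)
```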
Thanks in advance for your help!