Hi, I’ve been working on an RNN-style model with PPO. I’ve run into this error:
File "/home/xl3942/anaconda3/envs/CommAgent/lib/python3.8/site-packages/ray/rllib/policy/policy_template.py", line 303, in postprocess_trajectory
    return postprocess_fn(self, sample_batch,
File "/home/xl3942/anaconda3/envs/CommAgent/lib/python3.8/site-packages/ray/rllib/evaluation/postprocessing.py", line 174, in compute_gae_for_sample_batch
    last_r = policy._value(**input_dict)
File "/home/xl3942/anaconda3/envs/CommAgent/lib/python3.8/site-packages/ray/rllib/agents/ppo/ppo_torch_policy.py", line 220, in value
    model_out, _ = self.model(input_dict)
File "/home/xl3942/anaconda3/envs/CommAgent/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() missing 2 required positional arguments: 'hidden_states' and 'seq_lens'
It seems to be caused by the definition of the value() function here: it only passes the observations from a sample batch to the model, not the hidden states. It looks like only the observations are available at that scope, not the hidden states.
I don’t think this is expected behavior. As a workaround I can fall back to the initial state when hidden states are not provided, but I would prefer to use the actual states for a better value estimate.
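For reference, the fallback I have in mind can be sketched in plain PyTorch (this is a minimal illustration, not RLlib's TorchModelV2 API; the class name ValueRNN and its signatures are hypothetical):

```python
import torch
import torch.nn as nn


class ValueRNN(nn.Module):
    """Hypothetical recurrent value model whose forward() falls back to the
    initial hidden state when no state is supplied, instead of raising a
    TypeError like in the traceback above."""

    def __init__(self, obs_dim: int = 4, hidden_dim: int = 8):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.value_head = nn.Linear(hidden_dim, 1)

    def get_initial_state(self, batch_size: int) -> torch.Tensor:
        # Zero hidden state, shape (num_layers, batch, hidden_dim).
        return torch.zeros(1, batch_size, self.hidden_dim)

    def forward(self, obs: torch.Tensor, hidden_states=None):
        # obs: (batch, time, obs_dim)
        if hidden_states is None:
            # Fallback: estimate the value from the initial state. This is a
            # worse estimate than using the trajectory's true state, but it
            # keeps value() callable with observations alone.
            hidden_states = self.get_initial_state(obs.shape[0])
        out, h = self.gru(obs, hidden_states)
        # Value of the last timestep: (batch, 1).
        return self.value_head(out[:, -1]), h
```

Calling the model without a state then works (at the cost of a less accurate estimate), while callers that do have the state can still pass it explicitly.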