Compute Action with LSTM

I am using Ray RLlib with normal (non-recurrent) layers and it works very well, but I cannot find anything in the documentation about how to make predictions with LSTM layers.

action = trainer.compute_single_action(obs)

I always get this error:

    action = self.trainer.compute_single_action(obs)
  File "c:\Test\.env\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 1140, in compute_single_action
    action, state, extra = policy.compute_single_action(
  File "c:\Test\.env\lib\site-packages\ray\rllib\policy\policy.py", line 327, in compute_single_action
    out = self.compute_actions_from_input_dict(
  File "c:\Test\.env\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 483, in compute_actions_from_input_dict
    return self._compute_action_helper(
  File "c:\Test\.env\lib\site-packages\ray\rllib\utils\threading.py",
line 24, in wrapper
    return func(self, *a, **k)
  File "c:\Test\.env\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1016, in _compute_action_helper
    dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens)
  File "c:\Test\.env\lib\site-packages\ray\rllib\models\modelv2.py", line 259, in __call__
    res = self.forward(restored, state or [], seq_lens)
  File "c:\Test\.env\lib\site-packages\ray\rllib\models\torch\recurrent_net.py", line 207, in forward
    assert seq_lens is not None
AssertionError
  • Severity: High (it blocks me from completing my task).

Hi @evo11x,

This code snippet should help, I think.
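Something along these lines usually works (a minimal sketch, assuming a trainer whose model was built with "use_lstm": True and a gymnasium-style environment instance called `env`; the important part is to fetch the policy's initial RNN state once and then feed the returned state back in on every call):

    # Get the zero-initialized LSTM hidden/cell state from the policy.
    policy = trainer.get_policy()
    state = policy.get_initial_state()

    obs, _ = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        # Passing `state` makes compute_single_action return a tuple
        # (action, new_state, extra_fetches) instead of just the action.
        action, state, _ = trainer.compute_single_action(obs, state=state)
        obs, reward, terminated, truncated, info = env.step(action)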


It works, but now the returned value is a 3-element tuple instead of a NumPy array of 2 actions.

(array([-1. , 0...e=float32), [array([-0.50677115, ...e=float32), array([-0.55859864, ...e=float32)], {'action_dist_inputs': array([-0.30818474, ...e=float32), 'action_prob': 0.032917757, 'action_logp': -3.413743})

action[0] looks like my actions

but what are the other values in the returned tuple?

Great that it works.
The outputs are: (action, new_state, extra_outputs)
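In a loop you keep the first element as the action for env.step() and feed the second element back into the next call; the third element is a dict of extra policy outputs. Roughly (sketch):

    action, state, extras = trainer.compute_single_action(obs, state=state)
    # action -> what you pass to env.step()
    # state  -> updated LSTM hidden/cell state; pass it to the next call
    # extras -> dict with e.g. "action_dist_inputs", "action_prob", "action_logp"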


Hi! I had the same issue and solved it with the code posted above, but a new error appeared and I don't know how to fix it. The complete error message is:

Traceback (most recent call last):
  File "c:\Users\grhen\Documents\GitHub\eprllib_experiments\active_climatization\init_experiment\test_trained_OnOffHVAC.py", line 110, in <module>
    init_drl_evaluation(
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\eprllib\postprocess\marl_init_evaluation.py", line 88, in init_drl_evaluation
    action, state_out, _ = policy['shared_policy'].compute_single_action(obs=obs_dict[agent], state=state)
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\ray\rllib\policy\policy.py", line 552, in compute_single_action
    out = self.compute_actions_from_input_dict(
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 557, in compute_actions_from_input_dict
    return self._compute_action_helper(
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\ray\rllib\utils\threading.py", line 24, in wrapper
    return func(self, *a, **k)
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1260, in _compute_action_helper
    dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens)
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\ray\rllib\models\modelv2.py", line 255, in __call__
    res = self.forward(restored, state or [], seq_lens)
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\ray\rllib\models\torch\recurrent_net.py", line 247, in forward
    torch.reshape(input_dict[SampleBatch.PREV_REWARDS].float(), [-1, 1])
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\ray\rllib\policy\sample_batch.py", line 950, in __getitem__
    value = dict.__getitem__(self, key)
KeyError: 'prev_rewards'

Can you provide me with some help?
Thanks!
Germán

PS: I'm using Ray version 2.20.0 on Windows 11
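A guess based on the traceback: the model's forward pass is trying to read SampleBatch.PREV_REWARDS, which it only does when the model config has "lstm_use_prev_reward": True (and likewise "lstm_use_prev_action" for previous actions). If that is your setup, compute_single_action also needs the previous reward (and action) passed in explicitly. A sketch, where prev_action and prev_reward are values you would have to track yourself between steps:

    action, state, _ = policy['shared_policy'].compute_single_action(
        obs=obs_dict[agent],
        state=state,
        prev_action=prev_action,   # the agent's action from the previous step
        prev_reward=prev_reward,   # the reward received at the previous step
    )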