Compute Action with LSTM

I am using Ray with normal (feed-forward) layers and it works very well, but I cannot find anything in the documentation about how to make predictions with LSTM layers.

    action = trainer.compute_single_action(obs)

I always get this error:

    action = self.trainer.compute_single_action(obs)
  File "c:\Test\.env\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 1140, in compute_single_action
    action, state, extra = policy.compute_single_action(
  File "c:\Test\.env\lib\site-packages\ray\rllib\policy\policy.py", line 327, in compute_single_action
    out = self.compute_actions_from_input_dict(
  File "c:\Test\.env\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 483, in compute_actions_from_input_dict
    return self._compute_action_helper(
  File "c:\Test\.env\lib\site-packages\ray\rllib\utils\threading.py",
line 24, in wrapper
    return func(self, *a, **k)
  File "c:\Test\.env\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1016, in _compute_action_helper
    dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens)
  File "c:\Test\.env\lib\site-packages\ray\rllib\models\modelv2.py", line 259, in __call__
    res = self.forward(restored, state or [], seq_lens)
  File "c:\Test\.env\lib\site-packages\ray\rllib\models\torch\recurrent_net.py", line 207, in forward
    assert seq_lens is not None
AssertionError
• Severity: High (it blocks me from completing my task).

Hi @evo11x,

This code snippet should help, I think:
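
With an LSTM model you have to fetch the initial recurrent state from the policy and pass it back in on every call; a bare compute_single_action(obs) leaves the state empty, which is what trips the seq_lens assertion. Here is a minimal sketch (assuming a trainer built with the model config "use_lstm": True, and env being your Gym environment; adjust the reset()/step() unpacking to your Gym/Gymnasium version):

    policy = trainer.get_policy()
    # Zero-filled initial recurrent state, i.e. the [h, c] arrays for the LSTM cell.
    state = policy.get_initial_state()

    obs = env.reset()
    done = False
    while not done:
        # Passing `state` makes the call return a
        # (action, new_state, extra_outputs) tuple instead of a bare action.
        action, state, _ = trainer.compute_single_action(obs, state=state)
        obs, reward, done, info = env.step(action)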


It works, but now the returned value is a tuple of three elements instead of a NumPy array with my 2 actions.

    (array([-1. , 0...e=float32), [array([-0.50677115, ...e=float32), array([-0.55859864, ...e=float32)], {'action_dist_inputs': array([-0.30818474, ...e=float32), 'action_prob': 0.032917757, 'action_logp': -3.413743})

action[0] looks like my actions, but what are the other values in the tuple?

Great that it works.
The outputs are (action, new_state, extra_outputs): the first element is the action itself, the second is the new recurrent state (the [h, c] arrays you feed back into the next call), and the third is a dict of extra model outputs such as action_dist_inputs, action_prob, and action_logp.
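
For example, continuing the loop above (the key names match the extra dict shown in your output):

    action, state, extra = trainer.compute_single_action(obs, state=state)
    # `state` is the new [h, c] pair to pass into the next call.
    print(extra["action_prob"], extra["action_logp"])  # extra model fetches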
