Compute Action with LSTM

I am using Ray RLlib with normal (non-recurrent) layers and it works very well, but I cannot find anything in the documentation about how to make predictions with LSTM layers.

action = trainer.compute_single_action(obs)

I always get this error:

    action = self.trainer.compute_single_action(obs)
  File "c:\Test\.env\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 1140, in compute_single_action
    action, state, extra = policy.compute_single_action(
  File "c:\Test\.env\lib\site-packages\ray\rllib\policy\policy.py", line 327, in compute_single_action
    out = self.compute_actions_from_input_dict(
  File "c:\Test\.env\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 483, in compute_actions_from_input_dict
    return self._compute_action_helper(
  File "c:\Test\.env\lib\site-packages\ray\rllib\utils\threading.py",
line 24, in wrapper
    return func(self, *a, **k)
  File "c:\Test\.env\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1016, in _compute_action_helper
    dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens)
  File "c:\Test\.env\lib\site-packages\ray\rllib\models\modelv2.py", line 259, in __call__
    res = self.forward(restored, state or [], seq_lens)
  File "c:\Test\.env\lib\site-packages\ray\rllib\models\torch\recurrent_net.py", line 207, in forward
    assert seq_lens is not None
AssertionError
  • Severity: High (it blocks me from completing my task).

Hi @evo11x,

This code snippet should help, I think.
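Something along these lines usually works (a minimal sketch, assuming a trainer whose model was built with "use_lstm": True and a gymnasium-style environment instance called `env`; the important part is to fetch the policy's initial RNN state once and then feed the returned state back in on every call):

    # Get the zero-initialized LSTM hidden/cell state from the policy.
    policy = trainer.get_policy()
    state = policy.get_initial_state()

    obs, _ = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        # Passing `state` makes compute_single_action return a tuple
        # (action, new_state, extra_fetches) instead of just the action.
        action, state, _ = trainer.compute_single_action(obs, state=state)
        obs, reward, terminated, truncated, info = env.step(action)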


It works, but now the returned value is a 3-element tuple instead of a NumPy array of 2 actions.

(array([-1. , 0...e=float32), [array([-0.50677115, ...e=float32), array([-0.55859864, ...e=float32)], {'action_dist_inputs': array([-0.30818474, ...e=float32), 'action_prob': 0.032917757, 'action_logp': -3.413743})

action[0] looks like my actions

but what are the other values in the returned tuple?

Great that it works.
The outputs are: (action, new_state, extra_outputs)
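In a loop you keep the first element as the action for env.step() and feed the second element back into the next call; the third element is a dict of extra policy outputs. Roughly (sketch):

    action, state, extras = trainer.compute_single_action(obs, state=state)
    # action -> what you pass to env.step()
    # state  -> updated LSTM hidden/cell state; pass it to the next call
    # extras -> dict with e.g. "action_dist_inputs", "action_prob", "action_logp"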


Hi! I had the same issue and solved it with the code posted above, but a new error appeared and I don't know how to fix it. The complete error message is:

Traceback (most recent call last):
  File "c:\Users\grhen\Documents\GitHub\eprllib_experiments\active_climatization\init_experiment\test_trained_OnOffHVAC.py", line 110, in <module>
    init_drl_evaluation(
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\eprllib\postprocess\marl_init_evaluation.py", line 88, in init_drl_evaluation
    action, state_out, _ = policy['shared_policy'].compute_single_action(obs=obs_dict[agent], state=state)
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\ray\rllib\policy\policy.py", line 552, in compute_single_action
    out = self.compute_actions_from_input_dict(
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 557, in compute_actions_from_input_dict
    return self._compute_action_helper(
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\ray\rllib\utils\threading.py", line 24, in wrapper
    return func(self, *a, **k)
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1260, in _compute_action_helper
    dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens)
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\ray\rllib\models\modelv2.py", line 255, in __call__
    res = self.forward(restored, state or [], seq_lens)
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\ray\rllib\models\torch\recurrent_net.py", line 247, in forward
    torch.reshape(input_dict[SampleBatch.PREV_REWARDS].float(), [-1, 1])
  File "c:\Users\grhen\anaconda3\envs\eprllib1-1-1\lib\site-packages\ray\rllib\policy\sample_batch.py", line 950, in __getitem__
    value = dict.__getitem__(self, key)
KeyError: 'prev_rewards'

Can you provide me with some help?
Thanks!
Germán

PS: I'm using Ray version 2.20.0 on Windows 11
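A guess based on the traceback: the model's forward pass is trying to read SampleBatch.PREV_REWARDS, which it only does when the model config has "lstm_use_prev_reward": True (and likewise "lstm_use_prev_action" for previous actions). If that is your setup, compute_single_action also needs the previous reward (and action) passed in explicitly. A sketch, where prev_action and prev_reward are values you would have to track yourself between steps:

    action, state, _ = policy['shared_policy'].compute_single_action(
        obs=obs_dict[agent],
        state=state,
        prev_action=prev_action,   # the agent's action from the previous step
        prev_reward=prev_reward,   # the reward received at the previous step
    )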