[RLlib] compute_single_action() with an LSTM-PPO trainer fails

mannyv · February 3, 2023, 2:20pm

Try this example:

ray-project/ray/blob/b31343a8afcafef0fbcf7e81f102aa947870265f/rllib/examples/cartpole_lstm.py#L103


      
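          # >> env = StatelessCartPole()
          # >> obs, info = env.reset()
          # >>
          # >> # range(2) b/c h- and c-states of the LSTM.
          # >> init_state = state = [
          # ..     np.zeros([lstm_cell_size], np.float32) for _ in range(2)
          # .. ]
          # >> prev_a = 0
          # >> prev_r = 0.0
          # >>
          # >> while True:
          # >>     # Pass prev action/reward by keyword so the call does not depend on argument order.
          # >>     a, state_out, _ = algo.compute_single_action(
          # ..         obs, state, prev_action=prev_a, prev_reward=prev_r)
          # >>     obs, reward, done, truncated, _ = env.step(a)
          # >>     # Reset the LSTM state when the episode ends for either reason.
          # >>     if done or truncated:
          # >>         obs, info = env.reset()
          # >>         state = init_state
          # >>         prev_a = 0
          # >>         prev_r = 0.0
          # >>     else:
          # >>         # Otherwise carry state, action and reward into the next step.
          # >>         state = state_out
          # >>         prev_a = a
          # >>         prev_r = reward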
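
If it helps to see the state handling on its own, here is a minimal sketch of the same loop that runs without Ray installed. Everything in it is an assumption for illustration: `StubRecurrentPolicy`, `StubEnv`, `initial_state` and `run` are hypothetical names, not RLlib classes. The stub only copies the `(action, state_out, extras)` return shape of `compute_single_action()` so you can check that the state is carried between steps and reset at episode end.

```python
from typing import List

State = List[List[float]]  # [h, c], each a vector of length cell_size


def initial_state(cell_size: int) -> State:
    """Fresh zero-filled [h, c] pair; new lists on every call."""
    return [[0.0] * cell_size for _ in range(2)]


class StubRecurrentPolicy:
    """Hypothetical stand-in for an LSTM policy (NOT an RLlib class)."""

    def compute_single_action(self, obs, state, *, prev_action, prev_reward):
        h, c = state
        # Toy "recurrence": accumulate the observation in h, count steps in c.
        new_h = [v + obs for v in h]
        new_c = [v + 1.0 for v in c]
        action = 1 if new_h[0] > 0 else 0
        return action, [new_h, new_c], {}


class StubEnv:
    """Hypothetical env with a gymnasium-style reset()/step() interface."""

    def __init__(self, horizon: int = 3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 1.0, {}

    def step(self, action):
        self.t += 1
        obs = 1.0 if action == 1 else -1.0
        terminated = False
        truncated = self.t >= self.horizon  # episode is cut off after `horizon` steps
        return obs, 1.0, terminated, truncated, {}


def run(policy, env, cell_size: int, num_steps: int) -> List[float]:
    """Run the inference loop; return the step counter (c[0]) seen after each call."""
    obs, _ = env.reset()
    state = initial_state(cell_size)
    prev_a, prev_r = 0, 0.0
    trace = []
    for _ in range(num_steps):
        a, state_out, _ = policy.compute_single_action(
            obs, state, prev_action=prev_a, prev_reward=prev_r
        )
        trace.append(state_out[1][0])
        obs, reward, terminated, truncated, _ = env.step(a)
        if terminated or truncated:
            # Episode over: start again from a zero state and default prev values.
            obs, _ = env.reset()
            state = initial_state(cell_size)
            prev_a, prev_r = 0, 0.0
        else:
            # Same episode: feed the returned state back in on the next call.
            state, prev_a, prev_r = state_out, a, reward
    return trace


if __name__ == "__main__":
    print(run(StubRecurrentPolicy(), StubEnv(horizon=3), cell_size=4, num_steps=7))
```

The printed counter climbs within an episode and drops back to 1.0 after each reset, which is the behavior you want from the real loop. With an actual trained algorithm you would swap the stub for `algo` and size the zero state to match the model's LSTM cell size.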
