Hi @nathanlct,
The value function looks different when using the LSTM wrapper than when not.
When you use the LSTM wrapper, there is only one set of layers going into the LSTM, and the value layers use the output coming from the LSTM as their input. So instead of two heads going into the LSTM, you have two heads coming out of the LSTM. You can see images I created of the two cases in this post: What is the intended architecture of PPO vf_share_layers=False when using an LSTM.
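To make that layout concrete, here is a rough, self-contained PyTorch schematic of what the wrapper ends up doing (this is my own sketch for illustration, not RLlib's actual wrapper code):

```python
import torch
import torch.nn as nn


class WrapperStyleLSTM(nn.Module):
    """Schematic of the use_lstm wrapper: one shared trunk feeds the LSTM,
    and both the policy head and the value head branch off the LSTM output."""

    def __init__(self, obs_size=4, hidden=64, num_actions=2):
        super().__init__()
        self.trunk = nn.Linear(obs_size, hidden)        # single set of layers into the LSTM
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.policy_head = nn.Linear(hidden, num_actions)
        self.value_head = nn.Linear(hidden, 1)          # value also reads the LSTM output

    def forward(self, obs, state=None):
        x = torch.relu(self.trunk(obs))                 # obs: [B, T, obs_size]
        out, state = self.lstm(x, state)
        return self.policy_head(out), self.value_head(out), state
```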
I think this means you are going to have to add your own LSTM. Luckily, this is pretty straightforward with RLlib. You do have to think about how and where you want to do this separation, though. Are you going to add a layer before the LSTM and then feed the LSTM a concatenation of a policy and a value embedding layer? Are you going to treat them as pass-through inputs and feed them in after the LSTM? Are you going to have two LSTMs? Will only your policy use an LSTM, with the value function being just FC layers? I have sketched that last option below.
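Here is one way the last option could look, adapted from RLlib's custom RNN model example. Treat it as a hedged sketch rather than a drop-in solution: the class name `SeparateValueLSTM` and the layer sizes are mine, and I'm assuming a flat Box observation space.

```python
# Sketch: policy branch uses an LSTM, value branch is a separate FC stack on the
# raw observations (no weight sharing). Adapted from RLlib's custom RNN example.
import numpy as np
import torch
import torch.nn as nn

from ray.rllib.models import ModelCatalog
from ray.rllib.models.modelv2 import ModelV2
from ray.rllib.models.torch.recurrent_net import RecurrentNetwork
from ray.rllib.utils.annotations import override


class SeparateValueLSTM(RecurrentNetwork, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name,
                 fc_size=64, lstm_state_size=64):
        nn.Module.__init__(self)
        super().__init__(obs_space, action_space, num_outputs, model_config, name)

        self.obs_size = int(np.product(obs_space.shape))  # assumes a flat Box obs space
        self.lstm_state_size = lstm_state_size

        # Policy branch: fc -> LSTM -> action logits.
        self.policy_fc = nn.Linear(self.obs_size, fc_size)
        self.lstm = nn.LSTM(fc_size, lstm_state_size, batch_first=True)
        self.action_branch = nn.Linear(lstm_state_size, num_outputs)

        # Value branch: completely separate FC layers, no recurrence.
        self.value_branch = nn.Sequential(
            nn.Linear(self.obs_size, fc_size), nn.ReLU(), nn.Linear(fc_size, 1))
        self._value_out = None

    @override(ModelV2)
    def get_initial_state(self):
        # One (h, c) pair for the policy LSTM only.
        return [
            self.policy_fc.weight.new(1, self.lstm_state_size).zero_().squeeze(0),
            self.policy_fc.weight.new(1, self.lstm_state_size).zero_().squeeze(0),
        ]

    @override(RecurrentNetwork)
    def forward_rnn(self, inputs, state, seq_lens):
        # inputs is [B, T, obs_size]; the base class adds the time dimension.
        x = nn.functional.relu(self.policy_fc(inputs))
        lstm_out, [h, c] = self.lstm(
            x, [torch.unsqueeze(state[0], 0), torch.unsqueeze(state[1], 0)])
        logits = self.action_branch(lstm_out)

        # The value function never sees the LSTM: plain FC layers on the raw obs.
        self._value_out = self.value_branch(inputs)

        return logits, [torch.squeeze(h, 0), torch.squeeze(c, 0)]

    @override(ModelV2)
    def value_function(self):
        assert self._value_out is not None, "must call forward() first"
        return torch.reshape(self._value_out, [-1])


ModelCatalog.register_custom_model("separate_value_lstm", SeparateValueLSTM)
```

You would then point your trainer at it with something like `"model": {"custom_model": "separate_value_lstm"}` in the config, and leave `use_lstm` off since the recurrence now lives inside your own model. The other options (concatenated embeddings before the LSTM, pass-through inputs after it, or two LSTMs) follow the same pattern, just with the branches wired differently in `forward_rnn`.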