Best way to have custom value state + LSTM

Hi @mickelliu,

I saw you also posted a question on this topic.

You are right that there was no resolution to that issue, and the reason is exactly what you brought up here: neither I nor, I think, @sven1977 (though I don’t want to speak for him) have found a good way to carry state around for the value network.

The value network is not called during rollouts, so you will not have a recorded LSTM state for each transition to use during training.

Two thoughts:

  1. You could construct your model to compute the value on every call to `forward`. Then you would have value states for rollouts too, but you would also need to modify `get_initial_state` to return the combined states for both branches and split those states back up in your `forward` method (see the first sketch after this list). I am not certain whether this would be compatible with the default handling of states in the library.

  2. You could implement something like the burn-in used in R2D2: pass in some number of zeroed states to warm up the LSTM at the start of each training sequence (see the second sketch after this list).
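Here is a minimal sketch of idea 1, assuming the PyTorch ModelV2 API (`RecurrentNetwork`). The class name `DualLSTMModel` and the `cell_size` argument are illustrative, not anything defined by the library:

```python
import numpy as np
import torch
import torch.nn as nn
from ray.rllib.models.torch.recurrent_net import RecurrentNetwork as TorchRNN


class DualLSTMModel(TorchRNN, nn.Module):
    """Separate policy and value LSTMs whose states travel together."""

    def __init__(self, obs_space, action_space, num_outputs,
                 model_config, name, cell_size=64):  # cell_size is illustrative
        nn.Module.__init__(self)
        super().__init__(obs_space, action_space, num_outputs,
                         model_config, name)
        self.cell_size = cell_size
        obs_dim = int(np.prod(obs_space.shape))
        self.policy_lstm = nn.LSTM(obs_dim, cell_size, batch_first=True)
        self.value_lstm = nn.LSTM(obs_dim, cell_size, batch_first=True)
        self.logits_branch = nn.Linear(cell_size, num_outputs)
        self.value_branch = nn.Linear(cell_size, 1)
        self._value_out = None

    def get_initial_state(self):
        # Combined state: (h, c) for the policy LSTM, then (h, c)
        # for the value LSTM, flattened into one list.
        return [torch.zeros(self.cell_size) for _ in range(4)]

    def forward_rnn(self, inputs, state, seq_lens):
        # Split the combined state list back into the two branches.
        p_h, p_c, v_h, v_c = [s.unsqueeze(0) for s in state]
        p_out, (p_h, p_c) = self.policy_lstm(inputs, (p_h, p_c))
        v_out, (v_h, v_c) = self.value_lstm(inputs, (v_h, v_c))
        # The value is computed on every forward pass, so its LSTM
        # state gets collected during rollouts like the policy state.
        self._value_out = self.value_branch(v_out).reshape(-1)
        logits = self.logits_branch(p_out)
        return logits, [p_h.squeeze(0), p_c.squeeze(0),
                        v_h.squeeze(0), v_c.squeeze(0)]

    def value_function(self):
        return self._value_out
```

You would register this with `ModelCatalog.register_custom_model("dual_lstm", DualLSTMModel)` and point `config["model"]["custom_model"]` at it. One cost of this approach is that every sample batch now carries twice as many state tensors.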
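And a rough standalone PyTorch sketch of idea 2, deliberately not tied to RLlib's loss internals; `burned_in_values`, `value_head`, and the `burn_in` length are made-up names for illustration:

```python
import torch
import torch.nn as nn


def burned_in_values(value_lstm: nn.LSTM, value_head: nn.Linear,
                     obs_seq: torch.Tensor, burn_in: int = 10):
    """Warm up a value LSTM from zeroed states over the first `burn_in`
    steps of each training sequence, then compute values for the rest.

    obs_seq: [B, T, obs_dim]; value_lstm must use batch_first=True.
    Returns values of shape [B, T - burn_in].
    """
    batch = obs_seq.shape[0]
    hidden = value_lstm.hidden_size
    # Zeroed initial states, as suggested above.
    h = torch.zeros(1, batch, hidden)
    c = torch.zeros(1, batch, hidden)
    with torch.no_grad():
        # Burn-in pass: warms up the recurrent state but
        # contributes no gradient, as in R2D2.
        _, (h, c) = value_lstm(obs_seq[:, :burn_in], (h, c))
    out, _ = value_lstm(obs_seq[:, burn_in:], (h, c))
    return value_head(out).squeeze(-1)
```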

If you find something that works, please do let us know.

