Best way to have custom value state + LSTM

nathanlct · May 3, 2021, 4:08pm

Hi,

I’m doing some training using PPO, and I would like the value function to have additional states that the policy doesn’t have.

By default, the FullyConnectedNetwork looks like this (14 should be 7 here):

I modified it slightly by splitting the input layer so that I can give additional states to the value function input. My state space has size 14 and I split it in two, passing the first 7 to policy_obs and the last 7 to value_obs.
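A minimal sketch of the split described above, written in plain PyTorch rather than as an RLlib model. The class name `SplitObsNet`, the hidden size, and the number of actions are assumptions made for illustration; only the 14-dimensional observation and the 7/7 split come from the post.

```python
import torch
import torch.nn as nn


class SplitObsNet(nn.Module):
    """Policy sees the first `policy_dim` features, value sees the rest.

    Hypothetical illustration, not the poster's actual network.
    """

    def __init__(self, policy_dim=7, value_dim=7, hidden=32, num_actions=3):
        super().__init__()
        self.policy_dim = policy_dim
        self.policy_net = nn.Sequential(
            nn.Linear(policy_dim, hidden), nn.Tanh(), nn.Linear(hidden, num_actions)
        )
        self.value_net = nn.Sequential(
            nn.Linear(value_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, obs):
        # Split the flat observation along the feature axis.
        policy_obs = obs[:, : self.policy_dim]
        value_obs = obs[:, self.policy_dim:]
        logits = self.policy_net(policy_obs)
        value = self.value_net(value_obs).squeeze(-1)
        return logits, value


obs = torch.randn(5, 14)  # batch of 5 observations, 14 features each
logits, value = SplitObsNet()(obs)
```

Each head only ever receives its own 7 features, so any layer placed in front of this split (such as a wrapper that reshapes or embeds the observation) has to preserve the full 14-feature layout.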

This seems to work fine. However, I’d like to add an LSTM on top of it, and the use_lstm wrapper doesn’t work out of the box (cf. error below). Is there an easier way to give additional states to the VF that I didn’t find? That would save me from having to add an LSTM manually to my custom net. Or did I miss something with use_lstm?

ValueError: Input 0 of layer fc_value_1 is incompatible with the layer: expected axis -1 of input shape to have value 14 but received input with shape [None, 7]

Thanks!

mannyv · May 3, 2021, 4:55pm

Hi @nathanlct,

The value function looks different when using the LSTM wrapper than when not.

When you use the lstm wrapper there is only one set of layers going into the lstm model and the value layers use the final state coming from the lstm as the input. So instead of two heads going into the lstm you have two heads coming out of the lstm. You can see images I created of the two cases in this post What is the intended architecture of PPO vf_share_layers=False when using an LSTM.
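The wrapper layout described above (one set of layers into the LSTM, two heads out of it) can be sketched roughly as follows. This is a plain PyTorch illustration under assumed sizes, with a hypothetical class name `SharedLSTMNet`; it is not RLlib's actual wrapper code.

```python
import torch
import torch.nn as nn


class SharedLSTMNet(nn.Module):
    """One trunk feeds a single LSTM; policy and value heads read its output."""

    def __init__(self, obs_dim=14, hidden=32, cell=16, num_actions=3):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.lstm = nn.LSTM(hidden, cell, batch_first=True)
        self.policy_head = nn.Linear(cell, num_actions)
        self.value_head = nn.Linear(cell, 1)

    def forward(self, obs_seq, state=None):
        # obs_seq: [batch, time, obs_dim]; state: (h, c) or None for zeros.
        features, state = self.lstm(self.trunk(obs_seq), state)
        logits = self.policy_head(features)
        values = self.value_head(features).squeeze(-1)
        return logits, values, state


obs_seq = torch.randn(2, 5, 14)
logits, values, (h, c) = SharedLSTMNet()(obs_seq)
```

Because both heads sit behind the same LSTM, there is no place in this layout to route a value-only slice of the observation, which is consistent with the shape error in the first post.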

I think this means you are going to have to add your own LSTM. Luckily this is pretty straightforward with RLlib. You do have to think about how and where you want to do this separation though. Are you going to add a layer before the LSTM and then feed the LSTM a concatenation of a policy embedding and a value embedding? Are you going to treat them as pass-through inputs and feed them in after the LSTM? Are you going to have two LSTMs? Will only your policy use an LSTM, with the value function being just FC layers?

smorad · May 11, 2021, 10:03am

It’s quite straightforward, but you will have to implement your own model. See ray/rnn_model.py at 4795048f1b3779658e8b0ffaa05b1eb61914bc60 · ray-project/ray · GitHub

mickelliu · January 15, 2022, 12:34pm

Hi @mannyv, if I want two separate LSTMs, one for the value and one for the actor, how do I manage the hidden states of the value branch? I have only seen examples that manage the hidden states of a single LSTM.

Thanks.

mannyv · January 15, 2022, 1:04pm

Hi @mickelliu,

I saw you also posted a question on this topic

You are right that there was no resolution to that issue, and the reason is what you brought up here. There is no good way that I (and I think @sven1977, though I don’t want to speak for him) have found to carry around state for the value network.

The value network is not used during rollouts so you will not have states for each transition to use during training.

Two thoughts:

1. You could construct your model to retrieve the value on every call to forward. Then you would have states for rollouts too, but you would also need to modify get_initial_state to return the combined states for both branches and break the states up in your forward method. I am not certain if this would be compatible with the default handling of states in the library.
2. You could implement something like the burn-in used in R2D2 and pass in some number of zeroed states to warm up the LSTM.
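One possible reading of the burn-in idea, as a plain PyTorch sketch: start the value LSTM from a zeroed state, run it over the first few timesteps without gradients to warm up the state, then compute values on the remaining steps. The function name `value_with_burn_in` and all sizes are hypothetical, and this is not taken from RLlib or from the R2D2 implementation.

```python
import torch
import torch.nn as nn


def value_with_burn_in(lstm, head, obs_seq, burn_in):
    """Warm up `lstm` on the first `burn_in` steps, then predict values.

    obs_seq: [batch, time, features]; `lstm` must use batch_first=True.
    Returns values for the remaining time - burn_in steps.
    """
    batch = obs_seq.shape[0]
    # Zeroed initial state, since no value state was stored during rollouts.
    h = torch.zeros(lstm.num_layers, batch, lstm.hidden_size)
    c = torch.zeros_like(h)
    if burn_in > 0:
        # The warm-up prefix only produces a state; no gradients flow through it.
        with torch.no_grad():
            _, (h, c) = lstm(obs_seq[:, :burn_in].contiguous(), (h, c))
    out, _ = lstm(obs_seq[:, burn_in:].contiguous(), (h, c))
    return head(out).squeeze(-1)


lstm = nn.LSTM(4, 8, batch_first=True)
head = nn.Linear(8, 1)
values = value_with_burn_in(lstm, head, torch.randn(3, 10, 4), burn_in=4)
```

The trade-off is that the burn-in steps are consumed for warming up the state and contribute no value targets of their own.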

If you find something that works please do let us know.

mickelliu · January 15, 2022, 1:12pm

Hi @mannyv, thanks for your quick response!

I have an idea and I am just about to run my custom model to see if this will work.
Would it be OK if I just concatenate my value_hidden_states with action_hidden_states and return them together on each forward_rnn call?

More concretely:

# Two separate LSTMs with two separate branches...

    @override(ModelV2)
    def get_initial_state(self):
        # Zeroed [h, c] for the actor LSTM, followed by [h, c] for the value LSTM.
        return [
            self.actor_layers[-1]._model[0].weight.new(1, self.lstm_state_size).zero_().squeeze(0),
            self.actor_layers[-1]._model[0].weight.new(1, self.lstm_state_size).zero_().squeeze(0),
            self.value_layers[-1]._model[0].weight.new(1, self.lstm_state_size).zero_().squeeze(0),
            self.value_layers[-1]._model[0].weight.new(1, self.lstm_state_size).zero_().squeeze(0)
        ]

    @override(ModelV2)
    def value_function(self):
        assert self._values is not None, "must call forward() first"
        return torch.reshape(self.value_branch(self._values), [-1])

    @override(TorchRNN)
    def forward_rnn(self, inputs, state, seq_lens):
        # state[0], state[1]: actor LSTM (h, c); state[2], state[3]: value LSTM (h, c).
        self._features, [h1, c1] = self.actor_lstm(
            self.actor_layers(inputs), [torch.unsqueeze(state[0], 0),
                torch.unsqueeze(state[1], 0)])
        action_out = self.action_branch(self._features)

        # Store the value LSTM output so value_function() can use it later.
        self._values, [h2, c2] = self.value_lstm(
            self.value_layers(inputs), [torch.unsqueeze(state[2], 0),
                torch.unsqueeze(state[3], 0)])

        # Return the states in the same order as get_initial_state.
        return action_out, [torch.squeeze(h1, 0), torch.squeeze(c1, 0), torch.squeeze(h2, 0), torch.squeeze(c2, 0)]

If this doesn’t work out, should I change my view_requirement or anything?

mannyv · January 15, 2022, 1:49pm

@mickelliu,

That is what I was trying to describe as approach 1.

You will need to save the value states as a member in forward so that they are available in the call to value_function, the same as you do with _values.

I do not think you will need to modify the ViewRequirements.
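For readers who want something runnable outside RLlib, the approach discussed here can be sketched in plain PyTorch as below. The class name `TwoBranchLSTM` and all layer sizes are assumptions. Unlike RLlib's get_initial_state, which takes no arguments, this standalone version takes a batch size so it can be called directly.

```python
import torch
import torch.nn as nn


class TwoBranchLSTM(nn.Module):
    """Separate actor and value LSTMs sharing one combined state list."""

    def __init__(self, obs_dim=14, hidden=32, cell=16, num_actions=3):
        super().__init__()
        self.cell = cell
        self.actor_layers = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.value_layers = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.actor_lstm = nn.LSTM(hidden, cell, batch_first=True)
        self.value_lstm = nn.LSTM(hidden, cell, batch_first=True)
        self.action_branch = nn.Linear(cell, num_actions)
        self.value_branch = nn.Linear(cell, 1)
        self._values = None

    def get_initial_state(self, batch_size):
        # Order: [h_actor, c_actor, h_value, c_value], each [batch, cell].
        return [torch.zeros(batch_size, self.cell) for _ in range(4)]

    def forward_rnn(self, inputs, state):
        # Add the num_layers axis that nn.LSTM expects on its state tensors.
        h1, c1, h2, c2 = (s.unsqueeze(0) for s in state)
        features, (h1, c1) = self.actor_lstm(self.actor_layers(inputs), (h1, c1))
        # Keep the value branch output as a member for value_function().
        self._values, (h2, c2) = self.value_lstm(self.value_layers(inputs), (h2, c2))
        new_state = [s.squeeze(0) for s in (h1, c1, h2, c2)]
        return self.action_branch(features), new_state

    def value_function(self):
        assert self._values is not None, "must call forward_rnn() first"
        return self.value_branch(self._values).reshape(-1)


model = TwoBranchLSTM()
state = model.get_initial_state(batch_size=2)
logits, new_state = model.forward_rnn(torch.randn(2, 5, 14), state)
values = model.value_function()
```

Running the value LSTM on every forward call is what makes its hidden state travel with the actor state, at the cost of extra computation during rollouts.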

mickelliu · January 15, 2022, 1:51pm

Thanks @mannyv, it seems that the code didn’t raise any errors. I will provide you with an update later.

mickelliu · January 16, 2022, 7:46am

Hi @mannyv,

I ran the separate-branch LSTM model for 10+ hours and it works pretty well. I haven’t seen any problems yet. For our custom env, separate value and actor networks have always yielded better performance.

dezhi · April 10, 2022, 7:05pm

Hi @mickelliu,

Can you please share your separate-branch LSTM model here? I think I am missing something in writing the code…
