What is the intended architecture of PPO vf_share_layers=False when using an LSTM

mannyv · April 6, 2021, 4:54pm

I am wondering what the intended architecture should look like when using ppo with vf_share_layers = False and an lstm. Based on the comment in the ppo config and reading the documentation I would have thought that if it is set to false then there would be no sharing of the layers above the lstm.

When I was looking at the LSTMWrapper code I realized that they would be sharing those layers even when vf_share_layers=False.

Here is a picture of the network with use_lstm=False. To make it more clear I have add 3 to the size of the _value_branch_separate model. As you can see they are not shared. The policy is on the left (size 256) and the vf is on the right (size 259).

Here is a picture of the network with use_lstm=True. As you can see in this one the layers going into the lstm are shared. I could not get the order the same so in this case the value function is on the left and the policy is on the right.

My question then is whether this is the intended behavior and if so, can it be documented somewhere?

sven1977 · April 7, 2021, 9:15am

Hey @mannyv , thanks for raising this. It’s indeed a bug. When wrapping your vf_share_layers=False model with an LSTM, all layers will be shared and the setting is ignored.
I’ll create an issue and provide a fix.

sven1977 · April 8, 2021, 9:02am

Thinking about this more, this would actually take a lot of changes to make this work:

our Models have a separate value_function method, which would - in this case - also have to take a state input list (it currently doesn’t).
the LSTM wrapper would need to know about the “location” of the separate value branch in the wrapped model (as it would have to call that branch separately). This would be ok for default models (under RLlib’s own control), but would become quite difficult for custom models.

I’m suggesting for now, you create a custom LSTM model that contains all required LSTM elements and specify that custom model in your config.
We are working on making the ModelV2 API simpler (or actually not needing it anymore at all). Instead, users will be able to register any number of models with the policy (e.g. “policy_model” and “value_model” and then call these directly in the algos). This is WIP, though.

mickelliu · January 15, 2022, 11:34am

@sven1977 Hi Sven, I think this is still an open issue, right? So if I want to use separate branches for value and actor networks I would have to write a custom model, correct?

funk_Jz · June 17, 2023, 9:04am

@mannyv Very good model architecture visualization. I have done some investigation work but experiments on rllib met problem with framework(‘torch’). It seems like you used torchviz to do visualization, right? Could you share some information about how to do this? Thanks a lot.

funk_Jz · June 24, 2023, 9:02am

I have tried some ways to visualize compute graph. Finally I think there are 2 ways for user to keep model and visualize with netron:

With onnx format, but operater symbol is too lower. A little tough to read model structure for human.
```
algo = algo_config.build()
policy = algo.get_policy() 
policy.export_model(export_dir, onnx=11)
```

With torchscript.

# Provide dummy state inputs if not an RNN (torch cannot jit with
# returned empty internal states list).
policy._lazy_tensor_dict(policy._dummy_batch)

if "state_in_0" not in policy._dummy_batch:
    policy._dummy_batch["state_in_0"] = policy._dummy_batch[
        SampleBatch.SEQ_LENS
    ] = np.array([1.0])
seq_lens = policy._dummy_batch[SampleBatch.SEQ_LENS]

state_ins = []
i = 0
while "state_in_{}".format(i) in policy._dummy_batch:
    state_ins.append(policy._dummy_batch["state_in_{}".format(i)])
    i += 1

mod = torch.jit.trace(policy.model, dummy_inputs)
filename = os.path.join(export_dir, "torchscript_model.pt")

Topic		Replies	Views
Ppo add the lstm NN RLlib	6	2754	July 8, 2021
Best way to have custom value state + LSTM RLlib	9	3138	April 10, 2022
RLlib: vf_share_layers defined in multiple places RLlib	3	464	January 13, 2021
Understanding impact of vf_share_layers over loss function calculation RLlib	5	950	April 9, 2021
Purpose and working of the vf_share_layers parameter RLlib	1	596	August 18, 2021

What is the intended architecture of PPO vf_share_layers=False when using an LSTM

Related topics