What is the intended architecture of PPO vf_share_layers=False when using an LSTM

sven1977 · April 8, 2021, 9:02am

Thinking about this more, this would actually take a lot of changes to make this work:

our Models have a separate value_function method, which would - in this case - also have to take a state input list (it currently doesn’t).
the LSTM wrapper would need to know about the “location” of the separate value branch in the wrapped model (as it would have to call that branch separately). This would be ok for default models (under RLlib’s own control), but would become quite difficult for custom models.

I’m suggesting for now, you create a custom LSTM model that contains all required LSTM elements and specify that custom model in your config.
We are working on making the ModelV2 API simpler (or actually not needing it anymore at all). Instead, users will be able to register any number of models with the policy (e.g. “policy_model” and “value_model” and then call these directly in the algos). This is WIP, though.

Topic		Replies	Views
Ppo add the lstm NN RLlib	6	2723	July 8, 2021
Best way to have custom value state + LSTM RLlib	9	3104	April 10, 2022
RLlib: vf_share_layers defined in multiple places RLlib	3	446	January 13, 2021
Understanding impact of vf_share_layers over loss function calculation RLlib	5	930	April 9, 2021
Purpose and working of the vf_share_layers parameter RLlib	1	576	August 18, 2021

What is the intended architecture of PPO vf_share_layers=False when using an LSTM

Related topics