What is the intended architecture of PPO with vf_share_layers=False when using an LSTM?

Thinking about this more, making this work would actually require quite a few changes:

  • our Models have a separate value_function method, which in this case would also have to take a state input list (it currently doesn't).
  • the LSTM wrapper would need to know about the “location” of the separate value branch in the wrapped model (as it would have to call that branch separately). This would be ok for default models (under RLlib’s own control), but would become quite difficult for custom models.

For now, I'd suggest you create a custom LSTM model that contains all the required LSTM components (including its own separate value branch) and specify that custom model in your config.
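A minimal sketch of what such a custom model could look like, using PyTorch. This is illustrative only: the class name LSTMSeparateVF, the 64-unit cell size, and the registered model key are arbitrary choices, and the exact import paths may differ between RLlib versions. The key trick is that, because value_function takes no state input (the limitation described above), the model stashes the value branch's features during forward_rnn and reads them back in value_function:

```python
import numpy as np
import torch
import torch.nn as nn

from ray.rllib.models import ModelCatalog
from ray.rllib.models.modelv2 import ModelV2
from ray.rllib.models.torch.recurrent_net import RecurrentNetwork
from ray.rllib.utils.annotations import override


class LSTMSeparateVF(RecurrentNetwork, nn.Module):
    """Recurrent model with fully separate policy and value LSTMs."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        nn.Module.__init__(self)
        super().__init__(obs_space, action_space, num_outputs, model_config, name)
        self.obs_size = int(np.prod(obs_space.shape))
        self.cell_size = 64  # arbitrary; tune as needed

        # Policy branch: its own LSTM plus logits head.
        self.pi_lstm = nn.LSTM(self.obs_size, self.cell_size, batch_first=True)
        self.logits_head = nn.Linear(self.cell_size, num_outputs)

        # Value branch: a completely separate LSTM plus scalar head
        # (this is what vf_share_layers=False would mean here).
        self.vf_lstm = nn.LSTM(self.obs_size, self.cell_size, batch_first=True)
        self.value_head = nn.Linear(self.cell_size, 1)

        self._vf_features = None

    @override(ModelV2)
    def get_initial_state(self):
        # Two (h, c) pairs: one for the policy LSTM, one for the value LSTM.
        return [torch.zeros(self.cell_size) for _ in range(4)]

    @override(RecurrentNetwork)
    def forward_rnn(self, inputs, state, seq_lens):
        # inputs: [B, T, obs_size]; state: flat list [pi_h, pi_c, vf_h, vf_c],
        # each [B, cell_size]. nn.LSTM expects [num_layers, B, cell_size].
        pi_state = (state[0].unsqueeze(0), state[1].unsqueeze(0))
        vf_state = (state[2].unsqueeze(0), state[3].unsqueeze(0))
        pi_out, (pi_h, pi_c) = self.pi_lstm(inputs, pi_state)
        vf_out, (vf_h, vf_c) = self.vf_lstm(inputs, vf_state)
        # Stash the value features: value_function() takes no state input,
        # so it has to read what forward() just computed for this batch.
        self._vf_features = vf_out
        logits = self.logits_head(pi_out)
        return logits, [
            pi_h.squeeze(0), pi_c.squeeze(0),
            vf_h.squeeze(0), vf_c.squeeze(0),
        ]

    @override(ModelV2)
    def value_function(self):
        assert self._vf_features is not None, "must call forward() first"
        # Flatten [B, T, 1] -> [B*T] to match the flattened policy outputs.
        return torch.reshape(self.value_head(self._vf_features), [-1])


ModelCatalog.register_custom_model("lstm_separate_vf", LSTMSeparateVF)
```

You would then set "model": {"custom_model": "lstm_separate_vf", "max_seq_len": 20} in your config, and leave use_lstm at its default (False) so RLlib's auto-LSTM wrapper doesn't get applied on top of your model.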
We are working on making the ModelV2 API simpler (or actually not needing it anymore at all). Instead, users will be able to register any number of models with the policy (e.g. a "policy_model" and a "value_model") and then call these directly in the algos. This is WIP, though.
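Just to make that direction concrete, a purely hypothetical sketch of what this could look like. None of these methods or attributes exist in RLlib today; this only illustrates the idea, not an actual API:

```python
# HYPOTHETICAL -- illustrates the planned direction only; no such
# RLlib API exists at the time of writing.
policy.register_model("policy_model", policy_net)  # any framework model
policy.register_model("value_model", value_net)    # fully separate branch

# Algorithms would then call each registered model directly, passing
# whatever state that particular model needs:
logits, pi_state_out = policy.models["policy_model"](obs, state=pi_state)
values, vf_state_out = policy.models["value_model"](obs, state=vf_state)
```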