Hello! I was looking at the RLlib code for initializing a recurrent model, in `rllib/models/torch/recurrent_net.py` at the `ray-2.9.3` tag of ray-project/ray on GitHub.
As far as I can see in this code, however, the value function and the policy network share the same recurrent layers and hidden state. How do I set up a completely separate value function that learns its own recurrent hidden state and does not reuse any features from the policy network?
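For context, here is roughly what I am imagining: a custom `RecurrentNetwork` subclass with two independent LSTMs, carrying both pairs of hidden states through the state list. This is an untested sketch adapted from the custom RNN model example in the RLlib docs; the class name `SeparateValueRNN` and the `hidden_size` argument are placeholders of my own.

```python
import numpy as np
import torch
import torch.nn as nn

from ray.rllib.models.modelv2 import ModelV2
from ray.rllib.models.torch.recurrent_net import RecurrentNetwork as TorchRNN
from ray.rllib.utils.annotations import override


class SeparateValueRNN(TorchRNN, nn.Module):
    """Sketch: policy and value function each own a private LSTM."""

    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name, hidden_size=64):
        nn.Module.__init__(self)
        super().__init__(obs_space, action_space, num_outputs, model_config,
                         name)
        self.obs_size = int(np.prod(obs_space.shape))
        self.hidden_size = hidden_size

        # Two independent recurrent trunks: no weights or states are shared.
        self.pi_lstm = nn.LSTM(self.obs_size, hidden_size, batch_first=True)
        self.vf_lstm = nn.LSTM(self.obs_size, hidden_size, batch_first=True)
        self.pi_branch = nn.Linear(hidden_size, num_outputs)
        self.vf_branch = nn.Linear(hidden_size, 1)
        self._value_out = None

    @override(ModelV2)
    def get_initial_state(self):
        # Four state tensors: (h, c) for the policy LSTM, then (h, c) for
        # the value LSTM. RLlib hands them back to forward_rnn() in order.
        return [
            self.pi_branch.weight.new_zeros(self.hidden_size),
            self.pi_branch.weight.new_zeros(self.hidden_size),
            self.vf_branch.weight.new_zeros(self.hidden_size),
            self.vf_branch.weight.new_zeros(self.hidden_size),
        ]

    @override(TorchRNN)
    def forward_rnn(self, inputs, state, seq_lens):
        # inputs: [B, T, obs_size]; each state tensor: [B, hidden_size].
        # unsqueeze(0) adds the num_layers dimension that nn.LSTM expects.
        pi_h, pi_c, vf_h, vf_c = [s.unsqueeze(0) for s in state]

        pi_feat, (pi_h, pi_c) = self.pi_lstm(inputs, (pi_h, pi_c))
        vf_feat, (vf_h, vf_c) = self.vf_lstm(inputs, (vf_h, vf_c))

        logits = self.pi_branch(pi_feat)
        self._value_out = self.vf_branch(vf_feat)

        return logits, [
            pi_h.squeeze(0), pi_c.squeeze(0),
            vf_h.squeeze(0), vf_c.squeeze(0),
        ]

    @override(ModelV2)
    def value_function(self):
        # Flatten [B, T, 1] -> [B*T], as the built-in recurrent models do.
        return torch.reshape(self._value_out, [-1])
```

I would then register it with `ModelCatalog.register_custom_model("separate_value_rnn", SeparateValueRNN)` and point `config["model"]["custom_model"]` at it, along with a `max_seq_len`. Is something like this the intended approach, or is there a built-in option I am missing?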