Hi all! This is a question I posted on Slack recently, which @sven1977 answered. Posting it here for broader reach.
I’ve been trying to understand how the config parameter vf_share_layers affects learning, and I had a few questions. Would be grateful if someone could throw some light on any of these!
- While implementing a custom model, does toggling the value of vf_share_layers change learning behavior? If so, how? Asking because a GitHub search showed that the parameter is used only in the existing models inside RLlib.
- When vf losses are high, how does disabling vf_share_layers alleviate the issue?
- And why does vf_loss_coeff need to be tuned when vf_share_layers is true?
Here is the link to Sven’s answer, thanks again!
Great question, and yes, vf_share_layers is indeed a source of confusion. Yes, it’s only useful for non-custom (default) models, unless your custom model reads and respects this parameter, of course. RLlib’s default models (fcnet and visionnet) have the functionality to build either a) a core net + policy head + value head, or b) a policy net and a completely independent value net, depending on that parameter.
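To make that concrete, here is a minimal sketch of a custom model that reads and respects vf_share_layers. It is not RLlib’s actual default model: the class name, layer sizes, and helper attributes are illustrative, and the ModelV2 API shown matches the older model stack, so details may differ in your RLlib version.

# Illustrative custom model that switches between a shared trunk and a
# separate value network based on model_config["vf_share_layers"].
import torch
import torch.nn as nn
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2


class SharedOrSeparateVFModel(TorchModelV2, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)

        obs_dim = int(torch.prod(torch.tensor(obs_space.shape)))
        hidden = 256
        self.vf_share_layers = model_config.get("vf_share_layers", False)

        # Core trunk used by the policy head (and by the value head if shared).
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, num_outputs)

        if self.vf_share_layers:
            # Value head sits on top of the same trunk, so value-function
            # gradients flow back into the shared layers.
            self.value_trunk = None
            self.value_head = nn.Linear(hidden, 1)
        else:
            # Fully independent value network: the vf loss cannot disturb
            # the policy's representation.
            self.value_trunk = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.value_head = nn.Linear(hidden, 1)

        self._last_features = None
        self._last_obs = None

    def forward(self, input_dict, state, seq_lens):
        obs = input_dict["obs_flat"].float()
        self._last_obs = obs
        self._last_features = self.trunk(obs)
        logits = self.policy_head(self._last_features)
        return logits, state

    def value_function(self):
        if self.vf_share_layers:
            features = self._last_features
        else:
            features = self.value_trunk(self._last_obs)
        return self.value_head(features).squeeze(-1)

With shared layers, the value loss and policy loss both update the trunk, which is why their relative weighting (vf_loss_coeff) starts to matter; with separate networks the value loss only ever touches its own weights.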
If vf losses are too high, I think you should first try lowering vf_loss_coeff, but yes, playing with vf_share_layers (when using default models) may also help. There is also the option of making the stddev output nodes (for continuous actions) completely independent, learnable bias variables that sit outside the other action outputs: free_log_std:
# For DiagGaussian action distributions, make the second half of the model
# outputs floating bias variables instead of state-dependent ones. This only
# has an effect if using the default fully connected net.
"free_log_std": False,