Hi all! This is a question I posted on Slack recently, which @sven1977 answered. Posting it here for broader reach.
I’ve been trying to understand how the config parameter vf_share_layers affects learning, and I have a few questions. I’d be grateful if someone could shed some light on any of these!
- While implementing a custom model, does toggling the value of vf_share_layers change learning behavior? If so, how? I’m asking because a GitHub search for the parameter showed it being used only in the existing models inside RLlib.
- When vf losses are high, how does disabling vf_share_layers alleviate the issue?
- And why does vf_loss_coeff need to be tuned when vf_share_layers is true?
Here is the link to Sven’s answer, thanks again!
Great question, and yes, vf_share_layers is indeed a source of confusion. It’s only useful for non-custom (default) models, unless your custom model reads and respects this parameter, of course. RLlib’s default models (fcnet and visionnet) can build either a) a core net + policy head + value head, or b) a policy net and an independent value net, depending on that parameter.
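To illustrate what “reads and respects this parameter” could look like, here is a minimal sketch (my own, not from the thread) of a custom TorchModelV2 that checks vf_share_layers in its model config and builds either a shared core with two heads or two independent nets. The class name and layer sizes are made up for the example.

    import numpy as np
    import torch.nn as nn
    from ray.rllib.models.torch.torch_modelv2 import TorchModelV2


    class SharedOrSeparateModel(TorchModelV2, nn.Module):
        """Toy custom model that respects model_config["vf_share_layers"]."""

        def __init__(self, obs_space, action_space, num_outputs, model_config, name):
            TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                                  model_config, name)
            nn.Module.__init__(self)
            obs_dim = int(np.prod(obs_space.shape))
            self.share_layers = model_config.get("vf_share_layers", False)

            if self.share_layers:
                # a) One shared core feeding both a policy head and a value head.
                self.core = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
                self.policy_head = nn.Linear(64, num_outputs)
                self.value_head = nn.Linear(64, 1)
            else:
                # b) Two fully independent networks for policy and value.
                self.policy_net = nn.Sequential(
                    nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, num_outputs))
                self.value_net = nn.Sequential(
                    nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))
            self._value_out = None

        def forward(self, input_dict, state, seq_lens):
            obs = input_dict["obs_flat"].float()
            if self.share_layers:
                core_out = self.core(obs)
                logits = self.policy_head(core_out)
                self._value_out = self.value_head(core_out).squeeze(1)
            else:
                logits = self.policy_net(obs)
                self._value_out = self.value_net(obs).squeeze(1)
            return logits, state

        def value_function(self):
            return self._value_out

Registering such a class via ModelCatalog.register_custom_model and pointing "custom_model" at it would make the flag effective for a custom model as well.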
I think if vf losses are too high, you should first try setting vf_loss_coeff lower, but yes, even playing with vf_share_layers (when using default models) may help. There is also the option of making the log-stddev output nodes (for continuous actions) completely independent, learnable bias values that sit outside the rest of the action output, via free_log_std:
    # For DiagGaussian action distributions, make the second half of the model
    # outputs floating bias variables instead of state-dependent outputs. This
    # only has an effect if using the default fully connected net.
    "free_log_std": False,