Hi,
I’m working on a policy with a fairly large encoder. For PPO I use a shared encoder with separate value-function and policy heads, which is fairly standard as far as I know:
@override(TorchModelV2)
def forward(self, input_dict, state, seq_lens):
    self._features = self._encoder(input_dict["obs"])
    logits = self._policy(self._features)
    return logits, state

@override(TorchModelV2)
def value_function(self):
    assert self._features is not None, "must call forward() first"
    return self._vf(self._features).squeeze(1)
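In case it helps, the constructor is roughly like the sketch below (the real encoder is larger; the layer sizes and the flat-observation assumption are just placeholders):

import numpy as np
import torch.nn as nn
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2


class SharedEncoderModel(TorchModelV2, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        obs_dim = int(np.prod(obs_space.shape))
        # Shared encoder: gradients from both the policy loss and the
        # value loss flow back into these layers.
        self._encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        # Separate heads on top of the shared features.
        self._policy = nn.Linear(256, num_outputs)
        self._vf = nn.Linear(256, 1)
        self._features = None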
I wanted to try SAC on the same problem, but the default SAC model seems to explicitly advise against sharing parameters between the Q-net and the policy-net (see here):
@override(TorchModelV2)
def forward(
    self,
    input_dict: Dict[str, TensorType],
    state: List[TensorType],
    seq_lens: TensorType,
) -> (TensorType, List[TensorType]):
    """The common (Q-net and policy-net) forward pass.

    NOTE: It is not(!) recommended to override this method as it would
    introduce a shared pre-network, which would be updated by both
    actor- and critic optimizers.
    """
    return input_dict["obs"], state
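To make the question concrete, what I would be tempted to do is roughly the sketch below (assuming I build self._encoder in the model's constructor and size the policy/Q sub-models for the encoded features instead of the raw obs; the import path may differ depending on the Ray version):

from ray.rllib.agents.sac.sac_torch_model import SACTorchModel
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.utils.annotations import override


class SharedEncoderSACModel(SACTorchModel):
    @override(TorchModelV2)
    def forward(self, input_dict, state, seq_lens):
        # Shared pre-network in front of both the policy head and the
        # Q-heads -- exactly what the docstring above warns about, since
        # both the actor and the critic optimizers would then update
        # self._encoder (assumed to be built in __init__).
        shared_features = self._encoder(input_dict["obs"])
        return shared_features, state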
Is SAC not suited to the kind of weight sharing that works for PPO, or should it be just fine?