I am using several agents, some of which are mapped to the same policy via `policy_mapping_fn`. In that case, do these agents behave as one agent (sharing parameters and an experience buffer), or are they separate agents with different parameters that merely predict using the same algorithm? In the latter case, is the experience buffer shared, or does each agent have its own?
The short answer is that agents mapped to the same policy share that policy's model (parameters) and its experience buffer.
Here is a slightly more involved answer: in RLlib, parameters and buffers live at the policy level, not the agent level. Each policy has exactly one set of model weights, and every agent that `policy_mapping_fn` maps to that policy computes actions with those same weights. Likewise, the trajectories collected by all of those agents are gathered into that policy's sample batch, so together they train a single shared model. Agents mapped to different policies, by contrast, have independent parameters and independent batches.
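The grouping can be sketched in plain Python (a toy illustration of the routing logic only, not RLlib's actual implementation; the agent and policy IDs here are made up):

```python
from collections import defaultdict

# Hypothetical mapping function. In RLlib the real signature also
# receives the episode and extra kwargs and returns a policy ID.
def policy_mapping_fn(agent_id):
    # agent_0 and agent_1 share one policy; agent_2 gets its own.
    return "shared_policy" if agent_id in ("agent_0", "agent_1") else "solo_policy"

# One experience buffer per *policy*, not per agent -- this mirrors
# how sample batches are grouped by policy ID before training.
buffers = defaultdict(list)

# Route some fake transitions: each agent's experience lands in the
# buffer of the policy it maps to.
for step in range(3):
    for agent_id in ("agent_0", "agent_1", "agent_2"):
        transition = {"agent": agent_id, "step": step}
        buffers[policy_mapping_fn(agent_id)].append(transition)

# "shared_policy" ends up training on data from both agent_0 and agent_1.
print(sorted({t["agent"] for t in buffers["shared_policy"]}))  # → ['agent_0', 'agent_1']
# "solo_policy" only ever sees agent_2's data.
print(len(buffers["solo_policy"]))  # → 3
```

So the two agents mapped to `shared_policy` contribute experience to one common buffer and are trained as a single model, while `solo_policy` learns only from its own agent.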
Does that make sense?