Agent_key and policy_id mismatch on multiagent ensemble training

I’m now trying again to break it.
You were right that the agent->policy mapping supports one agent mapping to more than one policy. We already do this, e.g., in our multi-agent CartPole example script.
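A minimal standalone sketch of what "one agent mapping to more than one policy" means. It does not use RLlib. The policy IDs, the `rng` parameter, and this `policy_mapping_fn` signature are hypothetical and are not RLlib's actual API. The mapping function picks a policy at random, so repeated calls for the same agent ID can return different policies:

```python
import random

# Hypothetical policy IDs; the real CartPole example defines its own.
POLICY_IDS = ["policy_0", "policy_1"]


def policy_mapping_fn(agent_id, rng):
    # Each call may return a different policy for the same agent_id,
    # so over many calls (e.g. one per episode) a single agent
    # ends up mapped to more than one policy.
    return rng.choice(POLICY_IDS)


rng = random.Random(0)
# Collect every policy that agent 0 was mapped to over 100 calls.
seen = {policy_mapping_fn(0, rng) for _ in range(100)}
print(sorted(seen))
```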
Also, we can probably get rid of the assert in this part of the code (add_init_obs), since add_init_obs is only called at the beginning of an episode. The mapping is not updated during the episode anyway, so removing the assert should be fine.
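To illustrate why the assert is redundant, here is a hypothetical sketch (not RLlib code). `EpisodeSketch`, `policy_for`, and `add_step` are invented names, and `add_init_obs` only echoes the function mentioned above. It assumes the episode caches the agent->policy mapping the first time an agent is seen. The mapping function below deliberately alternates policies on every call, yet within one episode the agent keeps the policy fixed at `add_init_obs`. Only a new episode picks up a different one:

```python
import itertools
from typing import Callable, Dict, Hashable


class EpisodeSketch:
    """Hypothetical episode that caches each agent's policy on first lookup."""

    def __init__(self, mapping_fn: Callable[[Hashable], str]):
        self._mapping_fn = mapping_fn
        self._agent_to_policy: Dict[Hashable, str] = {}

    def policy_for(self, agent_id: Hashable) -> str:
        # The mapping function is consulted only the first time an agent
        # appears; afterwards the cached policy is returned.
        if agent_id not in self._agent_to_policy:
            self._agent_to_policy[agent_id] = self._mapping_fn(agent_id)
        return self._agent_to_policy[agent_id]

    def add_init_obs(self, agent_id: Hashable) -> str:
        # Called once at the start of the episode; fixes the agent's policy.
        return self.policy_for(agent_id)

    def add_step(self, agent_id: Hashable) -> str:
        # Later steps reuse the cached policy, so no mismatch can arise.
        return self.policy_for(agent_id)


_counter = itertools.count()


def alternating_mapping_fn(agent_id: Hashable) -> str:
    # Hypothetical mapping that returns a different policy on every call.
    return f"policy_{next(_counter) % 2}"


ep = EpisodeSketch(alternating_mapping_fn)
first = ep.add_init_obs("agent_0")
later = [ep.add_step("agent_0") for _ in range(5)]

# A new episode consults the mapping function again.
second_episode_policy = EpisodeSketch(alternating_mapping_fn).add_init_obs("agent_0")
print(first, later, second_episode_policy)
```

The design point is that the mapping is resolved once per agent per episode, so any consistency check only needs to hold at that single moment.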