Hi all, I’ve set up training between two agents using league-based self-play, and I want the two main policies to play against one another during evaluation. At each training iteration, new agents get added to the opponent pool and the policy mapping function is updated. However, I’m running into an issue where the updated policy mapping function is used during evaluation instead of the one declared in the config.
Here’s the evaluation config with its policy_mapping_fn:
config["evaluation_config"] = {
"multiagent": {
"policy_mapping_fn": lambda x: x
}
}
…
and in the callbacks:
def on_train_result(self, *, trainer, result, **kwargs):
    ...
    def policy_mapping_fn(agent_id, episode, worker, **kwargs):
        # Even episodes: the defender plays its main policy while the
        # attacker is sampled uniformly from the league of frozen snapshots.
        if (episode.episode_id % 2) == 0:
            if agent_id == "attacker":
                agents = list(range(0, self.opponents[agent_id] + 1))
                agent_selection = self.rng.choice(agents).item()
                return f"{agent_id}_v{agent_selection}"
            elif agent_id == "defender":
                return "defender"
        # Odd episodes: the roles are reversed.
        else:
            if agent_id == "attacker":
                return "attacker"
            elif agent_id == "defender":
                agents = list(range(0, self.opponents[agent_id] + 1))
                agent_selection = self.rng.choice(agents).item()
                return f"{agent_id}_v{agent_selection}"
    new_policy = trainer.add_policy(
        policy_id=new_pol_id,
        policy_cls=type(trainer.get_policy(agent)),
        config=config,
        policy_mapping_fn=policy_mapping_fn,
        action_space=trainer.get_policy(agent).action_space,
    )
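If I understand the API correctly, add_policy pushes the new policy_mapping_fn to all workers by default, including the evaluation workers, which would explain why my evaluation_config mapping gets overridden. In newer Ray versions there appears to be an evaluation_workers flag on add_policy (an assumption on my part, please correct me if wrong), so something like this might leave the evaluation workers untouched:

    new_policy = trainer.add_policy(
        policy_id=new_pol_id,
        policy_cls=type(trainer.get_policy(agent)),
        config=config,
        policy_mapping_fn=policy_mapping_fn,
        action_space=trainer.get_policy(agent).action_space,
        # Assumption: skip updating the evaluation workers so the
        # policy_mapping_fn from evaluation_config stays in effect.
        evaluation_workers=False,
    )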
Is there a way to check whether an episode is running in evaluation, so I can map the policies correctly? Any other ideas would be appreciated as well.
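One idea I’ve been toying with: the mapping function already receives the worker, and evaluation workers seem to run with "in_evaluation": True in their policy config (an assumption based on how RLlib builds evaluation workers), so the evaluation branch could live inside the single mapping function. A rough sketch:

    def policy_mapping_fn(agent_id, episode, worker, **kwargs):
        # Assumption: evaluation workers carry "in_evaluation": True
        # in their policy config.
        if worker.policy_config.get("in_evaluation", False):
            # Evaluation: always match the two main policies,
            # i.e. "attacker" vs. "defender".
            return agent_id
        # Training: fall through to the league-sampling logic above.
        ...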
How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.