Evaluating multi-agent policies trained with self-play

Hi all, I’ve set up training between two agents using league-based self-play, and I want to evaluate the main policies against one another during evaluation. At each training iteration, new agents get added to the opponent pool and the policy mapping function is updated. However, I’m running into an issue where the updated policy mapping function is used for evaluation instead of the one declared in the config.

Here’s the evaluation config with its policy_mapping_fn:

config["evaluation_config"] = {
    "multiagent": {
        "policy_mapping_fn": lambda agent_id: agent_id
    }
}

and in the callbacks:

def on_train_result(self, *, trainer, result, **kwargs):
    ...
    def policy_mapping_fn(agent_id, episode, worker, **kwargs):
        if (episode.episode_id % 2) == 0:
            if agent_id == "attacker":
                agents = list(range(0, self.opponents[agent_id] + 1))
                agent_selection = self.rng.choice(agents).item()
                return f"{agent_id}_v{agent_selection}"
            elif agent_id == "defender":
                return "defender"
        else:
            if agent_id == "attacker":
                return "attacker"
            elif agent_id == "defender":
                agents = list(range(0, self.opponents[agent_id] + 1))
                agent_selection = self.rng.choice(agents).item()
                return f"{agent_id}_v{agent_selection}"

    new_policy = trainer.add_policy(
        policy_id=new_pol_id,
        policy_cls=type(trainer.get_policy(agent)),
        config=config,
        policy_mapping_fn=policy_mapping_fn,
        action_space=trainer.get_policy(agent).action_space
    )

Is there a way to check whether an episode is running in evaluation, so I can map the policies correctly? Any other ideas would be appreciated.
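One approach I’ve seen is to branch inside the mapping function on the worker itself: RLlib’s evaluation workers are, as far as I can tell, constructed with "in_evaluation": True in their policy config, so the mapping fn can inspect worker.policy_config to detect evaluation episodes. Below is a minimal sketch of that idea (the league branch is simplified to a fixed "_v0" opponent just to make it self-contained; whether in_evaluation is set may depend on your RLlib version):

```python
from types import SimpleNamespace

def policy_mapping_fn(agent_id, episode, worker, **kwargs):
    # Assumption: evaluation workers carry "in_evaluation": True in their
    # policy_config. On those workers, always map agents to their main policies.
    if worker.policy_config.get("in_evaluation", False):
        return agent_id  # "attacker" -> "attacker", "defender" -> "defender"
    # Training workers keep the league-based mapping (simplified here:
    # the even/odd episode split from the snippet above, with a fixed v0).
    if (episode.episode_id % 2) == 0:
        return "defender" if agent_id == "defender" else f"{agent_id}_v0"
    return "attacker" if agent_id == "attacker" else f"{agent_id}_v0"

# Stand-ins for RLlib's worker/episode objects, just to exercise the branching:
eval_worker = SimpleNamespace(policy_config={"in_evaluation": True})
train_worker = SimpleNamespace(policy_config={})
episode = SimpleNamespace(episode_id=2)

print(policy_mapping_fn("attacker", episode, eval_worker))   # attacker
print(policy_mapping_fn("attacker", episode, train_worker))  # attacker_v0
```

With this shape you can pass the same function to add_policy and still get main-vs-main matchups during evaluation, without relying on the evaluation_config override being picked up.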

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Just so I understand correctly: your issue is that you have a self-play setup where you add new policies to your league, but evaluation then ends up running against all of the added policies, not just the initial policies in your league, right?

Right, I want to evaluate only the initial policies (the trainable ones), not the additional policies that are added to the league.