Evaluating multiple policies in multiagent

PavelC · June 21, 2021, 9:55am

Hi all,

If I understand correctly, the supported way to evaluate the performance of a trained agent is to use the rollout function.
I have trained several policies that use different algorithms, e.g. one policy with PPO and one with DDPG.
Is there any way to have these different policies play against each other in the same multiagent environment?

sven1977 · June 29, 2021, 1:51pm

Hey @PavelC , great question!
For pure evaluation, you could actually use any Trainer (b/c you don’t care about how training updates are done) and set it up as “multiagent”, like in this example here:

ray.rllib.examples.multi_agent_custom_policy.py

Then call .evaluate() on your Trainer instance. Given that you have the correct agent ID->policy ID mapping function specified, this should have both policies play against each other in your env.
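To make the "agent ID -> policy ID mapping" concrete, here is a minimal sketch of what the `multiagent` section of an RLlib 1.x-style config might look like for two separately trained policies. The agent IDs (0, 1), the `policy_{i}` naming, and the placeholder class/space values are assumptions for illustration; substitute your env's real agent IDs, policy classes and spaces.

```python
# Hypothetical sketch of the "multiagent" config section for evaluation.
# Agent IDs, policy IDs and the class/space placeholders are assumptions.


def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    """Map agent ID i to policy ID 'policy_i'.

    The extra parameters are accepted (and ignored) so the function can be
    called both with just the agent ID and with extra keyword arguments.
    """
    return f"policy_{agent_id}"


def build_multiagent_config(policy_classes, obs_spaces, act_spaces):
    """Return a config dict with one policy spec per agent.

    Each spec is a (policy_cls, obs_space, act_space, policy_config) tuple.
    """
    policies = {
        f"policy_{i}": (cls, obs_spaces[i], act_spaces[i], {})
        for i, cls in enumerate(policy_classes)
    }
    return {
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": policy_mapping_fn,
        },
    }


# Usage with placeholder strings instead of real policy classes / gym spaces:
cfg = build_multiagent_config(["ClsA", "ClsB"], ["obs0", "obs1"], ["act0", "act1"])
```

The point of the mapping function is that every agent ID in the env resolves to exactly one key of `policies`; if an agent maps to a policy ID that is not in that dict, the two policies cannot be paired up in the env.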

PavelC · July 5, 2021, 12:23pm

Thank you for taking a look Sven, this helps a lot!

Now I was not 100% sure how to get the different stored policies into the same trainer. Here is how I would do that; does that seem like the right general approach? This is for num_agents = len(checkpoint_paths) different trained policies:

    from ray.rllib.agents.ppo import PPOTrainer

    # Load original weights by way of trainers
    restored_trainers = []  # One restored trainer per agent/checkpoint
    for i in range(num_agents):
        # Trainer takes (config=..., env=...), so pass both by keyword
        trainer = trainer_classes[i](config=config, env=env)
        trainer.restore(checkpoint_paths[i])
        restored_trainers.append(trainer)

    # Set up the config for the eval trainer
    eval_config = ... # initialize config similar to training

    # Set policies according to loaded agents
    for i, trainer in enumerate(restored_trainers):
        eval_config['multiagent']['policies'][f'policy_{i}'] = (
            type(trainer.get_policy(f'policy_{i}')),
            env.observation_space_dict[i],
            env.action_space_dict[i],
            {'agent_id': i}
        )

    eval_config['evaluation_num_episodes'] = 100
    # Create the trainer to perform eval in
    eval_trainer = PPOTrainer(config=eval_config, env=env)

    # Set restored weights
    for i, trainer in enumerate(restored_trainers):
        # Copy this policy's weights into the eval trainer; get_weights() takes
        # a list of policy IDs and returns a {policy_id: weights} dict
        eval_trainer.set_weights(trainer.get_weights([f'policy_{i}']))

    # Perform eval
    results = eval_trainer.evaluate()

To summarize, the idea is to (1) load the original checkpoints with trainer.restore(), (2) set the correct policy class (e.g. PPOTFPolicy) at config.multiagent.policies.policy_i[0] in the config used by the evaluation trainer, (3) create any trainer with this multiagent config, and (4) set the weights from the restored trainers.
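Steps (2) and (4) can be sketched as one helper that builds the policy specs and a single merged weights dict from already-restored trainers. This is a hypothetical helper, not part of RLlib: it only assumes the trainers expose RLlib 1.x-style `get_policy(policy_id)` and `get_weights([policy_id])`, and that trainer `i` stored its policy as `policy_i`. The stub classes below stand in for real trainers so the logic can be checked without Ray.

```python
# Hypothetical helper for steps (2) and (4); duck-typed on get_policy()/get_weights().


def collect_policy_specs_and_weights(restored_trainers, obs_spaces, act_spaces):
    specs = {}
    weights = {}
    for i, trainer in enumerate(restored_trainers):
        pid = f"policy_{i}"  # assumes trainer i stored its policy as 'policy_i'
        # (2) Reuse the restored policy's class, e.g. PPOTFPolicy or a DDPG policy.
        specs[pid] = (type(trainer.get_policy(pid)), obs_spaces[i],
                      act_spaces[i], {"agent_id": i})
        # (4) get_weights() returns {policy_id: weights}; merge into one dict.
        weights.update(trainer.get_weights([pid]))
    return specs, weights


# Stand-ins for restored trainers (not RLlib objects):
class FakePolicyA:
    pass


class FakePolicyB:
    pass


class FakeTrainer:
    def __init__(self, pid, policy, w):
        self._pid, self._policy, self._w = pid, policy, w

    def get_policy(self, pid):
        return self._policy if pid == self._pid else None

    def get_weights(self, policies):
        return {self._pid: self._w} if self._pid in policies else {}


trainers = [FakeTrainer("policy_0", FakePolicyA(), [1.0]),
            FakeTrainer("policy_1", FakePolicyB(), [2.0])]
specs, weights = collect_policy_specs_and_weights(
    trainers, ["obs0", "obs1"], ["act0", "act1"])
# With real trainers you would then, roughly:
#   eval_config["multiagent"]["policies"] = specs
#   eval_trainer = PPOTrainer(config=eval_config, env=env)
#   eval_trainer.set_weights(weights)
```

Merging the weights into one dict allows a single `set_weights()` call. Calling it once per trainer should have the same effect, since each call only carries the policy IDs it was given.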

This seems to work, but there is one thing that is still acting weirdly: I want to set the number of episodes (or alternatively timesteps) for evaluation, so I do eval_config['evaluation_num_episodes'] = 100. However, if I look at results['episodes_this_iter'] after evaluate() the value seems to be mostly unrelated to that config setting. Am I missing something here?

mannyv · July 5, 2021, 2:29pm

Hi @PavelC,

I think results['episodes_this_iter'] is returning a count of training episodes. There should be a top level key results["evaluation"] that holds a dictionary with evaluation metrics.
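A small sketch of reading the evaluation metrics from the nested `results["evaluation"]` dict instead of the top-level (training) keys. The key names (`episodes_this_iter`, `hist_stats`, `episode_reward`) are the ones used in this thread; the `sample` dict is made-up data, not real RLlib output.

```python
# Sketch: pull evaluation metrics out of the dict returned by evaluate()/train().


def evaluation_summary(results):
    """Return episode count and mean reward from results['evaluation']."""
    if "evaluation" not in results:
        raise KeyError("no 'evaluation' key; evaluation metrics are not in this result")
    eval_metrics = results["evaluation"]
    rewards = eval_metrics["hist_stats"]["episode_reward"]
    return {
        "episodes_this_iter": eval_metrics["episodes_this_iter"],
        "num_rewards_logged": len(rewards),
        "mean_reward": sum(rewards) / len(rewards) if rewards else float("nan"),
    }


sample = {
    "episodes_this_iter": 7,  # training-side count, not evaluation
    "evaluation": {
        "episodes_this_iter": 3,
        "hist_stats": {"episode_reward": [1.0, 2.0, 3.0]},
    },
}
print(evaluation_summary(sample))
# {'episodes_this_iter': 3, 'num_rewards_logged': 3, 'mean_reward': 2.0}
```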

PavelC · July 6, 2021, 6:22pm

Yes, sorry @mannyv, you're right. I meant results['evaluation']['episodes_this_iter'], which seems to be 50x to 100x what I put for 'evaluation_num_episodes' in the config. Likewise, the lists in 'hist_stats', e.g. results['evaluation']['hist_stats']['episode_reward'], have 'episodes_this_iter' many elements.
