Agents are deleted after first Ray Tune iteration

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am running a PettingZoo multi-agent environment, and after the first training iteration using Ray Tune, no actions are returned.

Within “new_episode” in ray/rllib/evaluation/, here is what actions_to_send should look like. This is what is returned on the first Tune iteration:

defaultdict(<class 'dict'>, {0: {'agent_0': 15, 'agent_1': 15, 'agent_2': 3}})

However, on the next (second) iteration, I get this:

defaultdict(<class 'dict'>, {0: {}})

which causes a KeyError to be thrown, since no agents are found in the action dictionary.

Where do I go to debug this issue? Maybe the agent-policy mapping isn’t done correctly? Somehow no actions are returned from “_process_policy_eval_results”, which uses “to_eval” to calculate new actions. It may be of note that “to_eval”, the mapping of policy IDs to lists of PolicyEvalData objects, is also empty on the second iteration.
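
One check that takes RLlib out of the picture is to step the environment by hand and look at which agent IDs appear in the observation dict after each reset. As far as I understand, RLlib only requests actions for agents that show up there, so an empty observation dict would explain an empty “to_eval”. The sketch below is not RLlib or PettingZoo code: ToyEnv, agents_seen_per_episode, and the bug it simulates (a reset that does not restore the agent list) are all hypothetical, chosen only because they produce an empty dict on the second episode. Whether my environment actually has this problem is an assumption I still need to verify.

```python
class ToyEnv:
    """Hypothetical dict-based multi-agent env (not the PettingZoo API)."""

    def __init__(self, restore_agents_on_reset):
        self.possible_agents = ["agent_0", "agent_1", "agent_2"]
        self.agents = list(self.possible_agents)
        self.restore_agents_on_reset = restore_agents_on_reset

    def reset(self):
        if self.restore_agents_on_reset:
            self.agents = list(self.possible_agents)
        # Observations are produced only for agents that are currently alive.
        return {agent: 0 for agent in self.agents}

    def step(self, actions):
        # One-step episodes: every live agent finishes and is removed.
        obs = {agent: 0 for agent in self.agents}
        dones = {agent: True for agent in self.agents}
        dones["__all__"] = True
        self.agents = []
        return obs, dones


def agents_seen_per_episode(env, num_episodes=2):
    """Return the sorted agent IDs found in the reset observation of each episode."""
    seen = []
    for _ in range(num_episodes):
        obs = env.reset()
        seen.append(sorted(obs))
        # Send a dummy action for each observed agent to end the episode.
        env.step({agent: 0 for agent in obs})
    return seen


broken = agents_seen_per_episode(ToyEnv(restore_agents_on_reset=False))
fixed = agents_seen_per_episode(ToyEnv(restore_agents_on_reset=True))
print(broken)
print(fixed)
```

If the real environment shows the same pattern (all agents in the first episode, none in the second), the problem is in the environment's reset logic rather than in the policy mapping.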

Here’s how I set up my policies and policy mapping:

policies = {f"policy_{i}": gen_policy(i) for i in range(num_devices)}
policy_ids = {f"agent_{i}": p for i, p in enumerate(policies.keys())}
# Within config:
"multiagent": {
    "policies": policies,
    "policy_mapping_fn": (
        lambda agent_id: policy_ids[agent_id]),
    "count_steps_by": "env_steps",
},
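
To rule out the mapping itself, it can be tested without Ray at all. This is a minimal sketch under stated assumptions: num_devices = 3 is a guess based on the three agents in the first-iteration output, gen_policy here is a hypothetical stand-in for my real helper, and unmapped_agents is a name made up for this check. The mapping function also accepts and ignores extra arguments, in case the installed Ray version passes more than agent_id.

```python
num_devices = 3  # assumed value; inferred from the three agents in the output above


def gen_policy(i):
    # Hypothetical stand-in for the real gen_policy helper (not shown in the post).
    return (None, None, None, {})


policies = {f"policy_{i}": gen_policy(i) for i in range(num_devices)}
policy_ids = {f"agent_{i}": p for i, p in enumerate(policies.keys())}


def policy_mapping_fn(agent_id, *args, **kwargs):
    # Extra positional/keyword arguments are accepted and ignored.
    return policy_ids[agent_id]


def unmapped_agents(agent_ids):
    """Return agent IDs that have no policy or map to an unknown policy."""
    return [a for a in agent_ids if policy_ids.get(a) not in policies]


env_agents = ["agent_0", "agent_1", "agent_2"]  # IDs from the first-iteration output
print({a: policy_mapping_fn(a) for a in env_agents})
print(unmapped_agents(env_agents + ["agent_3"]))
```

With these assumptions every agent from the first iteration resolves to a policy, which suggests the mapping is not what empties the action dict. A mapping error would also more likely show up on the first iteration than on the second.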

Any help is appreciated! A little lost at the moment.