Multiagent Remove_Policies synchronisation

SkgTrip · October 11, 2023, 10:55am

Hi all,

I’m attempting to construct a multiagent RL trainer in which a learnable policy plays against a range of saved historic variants of the trained agents, selected from a fixed-size menagerie.

At the moment this is achieved through a callback, whose on_train_result function does the following:

1. Adds to a fixed-size list of stored policy indices (popping off the first element if the list exceeds the memory).
2. Defines a new policy_mapping_fn which selects a policy from the list of stored policy indices.
3. If an element has been popped from the list, calls algorithm.remove_policy(policy_id=old_policy_id) to remove the matching policy.
4. Creates a new policy, gets the state from the current active learning policy, and assigns that state to the new policy.
5. Calls algorithm.sync_weights().
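The bookkeeping part of these steps can be sketched in plain Python, independent of RLlib. This is only an illustration under stated assumptions: the Menagerie class, the "learner" and "snapshot_N" policy IDs, and the rule that agent 0 is the learner are all hypothetical names that do not come from my actual code. The RLlib calls themselves are only indicated in comments.

```python
import random
from collections import deque
from typing import Callable, Deque, Optional, Tuple

LEARNER_ID = "learner"  # hypothetical ID of the policy being trained


class Menagerie:
    """Fixed-size store of frozen policy IDs (illustrative helper, not an RLlib class)."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self.policy_ids: Deque[str] = deque()
        self._counter = 0

    def add_snapshot(self) -> Tuple[str, Optional[str]]:
        """Register a new snapshot ID and return (new_id, evicted_id or None)."""
        new_id = f"snapshot_{self._counter}"
        self._counter += 1
        self.policy_ids.append(new_id)
        evicted = None
        if len(self.policy_ids) > self.max_size:
            # Oldest entry falls out once the memory limit is exceeded.
            evicted = self.policy_ids.popleft()
        return new_id, evicted

    def make_mapping_fn(self, rng: Optional[random.Random] = None) -> Callable[..., str]:
        """Build a mapping function over the IDs stored right now.

        The IDs are copied, so a fresh function has to be built after every
        change to the menagerie. Call add_snapshot() at least once first.
        """
        valid_ids = tuple(self.policy_ids)
        rng = rng or random.Random()

        def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs) -> str:
            # Assumption: agent 0 is the learner, all other agents are opponents.
            if agent_id == 0:
                return LEARNER_ID
            return rng.choice(valid_ids)

        return policy_mapping_fn


# Inside on_train_result the flow would roughly be:
#   new_id, evicted = menagerie.add_snapshot()
#   mapping_fn = menagerie.make_mapping_fn()
#   ... create policy new_id and copy the learner's state into it ...
#   ... if evicted is not None: algorithm.remove_policy(policy_id=evicted) ...
#   ... algorithm.sync_weights() ...
menagerie = Menagerie(max_size=2)
for _ in range(3):
    new_id, evicted = menagerie.add_snapshot()
mapping_fn = menagerie.make_mapping_fn(random.Random(0))
```

After the third snapshot the oldest ID has been evicted, and the newly built mapping function can only return the learner ID or one of the two remaining snapshot IDs.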

However, step 3 is not working as I would expect. If I remove policies from the policy set, I receive an error stating that the policy_mapping_fn has returned an invalid policy_id. Skipping step 3 and letting the dictionary of weights for the previously stored policies grow without any pruning works just fine, but this is obviously unlikely to be optimal.

To me this suggests that I don’t understand how the callback works in the context of the distributed solver, and that something has become unsynchronised during the process. I haven’t been able to track down any example of a multiagent environment using remove_policy, so I’m struggling to work out the correct RLlib way to manage this synchronisation issue. Does anyone have any experience on this front?

Thanks in advance

How severely does this issue affect your experience of using Ray?

Medium: It contributes to significant difficulty to complete my task, but I can work around it.
