How to remove a policy from the Algorithm in the middle of the experiment?

Hi all,

I’m following the open-spiel self-play example. While it is possible to add policies and change the mapping function during the callback, I was wondering whether I could also remove unused or weak policies from the Algorithm by calling the remove_policy method. It always gives me a value error (ValueError(f"Policy ID '{policy_id}' not in policy map!")) during the next round of sampling after the on_train_result callback, even though the removed policy is no longer returned by my algorithm’s policy mapping function.
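
Roughly what I’m doing in the callback (the policy ID and the removal condition are placeholders):

```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks


class SelfPlayCallback(DefaultCallbacks):
    def on_train_result(self, *, algorithm, result, **kwargs):
        # Placeholder condition for deciding that a policy is weak/unused.
        weak_policy_id = "opponent_v3"

        # My policy mapping function no longer returns this ID, but the
        # next sampling round still fails with:
        #   ValueError: Policy ID 'opponent_v3' not in policy map!
        algorithm.remove_policy(weak_policy_id)
```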

Hi @mickelliu ,

This is not possible just like that, because policies are built on RLlib’s RolloutWorkers at the beginning of training. There is quite a bit of initialization there, and also context that is kept distributed throughout the training. If you delete a policy from the Algorithm, it won’t be deleted from the RolloutWorkers.

Thanks for your response. This is kind of unfortunate. In my current use case I have 64 workers, and in each environment there are 16 teams of agents controlled by 16 separate policies. I believe there’s a feature where the workers start dumping pkl files to disk after the policy map hits 100 policies. As you can see below, the moment the policy map contains 100 policies (at ~ train iter 84), the training iteration time and the weight-sync time skyrocket, which effectively killed my experiment. I also ended up with a bunch of pkl files on my drive, since every remote worker saves its own copy. So after 100 iterations I have around 64 x 100 pkl files on my drive.

I suppose that if I could delete policies from the map, this problem would be gone. Currently I’m manually dumping and loading checkpoints in the callback function as a workaround.

Sorry, I’ll have to correct myself here. It is possible to remove policies and update the policy map, but ongoing episodes will still use the old policy mapping function they were started with.
So if you remove a policy and continue sampling, the old policy mapping function still has to map to an existing policy until your RolloutWorkers have finished their episodes.
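
A minimal sketch of what I mean (policy IDs and the mapping logic are placeholders): remove the policy and pass the updated mapping function in the same call, so newly started episodes can no longer map to it.

```python
def new_mapping_fn(agent_id, episode, worker, **kwargs):
    # Placeholder logic: map every agent to the remaining "main" policy.
    return "main"


algorithm.remove_policy(
    "opponent_v3",                     # placeholder ID of the policy to drop
    policy_mapping_fn=new_mapping_fn,  # replaces the old mapping function
    policies_to_train=["main"],        # placeholder list of trainable policies
)
```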

We don’t have a feature that starts creating pkl files at 100 policies. What do your checkpointing settings look like? Do you have a checkpointing frequency of 10 like in the example script? Try not creating checkpoints to see what happens.

Hi @arturn, thanks again for your reply.

I was referring to the policy map capacity and the stash-to-disk feature. My checkpoint frequency was set to 50, but besides the single checkpoint pkl file the trainer generates every 50 iterations, it started storing a bunch of other pkl files in the same directory as my train.py after roughly 80 iterations, which creates a giant mess. Normally, checkpoint files go into subdirectories of ray_results, so I suspect the stash-to-disk feature of the PolicyMap is causing this.

It is possible to remove policies and update the policy map.

I noticed that I didn’t update the policy map after I removed a policy. Maybe that’s why it gave me the error. I will give it a try later. That said, I find manually loading/dumping checkpoints and getting/setting states easier to manage than the Ray 2.0 policy-related APIs.
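
For reference, my workaround looks roughly like this (the directory and policy IDs are placeholders; both helpers get called from on_train_result):

```python
import os
import pickle

POLICY_BANK = "my_policy_bank"  # placeholder directory


def stash_policy(algorithm, policy_id):
    # Save a policy's state to disk myself instead of keeping it in the map.
    os.makedirs(POLICY_BANK, exist_ok=True)
    state = algorithm.get_policy(policy_id).get_state()
    with open(os.path.join(POLICY_BANK, f"{policy_id}.pkl"), "wb") as f:
        pickle.dump(state, f)


def restore_policy(algorithm, saved_policy_id, target_policy_id):
    # Load a saved state back into an existing policy slot, then push the
    # updated weights out to the remote RolloutWorkers.
    with open(os.path.join(POLICY_BANK, f"{saved_policy_id}.pkl"), "rb") as f:
        state = pickle.load(f)
    algorithm.get_policy(target_policy_id).set_state(state)
    algorithm.workers.sync_weights(policies=[target_policy_id])
```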

Right, I think I understand the issue now. Removing a policy from the policy map that has been stashed should obviously also remove the pkl from disk, and that’s also how it looks in the code. The training time will only skyrocket if you train policies that have been stashed, and, as you have reported, each remote worker creates its own copy. This is a tricky one… I assume that your memory is not big enough to hold all the policies that you are training? I.e. you can’t simply use a higher policy_map_capacity?

@mickelliu We’ll change this behaviour with a PR that is on its way. Check out a nightly build in the coming days to see if that works for you.

Yeah, that’s for sure; in fact I have plenty of memory available (~1 TB). TIL there are keys in the multiagent config called policy_map_capacity and policy_map_cache.
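
For anyone who runs into the same thing, here is a sketch of the relevant keys in the (old-style) multiagent config dict; the values are just examples from my setup:

```python
config = {
    "multiagent": {
        # ... "policies" and "policy_mapping_fn" as usual ...
        # Max number of policies kept in memory before the PolicyMap starts
        # stashing least-recently-used ones to disk. The default is 100,
        # which matches exactly where my slowdown started.
        "policy_map_capacity": 2000,
        # Directory the stashed policies are written to. In my run the pkl
        # files ended up next to train.py, so pointing this somewhere
        # explicit at least keeps the working directory clean.
        "policy_map_cache": "/tmp/policy_cache",
    },
}
```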

The PR looks promising. Hopefully the slow stashing issue will be resolved soon.

Thanks again to the Ray team for being so supportive.
