How to remove a policy from the Algorithm in the middle of the experiment?

mickelliu · October 22, 2022, 2:07pm

Hi all,

I’m following the open-spiel self-play example. Although it is possible to add policies and change the mapping function in a callback, I was wondering whether I could also remove unused or weak policies from the Algorithm by calling the remove_policy method. Doing so always gives me a ValueError (ValueError(f"Policy ID '{policy_id}' not in policy map!")) during the next round of sampling after the on_train_result callback, even though my algorithm’s policy mapping function never maps to the removed policy.

arturn · October 27, 2022, 1:54am

Hi @mickelliu ,

This is not possible just like that because policies are built on RLlib’s RolloutWorkers at the beginning of training. There is quite a bit of initialization there, and also context that is kept distributed throughout training. If you delete a policy from the Algorithm, it won’t be deleted from the RolloutWorkers.

mickelliu · October 27, 2022, 10:01am

Thanks for your response. This is kind of unfortunate. In my current use case I have 64 workers, and in each environment there are 16 teams of agents controlled by 16 separate policies. I believe there’s a feature where the workers start dumping pkl files to disk once the policy map hits 100 policies. As you can see below, the moment the policy map contained 100 policies (at around training iteration 84), the training iteration time and weight sync time skyrocketed, which effectively killed my experiment. I also ended up with a bunch of pkl files on the drive, since every remote worker needs to save a separate copy. So after 100 iterations, I have around 64 x 100 pkl files on my drive.

I suppose if I could delete policies from the map this problem would be gone. Currently I’m doing manual dumping and loading checkpoints in the callback function as a backup solution.
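For readers wondering what such a manual workaround might look like, here is a minimal sketch. It assumes the policy object exposes get_state() and set_state() (as RLlib's Policy does); the helper names, the file layout, and the FakePolicy stand-in are hypothetical and only there so the snippet runs without Ray.

```python
import os
import pickle
import tempfile


def offload_policy_state(policy, policy_id, directory):
    """Pickle the policy's state to `directory` and return the file path."""
    path = os.path.join(directory, f"{policy_id}.pkl")
    with open(path, "wb") as f:
        pickle.dump(policy.get_state(), f)
    return path


def restore_policy_state(policy, path):
    """Load a previously pickled state back into a policy object."""
    with open(path, "rb") as f:
        policy.set_state(pickle.load(f))


class FakePolicy:
    """Hypothetical stand-in for an RLlib Policy; only get/set_state are modeled."""

    def __init__(self, weights):
        self.weights = weights

    def get_state(self):
        return {"weights": self.weights}

    def set_state(self, state):
        self.weights = state["weights"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = offload_policy_state(FakePolicy([1.0, 2.0]), "team_3", tmp)
        fresh = FakePolicy([0.0])
        restore_policy_state(fresh, saved)
        print(fresh.weights)
```

In a real callback the policy would presumably come from the Algorithm rather than from a stand-in, and the target directory would be chosen explicitly so the files do not end up next to the training script.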

arturn · October 27, 2022, 9:14pm

Sorry I’ll have to correct myself here. It is possible to remove policies and update the policy map, but the ongoing episodes will still use the old policy mapping function they were started with.
So if you remove a policy and continue sampling after that, your old policy mapping function will still have to map to an existing policy until your RolloutWorkers have finished their episodes.
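A rough sketch of that idea: build a mapping function that can only return policies that still exist, and hand it over together with the removal. The call remove_policy(policy_id, policy_mapping_fn=...) reflects my understanding of the Ray 2.x Algorithm API and should be checked against your version; make_mapping_fn and prune_policy are hypothetical helper names. Episodes that are already running may still use the old mapping function, so the removal may need to wait until they have finished.

```python
def make_mapping_fn(active_policy_ids):
    """Build a mapping function that only ever returns active policy IDs."""
    active = tuple(active_policy_ids)
    if not active:
        raise ValueError("At least one policy must remain active.")

    def policy_mapping_fn(agent_id, episode, worker, **kwargs):
        # Fixed assignment per agent within one process; a self-play setup
        # would likely sample opponents here instead.
        return active[hash(agent_id) % len(active)]

    return policy_mapping_fn


def prune_policy(algorithm, policy_id, active_policy_ids):
    """Remove `policy_id` and install a mapping function that no longer uses it."""
    remaining = [pid for pid in active_policy_ids if pid != policy_id]
    new_fn = make_mapping_fn(remaining)
    # Assumed signature (Ray 2.x): remove_policy(policy_id, *, policy_mapping_fn=None, ...)
    algorithm.remove_policy(policy_id, policy_mapping_fn=new_fn)
    return remaining, new_fn
```

Inside on_train_result this could be called as prune_policy(algorithm, "weak_policy", current_ids), where current_ids is whatever bookkeeping list the callback keeps of the policies it has added so far.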

We don’t have a feature that starts creating pkl files at 100 policies. What do your checkpointing settings look like? Do you have a checkpointing frequency of 10 like in the example script? Try not creating checkpoints to see what happens.

mickelliu · October 28, 2022, 8:07am

Hi @arturn, thanks again for your reply.

I was referring to the policy map capacity and stash_to_disk feature. My checkpoint frequency was set to 50, but besides the single checkpoint pkl file the trainer generates every 50 iterations, it starts storing a bunch of other pkl files in the same directory as my train.py after roughly 80 iterations, which creates a giant mess. Normally, the checkpoint files go into subdirectories of ray_results. Therefore I suspected that the stash-to-disk feature of the PolicyMap causes this.

github.com

ray-project/ray/blob/cba26cc83f6b5b8a2ff166594a65cb74c0ec8740/rllib/policy/policy_map.py#L176


      
              # Item already in cache -> Rearrange deque (least recently used).
              if key in self.cache:
                  self.deque.remove(key)
                  self.deque.append(key)
                  self.cache[key] = value
              # Item not currently in cache -> store new value and - if at capacity -
              # remove leftmost one.
              else:
                  # Cache at capacity -> Drop leftmost item.
                  if len(self.deque) == self.deque.maxlen:
                      self._stash_to_disk()
                  self.deque.append(key)
                  self.cache[key] = value
              self.valid_keys.add(key)
          
          @with_lock
          @override(dict)
          def __delitem__(self, key):
              # Make key invalid.
              self.valid_keys.remove(key)
              # Remove policy from memory if currently cached.
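To make the mechanism in the excerpt easier to follow, here is a toy version of a least-recently-used map that pickles evicted entries to disk. It is not RLlib's PolicyMap, only a sketch of the same idea; the class name and file naming are made up for illustration.

```python
import os
import pickle
import tempfile
from collections import deque


class TinyStashingMap:
    """Toy LRU map that pickles evicted values to disk (capacity must be >= 1)."""

    def __init__(self, capacity, directory):
        self.capacity = capacity
        self.directory = directory
        self.cache = {}          # values currently held in memory
        self.order = deque()     # keys, least recently used first
        self.valid_keys = set()  # every key, whether in memory or on disk

    def _path(self, key):
        return os.path.join(self.directory, f"{key}.pkl")

    def __setitem__(self, key, value):
        if key in self.cache:
            self.order.remove(key)
        elif len(self.order) == self.capacity:
            # At capacity: write the least recently used value to disk.
            oldest = self.order.popleft()
            with open(self._path(oldest), "wb") as f:
                pickle.dump(self.cache.pop(oldest), f)
        self.order.append(key)
        self.cache[key] = value
        self.valid_keys.add(key)

    def __getitem__(self, key):
        if key not in self.valid_keys:
            raise KeyError(key)
        if key in self.cache:
            # Mark as most recently used.
            self.order.remove(key)
            self.order.append(key)
        else:
            # Stashed: read it back, which may evict another entry in turn.
            with open(self._path(key), "rb") as f:
                value = pickle.load(f)
            os.remove(self._path(key))
            self[key] = value
        return self.cache[key]

    def __delitem__(self, key):
        self.valid_keys.remove(key)
        if key in self.cache:
            self.order.remove(key)
            del self.cache[key]
        # Also drop any file left on disk for this key.
        if os.path.exists(self._path(key)):
            os.remove(self._path(key))
```

With a capacity of 2, adding a third entry writes the oldest one to a pkl file, and reading that entry back pushes another one out. If every worker holds its own map like this, that would be consistent with the roughly 64 x 100 files I am seeing.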

arturn wrote: “It is possible to remove policies and update the policy map.”

I noticed that I didn’t update the policy map after removing the policy. Maybe that’s why it gave me an error. I will give it a try later. That said, I find manually dumping and loading checkpoints and getting and setting states easier to manage than the Ray 2.0 policy-related APIs.

arturn · October 30, 2022, 10:05pm

Right, I think I understand the issue now. Removing a stashed policy from the policy map should obviously also remove its pkl file from disk, and that is also how it looks in the code. The training time will skyrocket only if you train policies that have been stashed and, as you have reported, each remote worker will create its own copy. This is a tricky one… I assume that your memory is not big enough to hold all the policies that you are training? I.e., you can’t simply use a higher policy_map_capacity?

arturn · November 1, 2022, 10:17pm

@mickelliu We’ll change this behaviour with a PR that is on its way. Check out a nightly build in the coming days to see if that works for you.

mickelliu · November 2, 2022, 6:54am

Yeah, that’s for sure; in fact I have plenty of memory available (~1 TB). TIL there are keys in multiagent called policy_map_capacity and policy_map_cache.
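For anyone landing here with the same problem, a small sketch of how the capacity could be sized so that nothing gets stashed. The key name policy_map_capacity is the one mentioned above; the helper and its headroom parameter are hypothetical, and the capacity of 100 is simply what I observed in my run.

```python
def multiagent_settings(num_policies, headroom=16):
    """Return a `multiagent` config fragment whose policy map fits all policies."""
    if num_policies < 1:
        raise ValueError("num_policies must be at least 1")
    return {
        # Keep the capacity above the number of policies so that adding a
        # new policy never triggers stashing an old one to disk.
        "policy_map_capacity": num_policies + headroom,
    }


# Example: room for 200 policies plus some slack for ones added later.
config = {"multiagent": multiagent_settings(200)}
```

Memory use grows with the number of policies held in the map, so this presumably only makes sense when there is RAM to spare.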

PR looks promising. Hopefully the slow stashing can soon be resolved.

Thanks again to the Ray team for being so supportive.
