Change policy mapping function in the middle of an algorithm

call-me-anything-you · December 20, 2023, 9:13am

How severely does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

I’m trying to implement my own RL algorithm based on PPO, and I ran into a problem when changing the policy mapping function during training.
My implementation looks like this:

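from ray.rllib.algorithms.ppo import PPO

class MyAlgorithm(PPO):
    def training_step(self):
        def new_policy_mapping_fn(agent_id, episode, worker, **kwargs):
            # the implementation here is different for each call of training_step
            ...

        # Push the new mapping function to every rollout worker before sampling.
        self.workers.foreach_worker(lambda w: w.set_policy_mapping_fn(new_policy_mapping_fn))
        # Return the results dict so that algo.train() can report it.
        return super().training_step()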

I then ran my algorithm using algo.train().
Then comes the weird part.
In the first run of algo.train(), everything looks fine.
However, in the second run of algo.train(), I noticed that the keys of the sampled batch are still the policy IDs used in the first iteration. In addition, the reward information in the results returned by algo.train() still reflects the policy mapping function from the first iteration, not the one set in the second iteration.
Is there a way to fix this?
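
One possible cause, stated here as an assumption and not confirmed anywhere in this thread: if a worker hands the mapping function to an episode when that episode starts, and the episode caches each agent's policy ID on first lookup, then an episode that is still running when training_step is called again would keep using the old mapping. The toy classes below (ToyEpisode and ToyWorker are hypothetical names, not RLlib classes) only illustrate that mechanism in plain Python.

```python
class ToyEpisode:
    """Hypothetical stand-in for an episode object; not RLlib's class."""

    def __init__(self, policy_mapping_fn):
        # Assumption: the mapping function is captured once, at episode start.
        self.policy_mapping_fn = policy_mapping_fn
        self._agent_to_policy = {}

    def policy_for(self, agent_id):
        # Assumption: the first lookup is cached for the rest of the episode.
        if agent_id not in self._agent_to_policy:
            self._agent_to_policy[agent_id] = self.policy_mapping_fn(agent_id)
        return self._agent_to_policy[agent_id]


class ToyWorker:
    """Hypothetical stand-in for a rollout worker."""

    def __init__(self, policy_mapping_fn):
        self.policy_mapping_fn = policy_mapping_fn
        self.episode = ToyEpisode(policy_mapping_fn)

    def set_policy_mapping_fn(self, fn):
        # Only episodes created after this call receive the new function.
        self.policy_mapping_fn = fn

    def sample(self, agent_ids, episode_done):
        # Collect the policy IDs that would key the sampled batch.
        batch_keys = {self.episode.policy_for(a) for a in agent_ids}
        if episode_done:
            # A fresh episode picks up whatever function is current now.
            self.episode = ToyEpisode(self.policy_mapping_fn)
        return batch_keys


worker = ToyWorker(lambda agent_id: "policy_A")
first = worker.sample(["agent_0"], episode_done=False)

# Swap the mapping function while the first episode is still running.
worker.set_policy_mapping_fn(lambda agent_id: "policy_B")
second = worker.sample(["agent_0"], episode_done=True)
third = worker.sample(["agent_0"], episode_done=False)
```

In this toy model the second sample is still keyed by the old policy ID, and the new function only shows up once the running episode has ended. If RLlib behaves similarly, that would match the symptoms described above, but this is a guess that would need to be checked against the RLlib version in use.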
