[RLlib] Multiagent with one pre-trained policy (vs another adversarial one)

What is the best way to load a specific policy's weights from a checkpoint file to be used as another trainer's policy? I'd like to use a pre-trained model as a fixed opponent to evaluate the currently training one against in a MARL setting.

Doing something like this doesn’t work for me:

        # Build a throwaway trainer just to restore the checkpoint,
        # then copy one policy's weights into the live trainer.
        loader = get_trainer_class(algo)(env="yaniv", config=config)
        loader.load_checkpoint(checkpoint_path)
        weights = loader.get_policy("policy_1").get_weights()
        self.trainer.set_weights({
            "eval_policy": weights
        })

I think this is because it constructs a whole new trainer, complete with all the workers, whereas I just want the policy. It fails with the following error:

(pid=56641)   File "/home/jippo/Code/yaniv/yaniv-rl/yaniv_rl/utils/rllib/trainer.py", line 18, in setup
(pid=56641)     loader.load_checkpoint(checkpoint_path)
(pid=56641)   File "/home/jippo/.conda/envs/yaniv-torch/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 755, in load_checkpoint
(pid=56641)     self.__setstate__(extra_data)
(pid=56641)   File "/home/jippo/.conda/envs/yaniv-torch/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 191, in __setstate__
(pid=56641)     Trainer.__setstate__(self, state)
(pid=56641)   File "/home/jippo/.conda/envs/yaniv-torch/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 1321, in __setstate__
(pid=56641)     self.workers.local_worker().restore(state["worker"])
(pid=56641)   File "/home/jippo/.conda/envs/yaniv-torch/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1059, in restore
(pid=56641)     self.sync_filters(objs["filters"])
(pid=56641)   File "/home/jippo/.conda/envs/yaniv-torch/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1026, in sync_filters
(pid=56641)     assert all(k in new_filters for k in self.filters)
(pid=56641) AssertionError