[RLlib] Multiagent with one pre-trained policy (vs another adversarial one)

Kathy J. asked this question on our Slack channel:


Hi, I’m running a multiagent adversarial setup in which I’m interested in training two policies. First, I have a trained RLlib policy that I want to reload. Second, I want to start the adversary agent from scratch. Is there any documentation you can point me at for doing this, or any example code? Thank you!

We don’t really support any pre-training settings/APIs right now. What you can do is use the multiagent config sub-dict to set up your two policies and the mapping from the environment’s agent IDs to those policies.
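
For concreteness, a minimal sketch of such a multiagent config could look like the following. This is only an illustration under assumptions: PPO as the trainer, the env name, the agent IDs, the second policy ID "adversary", and the per-agent spaces are all placeholders; "pol1" matches the identifier used in the snippet below.

from gym.spaces import Box, Discrete
from ray.rllib.agents.ppo import PPOTrainer  # PPO chosen only for illustration

# Placeholder per-agent spaces -- substitute your env's real spaces.
obs_space = Box(-1.0, 1.0, shape=(10,))
act_space = Discrete(4)

config = {
    "env": "my_multiagent_env",  # assumption: an env you registered with Tune
    "multiagent": {
        # policy_id -> (policy_cls, obs_space, act_space, extra_config);
        # policy_cls=None means "use the trainer's default policy class".
        "policies": {
            "pol1": (None, obs_space, act_space, {}),       # will be pre-loaded
            "adversary": (None, obs_space, act_space, {}),  # trained from scratch
        },
        # Map each env agent ID to one of the policy IDs above
        # ("agent_0" is a placeholder agent ID from your env).
        "policy_mapping_fn": lambda agent_id: "pol1" if agent_id == "agent_0" else "adversary",
    },
}

trainer = PPOTrainer(config=config)
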
Then you could pre-load the first policy (which you already trained) via something like:

trainer.get_policy("pol1").set_weights([pre-trained weights])

“pol1" would be the identifier for your first policy. This is the same identifier that you would use inside the “config->multiagent->policies” setting and “config->multiagent->policy_mapping_fn”.

Example: rllib/examples/multi_agent_custom_policy.py

What is the best way to load a specific policy’s weights from a checkpoint file to be used as another trainer’s policy? I’d like to use a pre-trained model to evaluate the currently training one against in a MARL setting.

Doing something like this doesn’t work for me:

        # get_trainer_class comes from ray.rllib.agents.registry
        loader = get_trainer_class(algo)(env="yaniv", config=config)
        loader.load_checkpoint(checkpoint_path)
        weights = loader.get_policy("policy_1").get_weights()
        self.trainer.set_weights({
            "eval_policy": weights
        })

I think this is because it creates a whole new trainer with all of its workers, whereas I just want the policy, and it gives the following error:

(pid=56641)   File "/home/jippo/Code/yaniv/yaniv-rl/yaniv_rl/utils/rllib/trainer.py", line 18, in setup
(pid=56641)     loader.load_checkpoint(checkpoint_path)
(pid=56641)   File "/home/jippo/.conda/envs/yaniv-torch/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 755, in load_checkpoint
(pid=56641)     self.__setstate__(extra_data)
(pid=56641)   File "/home/jippo/.conda/envs/yaniv-torch/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 191, in __setstate__
(pid=56641)     Trainer.__setstate__(self, state)
(pid=56641)   File "/home/jippo/.conda/envs/yaniv-torch/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 1321, in __setstate__
(pid=56641)     self.workers.local_worker().restore(state["worker"])
(pid=56641)   File "/home/jippo/.conda/envs/yaniv-torch/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1059, in restore
(pid=56641)     self.sync_filters(objs["filters"])
(pid=56641)   File "/home/jippo/.conda/envs/yaniv-torch/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1026, in sync_filters
(pid=56641)     assert all(k in new_filters for k in self.filters)
(pid=56641) AssertionError

Hey @Rory, could you try doing something (admittedly hacky :confused:) like this:

# Get the weights dict/list of (learnt) policy1:
weights = my_trainer.get_policy([your policy's name or blank for "default_policy"]).get_weights()

# Transfer the weights to policy2:
my_new_trainer.get_policy([your policy's name or blank for "default_policy"]).set_weights(weights)
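
Tying this back to the original problem, a rough end-to-end sketch along those lines could look like the code below. It is only an assumption-laden illustration, not a tested recipe: PPOTrainer, the env name, pretrained_config, checkpoint_path, and the policy IDs "policy_1" / "eval_policy" are placeholders taken from the thread.

from ray.rllib.agents.ppo import PPOTrainer  # placeholder trainer class

# 1) Build a throw-away trainer and restore it from the checkpoint.
#    Its multiagent setup should match what was used when the checkpoint
#    was written, otherwise restoring can fail (e.g. on mismatched filters).
loader = PPOTrainer(env="yaniv", config=pretrained_config)
loader.restore(checkpoint_path)  # the public Tune/Trainable checkpoint API

# 2) Pull out just the weights of the policy you care about.
weights = loader.get_policy("policy_1").get_weights()
loader.stop()  # free the loader's workers; we only needed the weights

# 3) Push those weights into the policy of the trainer you are training.
trainer.get_policy("eval_policy").set_weights(weights)

# 4) Make sure all remote rollout workers see the new weights as well.
trainer.workers.foreach_worker(
    lambda w: w.get_policy("eval_policy").set_weights(weights)
)

The loader trainer only exists long enough to deserialize the checkpoint; stopping it right after extracting the weights means you don’t pay for two full sets of rollout workers while the adversarial training runs.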