I want to do some complicated training using RLlib and I’m not sure how.
I have an environment for two agents, and I want to train the first agent while I’m forcing the policy of the second agent to be a hard-coded policy that I write. I want to run that training for 10 steps. Then I want to continue training both agents normally for 10 more steps. That means that in the second training, the first agent is starting out with the policy that I trained in the first training phase, while the second agent is starting with a blank policy.
Yes, I think it's possible; here is a sketch of how to do it using callbacks.
Consider having two policies; let's call them {"main", "hard_coded"}.
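The hard-coded behavior can be written as a non-trainable Policy subclass, along the lines of RLlib's heuristic/random-policy examples. This is just a minimal sketch; the placeholder action rule and the exact compute_actions signature you need will depend on your env and RLlib version:

```python
from ray.rllib.policy.policy import Policy


class HardCodedPolicy(Policy):
    """Non-trainable policy that acts according to a fixed, hand-written rule."""

    def compute_actions(
        self,
        obs_batch,
        state_batches=None,
        prev_action_batch=None,
        prev_reward_batch=None,
        **kwargs,
    ):
        # Replace this with your own rule; sampling from the action space
        # is only a placeholder here.
        actions = [self.action_space.sample() for _ in obs_batch]
        return actions, [], {}

    def learn_on_batch(self, samples):
        # This policy is never trained.
        return {}

    def get_weights(self):
        # No weights to sync.
        return {}

    def set_weights(self, weights):
        pass
```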
Make "main" the only trainable policy (via policies_to_train).
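A rough sketch of the phase-1 setup could look like this. Note that MyTwoAgentEnv, the agent IDs "agent_0"/"agent_1", HardCodedPolicy (from above), and PhaseSwitchCallback (shown further below) are assumed names for illustration, and the policy_mapping_fn signature differs slightly across RLlib versions:

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.policy.policy import PolicySpec

config = (
    PPOConfig()
    .environment(MyTwoAgentEnv)  # your two-agent env (assumed name)
    .multi_agent(
        policies={
            # Trainable neural-net policy for the first agent.
            "main": PolicySpec(),
            # Fixed, hand-written policy for the second agent.
            "hard_coded": PolicySpec(policy_class=HardCodedPolicy),
        },
        # Map the first agent to "main", the second to "hard_coded".
        policy_mapping_fn=lambda agent_id, *args, **kwargs: (
            "main" if agent_id == "agent_0" else "hard_coded"
        ),
        # Phase 1: only "main" gets trained.
        policies_to_train=["main"],
    )
    .callbacks(PhaseSwitchCallback)  # switches to phase 2, see below
)

algo = config.build()
for _ in range(20):  # 10 iterations of phase 1 + 10 of phase 2
    algo.train()
```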
Using the on_train_result() hook of your callback you can check a criterion (e.g. algorithm.iteration % 10 == 0) and do whatever you want with the algorithm and policy states: you can swap out policy states and let a new policy start from scratch, add a new policy, update the policy_mapping_fn, etc. I recommend taking a look at self_play_with_open_spiel.py to get ideas on how this can be done.
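As a concrete sketch of such a callback, assuming the policy IDs from above and that after 10 iterations the second agent should get a fresh, trainable policy (here called "second"): it uses the same add_policy()/remove_policy() pattern as self_play_with_open_spiel.py. Note that newer RLlib versions pass the algorithm to on_train_result, while older ones pass trainer instead:

```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks


class PhaseSwitchCallback(DefaultCallbacks):
    def on_train_result(self, *, algorithm, result, **kwargs):
        # After 10 training iterations, switch from phase 1 to phase 2.
        if result["training_iteration"] == 10:
            # Add a fresh, randomly initialized policy of the same class as
            # "main" ("main" itself keeps the weights learned in phase 1).
            algorithm.add_policy(
                policy_id="second",
                policy_cls=type(algorithm.get_policy("main")),
                # Re-map the second agent to the new policy.
                policy_mapping_fn=lambda agent_id, *args, **kwargs: (
                    "main" if agent_id == "agent_0" else "second"
                ),
                # Phase 2: train both agents.
                policies_to_train=["main", "second"],
            )
            # Optional tidy-up: drop the now-unused hard-coded policy.
            algorithm.remove_policy("hard_coded")
```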