Board game self-play PPO

george_sk · April 30, 2021, 12:36pm

Hi @sven1977. Thanks for the reply. A quick question to clarify the correct self-play scheme.

Suppose “menagerie” is a dictionary of previous policies, “shared_policy_1” is the trained policy (“policies_to_train”: [“shared_policy_1”]), and “shared_policy_2” is the self-play policy. Does the code below correctly sync the weights?
Also, as I understand it, this sync is necessary even when the number of workers is zero. Is this correct?

import ray
from ray.rllib.agents.callbacks import DefaultCallbacks

# menagerie (a dict), one_key and some_key are assumed to be defined elsewhere.

class MyCallbacks(DefaultCallbacks):

    def on_train_result(self, *, trainer, result: dict, **kwargs):
        print("trainer.train() result: {} -> {} episodes".format(
            trainer, result["episodes_this_iter"]))

        # Save the current weights of the trained policy in the dictionary.
        menagerie[one_key] = trainer.get_policy("shared_policy_1").get_weights()
        # Load previously stored weights into the self-play policy.
        trainer.set_weights({"shared_policy_2": menagerie[some_key]})

        # Copy the local worker's state to every worker.
        weights = ray.put(trainer.workers.local_worker().save())
        trainer.workers.foreach_worker(
            lambda w: w.restore(ray.get(weights))
        )
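
For reference, the menagerie bookkeeping on its own can be sketched without any RLlib calls. This is only a minimal sketch under assumptions: the `Menagerie` class, its `add_snapshot` and `sample_opponent` methods, the `max_size` cap and the uniform sampling are all hypothetical choices for illustration, and plain lists stand in for whatever `get_weights()` returns.

```python
import copy
import random


class Menagerie:
    """Hypothetical pool of past policy weight snapshots (not an RLlib class)."""

    def __init__(self, max_size=10, seed=None):
        self.max_size = max_size
        self.snapshots = {}  # training iteration -> weights
        self.rng = random.Random(seed)

    def add_snapshot(self, iteration, weights):
        # Deep-copy so later updates to the live weights
        # cannot change the stored snapshot.
        self.snapshots[iteration] = copy.deepcopy(weights)
        # Assumption: keep the pool bounded by dropping the oldest entry.
        if len(self.snapshots) > self.max_size:
            del self.snapshots[min(self.snapshots)]

    def sample_opponent(self):
        # Assumption: pick a stored snapshot uniformly at random.
        key = self.rng.choice(sorted(self.snapshots))
        # Return a copy so the caller cannot mutate the pool.
        return key, copy.deepcopy(self.snapshots[key])


# Stand-in for the weights of "shared_policy_1" (hypothetical values).
live_weights = {"layer_0": [0.1, 0.2]}
pool = Menagerie(max_size=3, seed=0)
pool.add_snapshot(0, live_weights)
opponent_key, opponent_weights = pool.sample_opponent()
```

In the callback above, `add_snapshot` would take the place of the assignment to `menagerie[one_key]`, and the weights returned by `sample_opponent` would be what gets passed to `trainer.set_weights` for “shared_policy_2”.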

Thanks,
George
