Board game self-play PPO

Hi @sven1977. Thanks for the reply. A quick question to clarify the correct self-play scheme.

If `menagerie` is a dictionary of previous policies, `shared_policy_1` is the policy being trained (`"policies_to_train": ["shared_policy_1"]`), and `shared_policy_2` is the frozen self-play opponent, does the code below correctly sync the weights?
Also, as I understand it, this sync is necessary even with zero rollout workers. Is that correct?

    import ray
    from ray.rllib.agents.callbacks import DefaultCallbacks

    class MyCallbacks(DefaultCallbacks):

        def on_train_result(self, *, trainer, result: dict, **kwargs):
            print("trainer.train() result: {} -> {} episodes".format(
                trainer, result["episodes_this_iter"]))

            # Save the trained policy's weights into the menagerie ...
            menagerie[one_key] = trainer.get_policy("shared_policy_1").get_weights()
            # ... and load a stored snapshot into the self-play opponent.
            trainer.set_weights({"shared_policy_2": menagerie[some_key]})

            # Push the updated local-worker state out to all remote workers.
            weights = ray.put(trainer.workers.local_worker().save())
            trainer.workers.foreach_worker(
                lambda w: w.restore(ray.get(weights)))

Thanks,
George