Hey @jgonik, great question. We should add an example script to RLlib that shows how to do that.
You can do a pre-training run using the BCTrainer (ray.rllib.agents.marwil.bc.py). The test case in ray.rllib.agents.marwil.tests.test_bc.py shows how to train from an offline file.
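For reference, here is a minimal sketch of what that pre-run could look like. The env name, the offline data path, and the number of iterations are placeholders you'd replace with your own setup:

from ray.rllib.agents.marwil import BCTrainer

config = {
    "env": "CartPole-v0",  # <- placeholder; only needed to get the obs/action spaces
    "input": "/tmp/cartpole-out",  # <- placeholder path to your offline (JSON) data
    "num_workers": 0,
}
trainer = BCTrainer(config=config)
for _ in range(10):  # <- placeholder number of training iterations
    print(trainer.train()["info"])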
After training your BCTrainer, you can extract the policy's weights like this:
from ray.rllib.agents.marwil import BCTrainer
from ray.rllib.agents.ppo import PPOTrainer

trainer = BCTrainer(...)
...  # <- training
weights = trainer.get_policy().get_weights()  # <- single-agent weights

# Create the actual trainer and load the BC-trained weights into it.
new_trainer = PPOTrainer(...)
for n in range(4):
    # Policy IDs come from your multiagent config; "policy_{n}" is just an example.
    policy = new_trainer.get_policy(f"policy_{n}")
    policy.set_weights(weights)
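In case it helps: the policy IDs used in the loop above come from the multiagent section of your PPO config. A rough sketch of such a config (the env, the four IDs, and the mapping function are all made up for illustration):

import gym
from ray.rllib.agents.ppo import PPOTrainer

env = gym.make("CartPole-v0")  # <- placeholder env, only used to read off the spaces
obs_space, act_space = env.observation_space, env.action_space

ppo_config = {
    "env": "CartPole-v0",  # <- placeholder
    "multiagent": {
        # Four policies sharing the same spaces (None -> use the default policy class).
        "policies": {
            f"policy_{i}": (None, obs_space, act_space, {}) for i in range(4)
        },
        # Example mapping; adjust to your env's agent IDs.
        "policy_mapping_fn": lambda agent_id: "policy_0",
    },
}
new_trainer = PPOTrainer(config=ppo_config)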