Hey guys,
I need some advice on the following:
I have an already trained policy stored in a checkpoint (trainer.save()). Now I restore from that checkpoint (trainer.restore()) and want to use the trained policy for an “external application”, i.e. I want to use RLlib’s PolicyServerInput and PolicyClient classes for inference.
What’s the best practice on the trainer/server side to test (“rollout”) the trained policy?
E.g.,
trainer = PPOTrainer(config={"explore": False}, ...)
trainer.restore(checkpoint)
while True:
    trainer.train()
or
trainer = PPOTrainer(config={"explore": False}, ...)
trainer.restore(checkpoint)
while True:
    trainer.evaluate()
Or is it better to restore a trainer from the checkpoint directly on the “virtual client side” and call action = trainer.compute_action(obs) there (so that neither the server nor the client is involved)?
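To make that last option concrete, here is a rough sketch of the loop I have in mind. It is only an illustration, not a working RLlib setup: StubTrainer and StubEnv are hypothetical stand-ins for the restored trainer and the external application, and I am assuming the application can be stepped like an old Gym-style environment (reset() returns an observation, step() returns a 4-tuple). The only call taken from the real trainer is compute_action(obs).

```python
class StubEnv:
    """Hypothetical stand-in for the external application (old Gym-style API)."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.horizon
        return obs, reward, done, {}


class StubTrainer:
    """Hypothetical stand-in for a restored trainer; always picks action 1."""

    def compute_action(self, obs):
        return 1


def rollout(trainer, env, num_episodes=1):
    """Query trainer.compute_action(obs) step by step; no training happens."""
    returns = []
    for _ in range(num_episodes):
        obs = env.reset()
        done = False
        total = 0.0
        while not done:
            action = trainer.compute_action(obs)
            obs, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    return returns


if __name__ == "__main__":
    print(rollout(StubTrainer(), StubEnv(), num_episodes=2))
```

With the real objects, the stub trainer would be replaced by the PPOTrainer restored from the checkpoint, and the stub environment by whatever drives the external application.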