Hey guys,
I need some advice on the following:
I have an already trained policy stored in a checkpoint (trainer.save()). Now I restore from that checkpoint (trainer.restore()) and want to use the trained policy for an “external application”, i.e. I want to use RLlib’s PolicyServerInput and PolicyClient classes for inference.
What’s the best practice on the trainer/server side to test (“rollout”) the trained policy?
E.g.,
trainer = PPOTrainer(config={"explore": False}, ...)
trainer.restore(checkpoint)
while True:
    trainer.train()
or
trainer = PPOTrainer(config={"explore": False}, ...)
trainer.restore(checkpoint)
while True:
    trainer.evaluate()
Or is it better to restore a trainer from the checkpoint directly on the “virtual client side” and call action = trainer.compute_action(obs) there (so using neither the server nor the client)?
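To make that last option concrete, here is a rough sketch of the loop I have in mind. It only assumes an object with a compute_action(obs) method, like the restored trainer. ConstantPolicy and CountdownApp are hypothetical stand-ins I made up so the snippet runs without Ray. They are not RLlib classes, and the real external application would of course look different.

```python
def run_episode(policy, reset, step, max_steps=1000):
    """Query the policy once per step until the application reports done."""
    obs = reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # With a restored trainer this would be trainer.compute_action(obs).
        action = policy.compute_action(obs)
        obs, reward, done = step(action)
        total_reward += reward
        if done:
            break
    return total_reward


class ConstantPolicy:
    """Hypothetical stand-in for the restored trainer (not an RLlib class)."""

    def __init__(self, action):
        self.action = action

    def compute_action(self, obs):
        return self.action


class CountdownApp:
    """Hypothetical external application: reward 1.0 per step, done after n steps."""

    def __init__(self, n):
        self.n = n
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.n


if __name__ == "__main__":
    app = CountdownApp(5)
    print(run_episode(ConstantPolicy(0), app.reset, app.step))
```

In the real setup I would pass the restored trainer in place of ConstantPolicy, and reset/step would talk to the external application.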