I’ve tuned the hyperparameters of an RLlib policy with Tune and saved many checkpoints. I’m now trying to restore the checkpoints and evaluate them using compute_action.
But how can I evaluate them in parallel, ideally with something like the FIFO scheduler? I want to evaluate hundreds of checkpoints on 40 CPUs (one CPU per evaluation).
I’ve read the documentation and know how to evaluate a single checkpoint.
But I have hundreds of checkpoints and want to evaluate them on 40 CPUs (one CPU per evaluation). How can I do that?
One possible method is writing a bash script that runs the evaluation command for each checkpoint one by one, but that would not efficiently utilize all 40 CPUs.
So I am wondering whether I can use the FIFO scheduler in Tune to queue up all the checkpoints.
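For concreteness, here is a minimal sketch of the setup I have in mind: one Tune trial per checkpoint, one CPU per trial. PPOTrainer, "CartPole-v0", the function name `evaluate_checkpoint`, and the checkpoint paths are all placeholders for the actual trainer class, env, and checkpoint list:

```python
# Sketch: evaluate hundreds of checkpoints as Tune trials, one CPU each.
# PPOTrainer, "CartPole-v0" and the checkpoint paths are placeholders.
import gym
import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

checkpoint_paths = ["/path/to/checkpoint-10", "/path/to/checkpoint-20"]  # hundreds in practice


def evaluate_checkpoint(config):
    # Rebuild the trainer and manually restore it from this trial's checkpoint.
    trainer = PPOTrainer(config={"env": "CartPole-v0", "num_workers": 0})
    trainer.restore(config["checkpoint"])

    # Roll out one episode with the restored policy via compute_action.
    env = gym.make("CartPole-v0")
    obs, done, total_reward = env.reset(), False, 0.0
    while not done:
        action = trainer.compute_action(obs, explore=False)
        obs, reward, done, _ = env.step(action)
        total_reward += reward

    tune.report(episode_reward=total_reward)  # report the metric back to Tune


ray.init(num_cpus=40)
tune.run(
    evaluate_checkpoint,
    config={"checkpoint": tune.grid_search(checkpoint_paths)},  # one trial per checkpoint
    resources_per_trial={"cpu": 1},  # so up to 40 evaluations run concurrently
    # No scheduler argument is needed: Tune's default FIFOScheduler simply
    # starts queued trials whenever a CPU becomes free.
)
```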
I was not clear enough in my response. Thanks @kai for clarifying what I was suggesting.
@RaphaelCS your other option, if you have the time / spare resources and you’re not doing something special with the checkpoints, is to run an evaluation at the same interval that you checkpoint. Then you will not have to do it as a separate step afterwards.
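Roughly, that could look like the following, using RLlib’s built-in evaluation settings; the env, interval, and episode counts are just example values:

```python
# Sketch: evaluate on the same schedule as checkpointing, so every checkpoint
# already has evaluation metrics attached (values here are just examples).
from ray import tune

tune.run(
    "PPO",
    config={
        "env": "CartPole-v0",
        "evaluation_interval": 10,       # evaluate every 10 training iterations
        "evaluation_num_episodes": 20,   # episodes per evaluation round
        "evaluation_config": {"explore": False},  # greedy actions for evaluation
    },
    checkpoint_freq=10,  # checkpoint at the same interval as the evaluation
)
```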
I thought there might be a simpler way, since now we create a new Trainable (run_rollout) and have to manually restore the original Trainable inside run_rollout.
But what I need is to evaluate the checkpoints after training. The checkpoints are selected by an evaluation metric during training (just like what you’ve suggested), but there are more metrics that I want to evaluate after training.