[tune] How to evaluate from multiple checkpoints

I’ve tuned the hyperparameters of an RLlib policy with Tune and saved many checkpoints. I’m now trying to restore them and evaluate them using compute_action.

But how can I evaluate them in parallel, using something like the FIFO scheduler? I want to evaluate hundreds of checkpoints on 40 CPUs (one CPU per evaluation).

Thanks in advance.

Hi @RaphaelCS,

Have a look at the documentation on evaluating with rollout here: RLlib Training APIs — Ray v1.2.0

If you need more customization you can copy/edit this file: ray/rollout.py at master · ray-project/ray · GitHub
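For readers who want to customize the evaluation, here is a minimal sketch of the kind of episode loop that can be driven by compute_action. It is not the code from rollout.py. It assumes the classic Gym interface (reset() returns an observation, step() returns a 4-tuple), and the function name run_episodes is made up for illustration. You would pass in something like the compute_action method of a restored trainer.

```python
def run_episodes(compute_action, env, num_episodes=1):
    """Run full episodes and return the total reward of each one.

    compute_action: callable mapping an observation to an action,
        e.g. the compute_action method of a restored RLlib trainer.
    env: environment with the classic Gym API (an assumption):
        reset() -> obs, step(action) -> (obs, reward, done, info).
    """
    episode_returns = []
    for _ in range(num_episodes):
        obs = env.reset()
        done = False
        total_reward = 0.0
        while not done:
            action = compute_action(obs)
            obs, reward, done, _info = env.step(action)
            total_reward += reward
        episode_returns.append(total_reward)
    return episode_returns
```

The list of per-episode returns can then be reduced to whatever metric you care about, for example the mean.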

Thanks for your reply!

I’ve read the documentation and know how to evaluate one checkpoint.

But I have hundreds of checkpoints, and I want to evaluate them on 40 CPUs (one CPU per evaluation). How can I do that?
One possible method is to write a bash script that runs the evaluation command for each checkpoint one by one, but this cannot make efficient use of all 40 CPUs.

So I am wondering whether I can use the FIFO scheduler in Tune to process all the checkpoints.

Generally this should be quite straightforward. You can put your rollout command into a trainable function like this:

def run_rollout(config):
    checkpoint_path = config["checkpoint"]
    # ...

And then use Ray Tune to call it:

        "checkpoint": [

You might also want to call tune.report() in the trainable function to report the evaluation results.
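Putting the pieces together, a fuller sketch might look like the following. It assumes the Ray 1.x function API (tune.run, tune.grid_search, tune.report) and that checkpoint files are named checkpoint-<iteration>, as RLlib wrote them in Ray 1.x. The helpers find_checkpoints, evaluate_checkpoint and evaluate_all are hypothetical names, and evaluate_checkpoint is a placeholder where you would restore your trainer and run episodes. Since tune.run uses the FIFO scheduler by default, requesting one CPU per trial on a 40-CPU machine should let up to 40 evaluations run at once, with the rest queued.

```python
import os
import re

# Assumption: checkpoint files are named "checkpoint-<iteration>", as written by
# RLlib in Ray 1.x. Adjust the pattern if your layout differs.
_CHECKPOINT_FILE = re.compile(r"checkpoint-(\d+)$")


def find_checkpoints(root_dir):
    """Return all checkpoint file paths under root_dir, ordered by iteration."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            match = _CHECKPOINT_FILE.match(name)
            if match:  # skips e.g. "checkpoint-10.tune_metadata"
                found.append((int(match.group(1)), os.path.join(dirpath, name)))
    return [path for _iteration, path in sorted(found)]


def evaluate_checkpoint(checkpoint_path):
    """Hypothetical placeholder: restore the trainer from checkpoint_path,
    run evaluation episodes, and return a dict of metrics."""
    raise NotImplementedError("restore your trainer and run evaluation episodes here")


def run_rollout(config):
    """Tune trainable: evaluate one checkpoint and report its metrics."""
    from ray import tune  # imported lazily so the helpers above work without Ray

    metrics = evaluate_checkpoint(config["checkpoint"])
    tune.report(**metrics)


def evaluate_all(checkpoint_paths, num_cpus=40):
    """Run one Tune trial per checkpoint, each using a single CPU."""
    import ray
    from ray import tune

    ray.init(num_cpus=num_cpus)
    return tune.run(
        run_rollout,
        # grid_search creates one trial per list entry.
        config={"checkpoint": tune.grid_search(list(checkpoint_paths))},
        resources_per_trial={"cpu": 1},
    )
```

Calling evaluate_all(find_checkpoints(...)) on your results directory returns Tune's analysis object, from which the reported metrics can be read per checkpoint.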

Does this make sense?

I was not clear enough in my response. Thanks @kai for clarifying what I was suggesting.

@RaphaelCS your other option, if you have the time and spare resources and you're not doing something special with the checkpoints, is to run an evaluation at the same interval at which you checkpoint. Then you will not have to do it as a separate step afterwards.

RLlib Training APIs — Ray v2.0.0.dev0
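As a rough sketch of evaluating at the checkpoint interval: the config keys evaluation_interval and evaluation_num_episodes are the RLlib names from the Ray 1.x era of this thread (later releases renamed the second one), and checkpoint_freq is the tune.run argument. The helper names and the default of 10 episodes are assumptions made for illustration.

```python
def with_periodic_evaluation(base_config, every_n_iterations, num_episodes=10):
    """Return a copy of base_config that evaluates every N training iterations."""
    config = dict(base_config)  # leave the caller's dict untouched
    config["evaluation_interval"] = every_n_iterations
    config["evaluation_num_episodes"] = num_episodes
    return config


def train_with_evaluation(algorithm, base_config, every_n_iterations, stop):
    """Train with Tune, checkpointing and evaluating at the same interval."""
    from ray import tune  # imported lazily so the helper above works without Ray

    return tune.run(
        algorithm,
        config=with_periodic_evaluation(base_config, every_n_iterations),
        checkpoint_freq=every_n_iterations,
        stop=stop,
    )
```

With this setup each checkpoint has evaluation results from the same iteration, so no separate pass over the checkpoints is needed for those metrics.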

Thanks for your reply!

I thought there might be a simpler way, since with this approach we create a new Trainable (run_rollout) and have to manually restore the original Trainable inside run_rollout.

But what you’ve put forward really meets my needs!

Yes, I see.

But what I need is to evaluate the checkpoints after training. Checkpoints are selected by an evaluation metric during training (just like what you’ve put forward), and there are more metrics that I want to evaluate after training.

Anyway, thanks for your help!