[tune] How to evaluate from multiple checkpoints

RaphaelCS · April 18, 2021, 3:59am

I’ve tuned the hyperparameters of an RLlib policy with Tune and saved many checkpoints. Now I’m trying to restore them and evaluate them using compute_action.

But how can I evaluate them in parallel, using something like the FIFO scheduler? I want to evaluate hundreds of checkpoints on 40 CPUs (one CPU per evaluation).

Thanks in advance.

mannyv · April 18, 2021, 10:29am

Hi @RaphaelCS,

Have a look at the documentation on evaluating with rollout here: RLlib Training APIs — Ray v1.2.0

If you need more customization you can copy/edit this file: ray/rollout.py at master · ray-project/ray · GitHub

RaphaelCS · April 19, 2021, 3:34am

Thanks for your reply!

I’ve read the documentation and know how to evaluate one checkpoint.

But I have hundreds of checkpoints, and I want to evaluate them on 40 CPUs (one CPU per evaluation). So how can I evaluate them?
One possible method is to write a bash script that runs the evaluation command for each checkpoint, one after another. But this cannot make efficient use of all 40 CPUs.

So I am wondering whether I can use the FIFO scheduler in Tune to process all the checkpoints?

kai · April 19, 2021, 8:56am

Generally this should be quite straightforward. You can put your rollout command into a trainable function like this:

def run_rollout(config):
    checkpoint_path = config["checkpoint"]
    # ...

And then use Ray Tune to call it:

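from ray import tune

tune.run(
    run_rollout,
    config={
        # grid_search starts one trial per checkpoint path;
        # a plain list would be passed as a whole to a single trial.
        "checkpoint": tune.grid_search([
            "path/to/checkpoint_0",
            "path/to/checkpoint_1",
            "path/to/checkpoint_2",
        ])
    }
)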

You might also want to call tune.report() in the trainable function to report the evaluation results.
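
To make this more concrete, here is a rough sketch of what the body of run_rollout could look like. It is not taken from rollout.py. It assumes a Ray 1.x style API (PPOTrainer, trainer.compute_action, tune.report) and a gym environment with the older reset()/step() interface. The helper evaluate_policy, the wrapper evaluate_checkpoints, and the config keys "env", "trainer_config" and "num_episodes" are made-up names for illustration; swap in whatever trainer class and config you trained with.

```python
def evaluate_policy(compute_action, env, num_episodes=5):
    """Run full episodes on a gym-style env and return the mean episode reward."""
    total_reward = 0.0
    for _ in range(num_episodes):
        obs = env.reset()
        done = False
        while not done:
            obs, reward, done, _ = env.step(compute_action(obs))
            total_reward += reward
    return total_reward / num_episodes


def run_rollout(config):
    # Local imports: only needed inside the Tune trial process.
    import gym
    from ray import tune
    from ray.rllib.agents.ppo import PPOTrainer  # assumption: the policy was trained with PPO

    # num_workers=0 keeps the restored trainer inside the trial's single CPU.
    trainer_config = dict(config["trainer_config"], num_workers=0)
    trainer = PPOTrainer(config=trainer_config, env=config["env"])
    trainer.restore(config["checkpoint"])

    env = gym.make(config["env"])
    mean_reward = evaluate_policy(
        trainer.compute_action, env, config["num_episodes"]
    )
    tune.report(mean_reward=mean_reward)


def evaluate_checkpoints(checkpoint_paths, env_name, trainer_config):
    """Hypothetical wrapper: one Tune trial per checkpoint, one CPU per trial."""
    from ray import tune

    return tune.run(
        run_rollout,
        config={
            "checkpoint": tune.grid_search(list(checkpoint_paths)),
            "env": env_name,
            "trainer_config": trainer_config,
            "num_episodes": 5,
        },
        resources_per_trial={"cpu": 1},
    )
```

Tune's default scheduler is already FIFO, so with one CPU per trial on a 40 CPU machine it should run up to 40 evaluations at a time and start the next checkpoint as soon as a slot frees up.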

Does this make sense?

mannyv · April 19, 2021, 12:21pm

I was not clear enough in my response. Thanks @kai for clarifying what I was suggesting.

@RaphaelCS your other option, if you have the time and spare resources and you’re not doing something special with the checkpoints, is to run an evaluation at the same interval at which you checkpoint. Then you will not have to do it as a separate step afterwards.

RLlib Training APIs — Ray v2.0.0.dev0
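
As a rough illustration of that idea, the sketch below builds tune.run arguments in which the evaluation interval matches the checkpoint frequency. The helper make_run_kwargs is a made-up name, and the config keys "evaluation_interval" and "evaluation_num_episodes" are the ones I believe Ray 1.x uses, so check them against the docs for your version.

```python
def make_run_kwargs(env_name, interval, eval_episodes=10):
    """Build tune.run keyword arguments so evaluation and checkpointing line up."""
    return {
        # Save a checkpoint every `interval` training iterations.
        "checkpoint_freq": interval,
        "config": {
            "env": env_name,
            # Run an evaluation every `interval` training iterations as well.
            "evaluation_interval": interval,
            "evaluation_num_episodes": eval_episodes,
        },
    }
```

It could then be used as tune.run("PPO", **make_run_kwargs("CartPole-v0", interval=10)), where "PPO" and "CartPole-v0" are just placeholders.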

RaphaelCS · April 19, 2021, 3:03pm

Thanks for your reply!

I thought there might be a simpler way, since this approach creates a new Trainable (run_rollout) and has to manually restore the original Trainable inside run_rollout.

But what you’ve put forward really meets my need!

RaphaelCS · April 19, 2021, 3:07pm

Yes, I see.

But what I need is to evaluate the checkpoints after training. The checkpoints are selected by an evaluation metric during training (just as you suggested), and there are more metrics that I want to evaluate after training.

Anyway, thanks for your help!
