How do I evaluate my trained policy after tune.fit()

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I have configured a simple DQN training script (shown below). After training, how do I evaluate my multiple trained policies in code (not via the CLI)? I have searched Ray's documentation but can't find an example showing how to evaluate policies this way. With the code below I will have 4 trained policies after training (2 frameworks x 2 environments). For each policy I would like the mean reward over 100 episodes, so I should end up with 4 values.

import os
import argparse
import ray
from ray import air, tune
from ray.rllib.algorithms.dqn import DQNConfig

ray.init()

# Grid search over 2 frameworks x 2 environments -> 4 trials/policies.
param_space = DQNConfig()\
    .framework(framework=tune.grid_search(['torch', 'tf2']))\
    .environment(env=tune.grid_search(['ALE/Boxing-v5', 'ALE/VideoPinball-v5']))\
    .resources(num_gpus=1)

tune_config = tune.TuneConfig(
    num_samples=1,
)

run_config = air.RunConfig(
    name='base',
    stop={'timesteps_total': 1e6},
    checkpoint_config=air.CheckpointConfig(
        checkpoint_at_end=True,
    ),
    local_dir='rllib/train'
)

# Use a name other than `tune` so the `ray.tune` module imported above
# is not shadowed.
tuner = tune.Tuner(
    'DQN',
    param_space=param_space,
    tune_config=tune_config,
    run_config=run_config,
)

tuner.fit()

# how do I evaluate my trained policies here?

ray.shutdown()

Hi @rajfly,

In general, you can restore an algorithm from a checkpoint you recorded during training. This is shown, for example, here in the documentation. You could then call

restored_algo.evaluate() 

to evaluate the policy on the environment the agent in the checkpoint was trained on.
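For your concrete script, here is a minimal sketch of the post-training part (assuming Ray 2.x APIs: Tuner.fit() returns a ResultGrid, Algorithm.from_checkpoint() restores from a Result's checkpoint, and evaluate() reports its metrics under an 'evaluation' key; the result.config lookups are my assumption about how Tune flattens your param_space). To make evaluate() average over exactly 100 episodes, you would also add .evaluation(evaluation_duration=100, evaluation_duration_unit='episodes', evaluation_num_workers=1) to param_space before training, since the evaluation settings are restored together with the checkpoint:

from ray.rllib.algorithms.algorithm import Algorithm

results = tuner.fit()  # ResultGrid: one Result per trial, 4 in your case

mean_rewards = {}
for result in results:
    # Restore the trained algorithm from the trial's final checkpoint.
    algo = Algorithm.from_checkpoint(result.checkpoint)
    # evaluate() runs the configured evaluation workers; with
    # evaluation_duration=100 the reported mean is over 100 episodes.
    metrics = algo.evaluate()
    key = (result.config['framework'], result.config['env'])
    mean_rewards[key] = metrics['evaluation']['episode_reward_mean']
    algo.stop()

print(mean_rewards)  # 4 values: one mean episode reward per policy

You could also roll out episodes manually with algo.compute_single_action(), but for Atari you would then have to reproduce RLlib's DeepMind-style frame preprocessing yourself, so evaluate() is the simpler route here.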

Here you can also find some documentation about serving your RLlib models.
