How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I have configured a simple DQN training script (shown below). How do I evaluate each of my trained policies after training, in code rather than via the CLI? I have searched Ray's documentation but cannot find an example showing how to do this. The script grid-searches over 2 frameworks and 2 environments, so I end up with 4 trained policies. For each policy, I would like to get the mean reward over 100 episodes, so I should have 4 values in the end.
import os
import argparse
import ray
from ray import air, tune
from ray.rllib.algorithms.dqn import DQNConfig
ray.init()
param_space = (
    DQNConfig()
    .framework(framework=tune.grid_search(['torch', 'tf2']))
    .environment(env=tune.grid_search(['ALE/Boxing-v5', 'ALE/VideoPinball-v5']))
    .resources(num_gpus=1)
)
tune_config = tune.TuneConfig(
    num_samples=1,
)
run_config = air.RunConfig(
    name='base',
    stop={'timesteps_total': 1e6},
    checkpoint_config=air.CheckpointConfig(
        checkpoint_at_end=True,
    ),
    local_dir='rllib/train',
)
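# Named `tuner` so it does not shadow the imported `tune` module.
tuner = tune.Tuner(
    'DQN',
    param_space=param_space,
    tune_config=tune_config,
    run_config=run_config,
)
# fit() returns a ResultGrid with one entry per trial (4 here).
results = tuner.fit()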
# how do I evaluate my trained policies here?
ray.shutdown()