RLlib: using evaluation workers on previously trained models

Imagine you’ve spent a long time training an RL model. You have a checkpoint from that training, and you want to run various metrics and analyses (which you’ll keep iterating on) on the behaviour of that saved model.

You could do rollouts using an approach like the one in rollout.py, but to get parallelism it would be nice to use evaluation workers, maybe via something like this:

import os
import pickle

import ray
from ray.rllib.agents import ppo


def my_custom_eval(trainer, eval_workers):
    # do some cool evaluation metrics here
    print('Evaluating')

# previously trained model, can use something like this to generate
# rllib train --run=PPO --env=CartPole-v0  --config="{\"num_workers\": 10, \"train_batch_size\": 1000}" --checkpoint-freq=10
checkpoint_path = os.path.expanduser('~/ray_results/default/PPO_CartPole-v0_bf91c_00000_0_2020-11-25_10-34-08/checkpoint_100/checkpoint-100')

ray.init()

run_base_dir = os.path.dirname(os.path.dirname(checkpoint_path))
config_path = os.path.join(run_base_dir, 'params.pkl')
with open(config_path, 'rb') as f:
    config = pickle.load(f)

# convert all the training workers to evaluation workers
config['evaluation_num_workers'] = config['num_workers']
config['num_workers'] = 0
# hook in your callback
config['custom_eval_function'] = my_custom_eval

trainer = ppo.PPOTrainer(config=config)
trainer.restore(checkpoint_path)
trainer._evaluate()

You will get an error like:

Error in sys.excepthook:
Traceback (most recent call last):
  File "/home/andrew/miniconda3/envs/ray_nightly_tf_2.2/lib/python3.8/site-packages/ray/worker.py", line 856, in custom_excepthook
    ray.state.state.add_worker(worker_id, worker_type, worker_info)
  File "/home/andrew/miniconda3/envs/ray_nightly_tf_2.2/lib/python3.8/site-packages/ray/state.py", line 733, in add_worker
    return self.global_state_accessor.add_worker_info(
AttributeError: 'NoneType' object has no attribute 'add_worker_info'

Original exception was:
Traceback (most recent call last):
  File "ray_eval_after_training.py", line 28, in <module>
    trainer._evaluate()
  File "/home/andrew/miniconda3/envs/ray_nightly_tf_2.2/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 713, in _evaluate
    self._sync_weights_to_workers(worker_set=self.evaluation_workers)
AttributeError: 'PPO' object has no attribute 'evaluation_workers'

Is there any way to initialise these evaluation workers so that this would work?

Hey Andrew! Thanks for filing this. This exact problem is actually on our tech debt list for 2021: how to do parallel evaluation rollouts (the rllib rollout command, i.e. rollout.py, currently does not allow parallelization either).

I’ll take a look at your script. I think the problem is the loading from the pickled Trainer (which did not have eval workers to begin with).

Yeah, the error makes sense: The original Trainer, which we saved, does not contain these eval workers. So even if we have the config correct before restoring, the created Trainer object will not have these workers in place.
Two workarounds:

  • You create the original Trainer already with eval workers (not great, b/c maybe you don’t need them during training and you also don’t know up front how many you need); a sketch of this approach follows the list.
  • We do a hot-fix for this particular scenario, where we do create these eval workers after restoring.
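
If you go with the first workaround, a minimal sketch could look like this (assuming PPO on CartPole-v0; the worker counts and eval interval are just placeholders). The Trainer is built with eval workers from the start, so they also exist after restoring from a checkpoint:

import ray
from ray.rllib.agents import ppo

ray.init()

config = {
    'env': 'CartPole-v0',
    'num_workers': 2,
    # create evaluation workers up front so a restored Trainer has them, too
    'evaluation_num_workers': 2,
    'evaluation_interval': 10,  # evaluate every 10 training iterations
}

trainer = ppo.PPOTrainer(config=config)
for _ in range(10):
    trainer.train()
checkpoint_path = trainer.save()  # restore later with a Trainer built from this same config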

Here is the solution (the corresponding issue has also been updated and closed). The key is to set evaluation_interval > 0 in the restored Trainer’s config.

The following works fine:

import os
import pickle

import ray
from ray.rllib.agents import ppo


def my_custom_eval(trainer, eval_workers):
    # do some cool evaluation metrics here
    print('Evaluating')
    return {  # <-- HERE: must return a dict
        "a": 1.0,
    }

# previously trained model, can use something like this to generate
# rllib train --run=PPO --env=CartPole-v0  --config="{\"num_workers\": 10, \"train_batch_size\": 1000}" --checkpoint-freq=10
checkpoint_path = os.path.expanduser('~/ray_results/default/PPO_CartPole-v0_c2ec3_00000_0_2020-12-11_15-18-38/checkpoint_9/checkpoint-9')

ray.init()

run_base_dir = os.path.dirname(os.path.dirname(checkpoint_path))
config_path = os.path.join(run_base_dir, 'params.pkl')
with open(config_path, 'rb') as f:
    config = pickle.load(f)

# convert all the training workers to evaluation workers
config['evaluation_num_workers'] = 3
config['evaluation_interval'] = 1  # <-- HERE: must set this to > 0!
config['num_workers'] = 0
# hook in your callback
config['custom_eval_function'] = my_custom_eval

trainer = ppo.PPOTrainer(config=config)
trainer.restore(checkpoint_path)
trainer._evaluate()
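
To make the "do some cool evaluation metrics here" part do real work, the custom eval function can drive the eval workers itself and return the resulting metrics. Here is a rough sketch, loosely following RLlib's custom_eval.py example; the collect_episodes / summarize_episodes helpers live in ray.rllib.evaluation.metrics, though exact signatures may differ between releases:

import ray
from ray.rllib.evaluation.metrics import collect_episodes, summarize_episodes


def my_custom_eval(trainer, eval_workers):
    workers = eval_workers.remote_workers()
    if workers:
        # sample one round of episodes on all remote eval workers in parallel
        ray.get([w.sample.remote() for w in workers])
    else:
        # evaluation_num_workers=0 -> sample on the local eval worker instead
        eval_workers.local_worker().sample()
    # gather the completed episodes and build the standard metrics dict
    episodes, _ = collect_episodes(
        local_worker=eval_workers.local_worker(), remote_workers=workers)
    metrics = summarize_episodes(episodes)
    # add anything custom on top; the function must return a dict
    metrics['my_metric'] = 1.0
    return metrics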

Hi @sven1977,

I’m trying to do a similar thing, where I just want to run a bunch of experiments on a pretrained model using the rllib worker system. Admittedly I don’t entirely understand it but have managed to hack my way around to getting it to more or less work.

The only trouble now is that I want to use the trainer purely for evaluation, and setting config['num_workers'] = 0 doesn’t work; it raises a ValueError:

File ".../ray/rllib/agents/a3c/a3c.py", line 58, in validate_config
    raise ValueError("`num_workers` for A3C must be >= 1!")

Is there any way around this? I’d really like to use the extra worker in my evaluations.

Cheers,

Rory 🙂

Actually, why don’t you use the evaluation workers (a different worker set, defined by the config keys evaluation_num_workers, evaluation_config, evaluation_interval, and evaluation_num_episodes) and then call Trainer._evaluate()? That call should only use these evaluation workers, not the “regular” rollout workers. Then you can still set num_workers=1 and e.g. evaluation_num_workers=[any number should work here]; 0 = local evaluation, >0 = parallel evaluation via Ray remote workers.
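
For the A3C case above, something along these lines should work (a sketch only; the checkpoint path and worker counts are placeholders, and the params.pkl loading mirrors the earlier snippets):

import os
import pickle

import ray
from ray.rllib.agents import a3c


def my_custom_eval(trainer, eval_workers):
    # collect whatever statistics you need here; must return a dict
    return {'a': 1.0}

# hypothetical A3C checkpoint path; substitute your own
checkpoint_path = os.path.expanduser(
    '~/ray_results/default/A3C_CartPole-v0_00000/checkpoint_100/checkpoint-100')

ray.init()

run_base_dir = os.path.dirname(os.path.dirname(checkpoint_path))
with open(os.path.join(run_base_dir, 'params.pkl'), 'rb') as f:
    config = pickle.load(f)

config['num_workers'] = 1             # keeps A3C's validate_config() happy
config['evaluation_num_workers'] = 4  # the actual parallelism lives here
config['evaluation_interval'] = 1     # must be > 0
config['custom_eval_function'] = my_custom_eval

trainer = a3c.A3CTrainer(config=config)
trainer.restore(checkpoint_path)
trainer._evaluate()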


Hello, sorry to bother you, but could you maybe paste your hacky solution for simply evaluating a trained model? I am finding it near impossible to do, and it is quite frustrating. Thank you.

Hello, did you ever get a solution to this working on your end? I’m also facing a similar challenge and would like to perform parallel evaluations on a trained model to collect some statistics. Thank you.