RLlib: using evaluation workers on previously trained models

Imagine you’ve spent a long time training an RL model. You have a checkpoint from that training, and you want to run various metrics and analyses (which you’ll keep iterating on) on the behaviour of that saved model.

You could do rollouts using an approach like the one in rollout.py, but to get parallelism it would be nice to use evaluation workers, maybe via something like this:

import os
import pickle

import ray
from ray.rllib.agents import ppo


def my_custom_eval(trainer, eval_workers):
    # do some cool evaluation metrics here
    print('Evaluating')

# previously trained model, can use something like this to generate
# rllib train --run=PPO --env=CartPole-v0  --config="{\"num_workers\": 10, \"train_batch_size\": 1000}" --checkpoint-freq=10
checkpoint_path = os.path.expanduser('~/ray_results/default/PPO_CartPole-v0_bf91c_00000_0_2020-11-25_10-34-08/checkpoint_100/checkpoint-100')

ray.init()

run_base_dir = os.path.dirname(os.path.dirname(checkpoint_path))
config_path = os.path.join(run_base_dir, 'params.pkl')
with open(config_path, 'rb') as f:
    config = pickle.load(f)

# convert all the training workers to evaluation workers
config['evaluation_num_workers'] = config['num_workers']
config['num_workers'] = 0
# hook in your callback
config['custom_eval_function'] = my_custom_eval

trainer = ppo.PPOTrainer(config=config)
trainer.restore(checkpoint_path)
trainer._evaluate()

You will get an error like:

Error in sys.excepthook:
Traceback (most recent call last):
  File "/home/andrew/miniconda3/envs/ray_nightly_tf_2.2/lib/python3.8/site-packages/ray/worker.py", line 856, in custom_excepthook
    ray.state.state.add_worker(worker_id, worker_type, worker_info)
  File "/home/andrew/miniconda3/envs/ray_nightly_tf_2.2/lib/python3.8/site-packages/ray/state.py", line 733, in add_worker
    return self.global_state_accessor.add_worker_info(
AttributeError: 'NoneType' object has no attribute 'add_worker_info'

Original exception was:
Traceback (most recent call last):
  File "ray_eval_after_training.py", line 28, in <module>
    trainer._evaluate()
  File "/home/andrew/miniconda3/envs/ray_nightly_tf_2.2/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 713, in _evaluate
    self._sync_weights_to_workers(worker_set=self.evaluation_workers)
AttributeError: 'PPO' object has no attribute 'evaluation_workers'

Is there any way to initialise these evaluation workers so that this would work?

Hey Andrew! Thanks for filing this. This exact problem is actually on our tech debt list for 2021: how to do parallel evaluation rollouts (the rllib rollout command, i.e. rollout.py, currently does not allow parallelization either).

I’ll take a look at your script. I think the problem is the loading from the pickled Trainer (which did not have eval workers to begin with).

Yeah, the error makes sense: The original Trainer, which we saved, does not contain these eval workers. So even if we have the config correct before restoring, the created Trainer object will not have these workers in place.
Two workarounds:

  • You create the original Trainer already with eval workers (not great, b/c maybe you don’t need them during training and you also don’t know up front how many you need); a sketch of this approach follows the list.
  • We do a hot-fix for this particular scenario, where we do create these eval workers after restoring.
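
If you go with the first workaround, a minimal sketch could look like this (assuming PPO on CartPole-v0; the worker counts and eval interval are just placeholders). The Trainer is built with eval workers from the start, so they also exist after restoring from a checkpoint:

import ray
from ray.rllib.agents import ppo

ray.init()

config = {
    'env': 'CartPole-v0',
    'num_workers': 2,
    # create evaluation workers up front so a restored Trainer has them, too
    'evaluation_num_workers': 2,
    'evaluation_interval': 10,  # evaluate every 10 training iterations
}

trainer = ppo.PPOTrainer(config=config)
for _ in range(10):
    trainer.train()
checkpoint_path = trainer.save()  # restore later with a Trainer built from this same config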

Here is the solution (the corresponding issue has also been updated and closed). The key is to set evaluation_interval > 0 in the restored Trainer’s config.

The following works fine:

import os
import pickle

import ray
from ray.rllib.agents import ppo


def my_custom_eval(trainer, eval_workers):
    # do some cool evaluation metrics here
    print('Evaluating')
    return {  # <-- HERE: must return a dict
        "a": 1.0,
    }

# previously trained model, can use something like this to generate
# rllib train --run=PPO --env=CartPole-v0  --config="{\"num_workers\": 10, \"train_batch_size\": 1000}" --checkpoint-freq=10
checkpoint_path = os.path.expanduser('~/ray_results/default/PPO_CartPole-v0_c2ec3_00000_0_2020-12-11_15-18-38/checkpoint_9/checkpoint-9')

ray.init()

run_base_dir = os.path.dirname(os.path.dirname(checkpoint_path))
config_path = os.path.join(run_base_dir, 'params.pkl')
with open(config_path, 'rb') as f:
    config = pickle.load(f)

# convert all the training workers to evaluation workers
config['evaluation_num_workers'] = 3
config['evaluation_interval'] = 1  # <-- HERE: must set this to > 0!
config['num_workers'] = 0
# hook in your callback
config['custom_eval_function'] = my_custom_eval

trainer = ppo.PPOTrainer(config=config)
trainer.restore(checkpoint_path)
trainer._evaluate()
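
To make the "do some cool evaluation metrics here" part do real work, the custom eval function can drive the eval workers itself and return the resulting metrics. Here is a rough sketch, loosely following RLlib's custom_eval.py example; the collect_episodes / summarize_episodes helpers live in ray.rllib.evaluation.metrics, though exact signatures may differ between releases:

import ray
from ray.rllib.evaluation.metrics import collect_episodes, summarize_episodes


def my_custom_eval(trainer, eval_workers):
    workers = eval_workers.remote_workers()
    if workers:
        # sample one round of episodes on all remote eval workers in parallel
        ray.get([w.sample.remote() for w in workers])
    else:
        # evaluation_num_workers=0 -> sample on the local eval worker instead
        eval_workers.local_worker().sample()
    # gather the completed episodes and build the standard metrics dict
    episodes, _ = collect_episodes(
        local_worker=eval_workers.local_worker(), remote_workers=workers)
    metrics = summarize_episodes(episodes)
    # add anything custom on top; the function must return a dict
    metrics['my_metric'] = 1.0
    return metrics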

Hi @sven1977,

I’m trying to do a similar thing, where I just want to run a bunch of experiments on a pretrained model using the rllib worker system. Admittedly I don’t entirely understand it but have managed to hack my way around to getting it to more or less work.

The only trouble now is that I want to use the trainer purely for evaluation, and setting config['num_workers'] = 0 doesn’t work; it raises a ValueError:

File ".../ray/rllib/agents/a3c/a3c.py", line 58, in validate_config
    raise ValueError("`num_workers` for A3C must be >= 1!")

Is there any way around this? I’d really like to use the extra worker in my evaluations.

Cheers,

Rory 🙂

Actually, why don’t you use the evaluation workers (a different worker set, defined by the config keys evaluation_num_workers, evaluation_config, evaluation_interval, and evaluation_num_episodes) and then call Trainer._evaluate()? That call should only use these evaluation workers, not the “regular” rollout workers. Then you can still set num_workers=1 and e.g. evaluation_num_workers=[any number should work here]; 0 = local evaluation, >0 = parallel evaluation via Ray remote workers.
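
For the A3C case above, something along these lines should work (a sketch only; the checkpoint path and worker counts are placeholders, and the params.pkl loading mirrors the earlier snippets):

import os
import pickle

import ray
from ray.rllib.agents import a3c


def my_custom_eval(trainer, eval_workers):
    # collect whatever statistics you need here; must return a dict
    return {'a': 1.0}

# hypothetical A3C checkpoint path; substitute your own
checkpoint_path = os.path.expanduser(
    '~/ray_results/default/A3C_CartPole-v0_00000/checkpoint_100/checkpoint-100')

ray.init()

run_base_dir = os.path.dirname(os.path.dirname(checkpoint_path))
with open(os.path.join(run_base_dir, 'params.pkl'), 'rb') as f:
    config = pickle.load(f)

config['num_workers'] = 1             # keeps A3C's validate_config() happy
config['evaluation_num_workers'] = 4  # the actual parallelism lives here
config['evaluation_interval'] = 1     # must be > 0
config['custom_eval_function'] = my_custom_eval

trainer = a3c.A3CTrainer(config=config)
trainer.restore(checkpoint_path)
trainer._evaluate()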


Hello, sorry to bother you, but could you maybe paste your hacky solution for simply evaluating a trained model? I am finding it near impossible to do, and it is quite frustrating. Thank you.

Hello, did you ever get a solution to this working on your end? I’m also facing a similar challenge and would like to perform parallel evaluations on a trained model to collect some statistics. Thank you.