Custom evaluation while avoiding unnecessary env creation

loicsacre · November 28, 2022, 12:45pm

I would like to perform a custom evaluation (without any training) following the given script:

ray-project/ray/blob/master/rllib/examples/custom_eval.py

"""Example of customizing evaluation with RLlib.

Pass --custom-eval to run with a custom evaluation function too.

Here we define a custom evaluation method that runs a specific sweep of env
parameters (SimpleCorridor corridor lengths).

------------------------------------------------------------------------
Sample output for `python custom_eval.py`
------------------------------------------------------------------------

INFO algorithm.py:623 -- Evaluating current policy for 10 episodes.
INFO algorithm.py:650 -- Running round 0 of parallel evaluation (2/10 episodes)
INFO algorithm.py:650 -- Running round 1 of parallel evaluation (4/10 episodes)
INFO algorithm.py:650 -- Running round 2 of parallel evaluation (6/10 episodes)
INFO algorithm.py:650 -- Running round 3 of parallel evaluation (8/10 episodes)
INFO algorithm.py:650 -- Running round 4 of parallel evaluation (10/10 episodes)

Result for PG_SimpleCorridor_2c6b27dc:
  ...


For instance, this could be the case when re-evaluating a policy with other parameters.

Unfortunately, an environment responsible for sample collection is always created, either on the local worker or on a remote worker, and it is never used. The problem is that my env creation is costly. I just need an env for the evaluation part.

Would it be possible to disable this unnecessary env creation when performing an evaluation? I have dug into ray/rllib/evaluation/rollout_worker.py, but I could not find a way to avoid it.

mannyv · November 28, 2022, 1:40pm

@loicsacre,

I think I am missing a key part of what you want to do.

How do you intend to evaluate the policy without any environments?

loicsacre · November 28, 2022, 3:39pm

Hi, I will try to give more context.

During training, I have one env for sampling data and one for evaluating the policy. Once training is done, I would like to load a checkpoint and run an evaluation with it. For that I only need one env, but I have not found a way to do it: an env for sampling is always created, even though it is never used. I am just trying to do something like this:

agent = eval_config["trainer"](config=agent_config)
for it, checkpoint_path in enumerate(eval_config["checkpoint_path"]):
    if checkpoint_path is not None:
        agent.restore(checkpoint_path)

    results = agent.evaluate()

mannyv · November 28, 2022, 3:58pm

@loicsacre,

The example you pointed to creates 2 evaluation workers. Each worker will create an env.

Can you provide your full config?

loicsacre · November 29, 2022, 12:37pm

The config is defined like this:

config = {
    ...
    "num_workers": 0,
    "num_envs_per_worker": 1,
    "evaluation_num_workers": 1,
    "custom_eval_function": some_eval_fn,
    ...
}

Whether num_workers is 0 or 1, I notice that the env is initialised twice: once for the remote rollout worker that performs the evaluation, which is fine, and once for an env that stays idle, as I explained before.
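
A possible workaround for the idle env, offered as a sketch rather than a confirmed RLlib feature: wrap the costly env so that its construction is deferred until the first reset() or step() call. LazyEnv, CostlyEnv and make_env below are hypothetical names, not RLlib APIs. The sketch assumes the observation and action spaces can be supplied without building the real env, and whether RLlib accepts such a wrapper (for example its env checks, or a required gym.Env base class) has not been verified here.

```python
class LazyEnv:
    """Hypothetical wrapper: builds the real env only when it is first used."""

    def __init__(self, make_env, observation_space=None, action_space=None):
        # make_env is a zero-argument factory for the costly env (assumption).
        self._make_env = make_env
        self._env = None
        # Assumption: the spaces are known up front, without building the env.
        self.observation_space = observation_space
        self.action_space = action_space

    def _get(self):
        # Construct the costly env on first access, then reuse it.
        if self._env is None:
            self._env = self._make_env()
        return self._env

    def reset(self, *args, **kwargs):
        return self._get().reset(*args, **kwargs)

    def step(self, action):
        return self._get().step(action)


class CostlyEnv:
    """Stand-in for an env whose constructor is expensive (illustration only)."""

    created = 0  # counts how many times the constructor has run

    def __init__(self):
        CostlyEnv.created += 1

    def reset(self):
        return 0

    def step(self, action):
        return 0, 0.0, True, {}


# Two wrappers are created, mimicking the idle env and the evaluation env.
idle_env = LazyEnv(CostlyEnv)
eval_env = LazyEnv(CostlyEnv)

# Only the evaluation env is ever used, so only one costly env is built.
obs = eval_env.reset()
obs, reward, done, info = eval_env.step(0)
print(CostlyEnv.created)
```

With this pattern the idle wrapper still exists, but it never pays the construction cost as long as nothing calls reset() or step() on it.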
