Custom evaluation while avoiding unnecessary env creation

I would like to perform a custom evaluation (without any training) following the given script:

For instance, this could be the case when re-evaluating a policy with other parameters.

Unfortunately, an environment responsible for sample collection is always created, either on the local worker or on a remote worker, and it is never used. The problem is that my env creation is costly. I just need an env for the evaluation part.

Would it be possible to disable this unnecessary env creation when performing an evaluation? I have dug into ray/rllib/evaluation/rollout_worker.py, but I see no way it could be avoided.
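One possible workaround, sketched here under assumptions that are not confirmed anywhere in this thread: if the env object has to exist but is never stepped, the costly construction can be deferred until the first reset() or step(). The names ExpensiveEnv and LazyEnv are hypothetical. The sketch uses plain Python classes; for RLlib the wrapper would presumably need to subclass gym.Env and expose observation_space and action_space without building the inner env.

```python
class ExpensiveEnv:
    """Hypothetical stand-in for an env whose constructor is costly."""

    instances = 0  # counts how many times the costly constructor ran

    def __init__(self):
        ExpensiveEnv.instances += 1
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action
        done = self.state >= 3
        return self.state, 1.0, done, {}


class LazyEnv:
    """Builds the wrapped env only when reset() or step() is first called."""

    def __init__(self, make_env, observation_space=None, action_space=None):
        self._make_env = make_env
        self._env = None
        # Assumption: the spaces are known up front, without building the env.
        self.observation_space = observation_space
        self.action_space = action_space

    def _ensure_env(self):
        if self._env is None:
            self._env = self._make_env()
        return self._env

    def reset(self):
        return self._ensure_env().reset()

    def step(self, action):
        return self._ensure_env().step(action)


idle_env = LazyEnv(ExpensiveEnv)  # like the unused sampling env: never stepped
eval_env = LazyEnv(ExpensiveEnv)  # like the env actually used for evaluation
eval_env.reset()
print(ExpensiveEnv.instances)  # only the env that was used got built
```

With this pattern the idle env still exists as an object, but the expensive constructor runs only for the env that is actually used.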

@loicsacre,

I think I am missing a key part of what you want to do.

How do you intend to evaluate the policy without any environments?

Hi, I will try to give more context.

During training, I have one env for sampling data and one for evaluating the policy. Once the training is done, I would like to load a checkpoint and perform an evaluation with it. I only need one env for that. Nevertheless, I am not able to do so: an env for sampling is always created (even if it is not used). I am just trying to do something like this:

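# Build the trainer once; this is where the envs get created.
agent = eval_config["trainer"](config=agent_config)
for it, checkpoint_path in enumerate(eval_config["checkpoint_path"]):
    # Restore the weights to evaluate, if a checkpoint is given.
    if checkpoint_path is not None:
        agent.restore(checkpoint_path)

    # Run evaluation only, no training step.
    results = agent.evaluate()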

@loicsacre,

The example you pointed to creates 2 evaluation workers. Each worker will create an env.

Can you provide your full config?

The config is defined like this:

config = {
    ...
    "num_workers": 0,
    "num_envs_per_worker": 1,
    "evaluation_num_workers": 1,
    "custom_eval_function": some_eval_fn,
    ...
}

Whatever the value of num_workers, 0 or 1, I notice that the env is initialised twice. There is one env for the remote rollout worker that performs the evaluation, which is fine, and one that sits idle, as I explained before.
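To check which worker triggers each env creation, a small logging helper called from the env creator may help. This is only a sketch: it assumes the env_config passed to the creator is RLlib's EnvContext and exposes worker_index and vector_index attributes. describe_env_creation and FakeContext are hypothetical names used for illustration.

```python
import os

# Per-process list: each remote worker would hold its own copy.
creation_log = []


def describe_env_creation(env_config):
    """Record which process and worker asked for a new env."""
    record = {
        "pid": os.getpid(),
        # Assumed EnvContext attributes; fall back to None if absent.
        "worker_index": getattr(env_config, "worker_index", None),
        "vector_index": getattr(env_config, "vector_index", None),
    }
    creation_log.append(record)
    print("env created:", record)
    return record


class FakeContext(dict):
    """Hypothetical stand-in for the env config, used only for this demo."""

    def __init__(self, worker_index, vector_index):
        super().__init__()
        self.worker_index = worker_index
        self.vector_index = vector_index


# Simulate a local-worker creation and a remote evaluation-worker creation.
local_record = describe_env_creation(FakeContext(worker_index=0, vector_index=0))
remote_record = describe_env_creation(FakeContext(worker_index=1, vector_index=0))
```

Calling describe_env_creation at the top of the real env creator should show, in each worker's log, whether the idle env comes from the local worker or from a remote one.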