[RLlib] Questions about loading checkpoints and asynchronous evaluation workers

Hello @sven1977,

I am using evaluation workers as described in this post in order to track some metrics after training. I want to evaluate my agent using different workers in parallel, with asynchronous execution (to save some time). However, I have two concerns with this:

  1. When using multiple evaluation workers, the local worker creates an environment, even with this config:
config["num_workers"] = 0
config["num_envs_per_worker"] = 1
config["evaluation_interval"] = 1
config["create_env_on_driver"] = False

This is a problem for me because the local worker is loaded first, and with asynchronous evaluation the agent takes steps in the local worker while the other workers are still loading, which I do not want. How can I force-disable this local worker in the config? I tried to kill it manually, but then I cannot load a new checkpoint anymore.

  2. I use a custom evaluation function whose main loop is similar to what the trainer does by default, i.e.:
num_rounds = int(math.ceil(self.config["evaluation_num_episodes"] /
                           self.config["evaluation_num_workers"]))
num_workers = len(self.evaluation_workers.remote_workers())
num_episodes = num_rounds * num_workers
# Each round triggers one sample() call per remote evaluation worker.
for i in range(num_rounds):
    ray.get([w.sample.remote()
             for w in self.evaluation_workers.remote_workers()])

With asynchronous execution, of course, the number of episodes per worker differs, so num_episodes can end up a bit higher than the value set in the config. Is there a way to pause an evaluation worker once it has completed num_rounds episodes, or to stop the evaluation once num_episodes is reached? The idea would be to complete exactly num_episodes episodes in total, regardless of the number of workers used.
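For concreteness, here is a rough sketch of the kind of early stopping I am looking for. It assumes batch_mode="complete_episodes" (so each sample() call returns only whole episodes) and single-agent SampleBatches with an "eps_id" column; the episode-counting scheme is only illustrative:

import ray
from ray.rllib.evaluation.metrics import collect_metrics

def custom_eval_function(trainer, eval_workers):
    # Sketch: keep one sample() call in flight per worker and stop
    # issuing new calls once the target episode count is reached.
    target = trainer.config["evaluation_num_episodes"]
    workers = eval_workers.remote_workers()
    episodes_done = 0
    # One in-flight sample() request per evaluation worker to start.
    pending = {w.sample.remote(): w for w in workers}
    while pending:
        # Wait for whichever worker finishes first (asynchronous).
        [ready], _ = ray.wait(list(pending), num_returns=1)
        worker = pending.pop(ready)
        batch = ray.get(ready)
        # Count the distinct episode ids in the returned batch
        # (assumes a single-agent SampleBatch with an "eps_id" column).
        episodes_done += len(set(batch["eps_id"]))
        if episodes_done < target:
            # Not done yet: give this worker another round.
            pending[worker.sample.remote()] = worker
    return collect_metrics(remote_workers=workers)

This can still overshoot by the episodes contained in batches already in flight, but at least it avoids launching whole extra rounds.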

Thank you in advance for your reply!

Hey @Fabien-Couthouis , great question! :slight_smile:
Could you try it with this config?

evaluation_num_workers: 1  # Will create a separate evaluation worker set w/ 1 remote worker (w/ env).
evaluation_interval: 1
num_workers: 0
create_env_on_driver: False  # This will not work as num_workers=0

But I get your point (you are probably using an external env :wink:): could you try setting your env to ray.rllib.examples.env.random_env.RandomEnv(observation_space=..., action_space=...), using the spaces of your actual external env?
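Something along these lines should do it (rough sketch; RandomEnv reads its spaces from the env_config dict, and the spaces shown here are placeholders):

import gym
from ray.rllib.examples.env.random_env import RandomEnv

config["env"] = RandomEnv
config["env_config"] = {
    # Placeholder spaces -- replace with those of your external env.
    "observation_space": gym.spaces.Box(-1.0, 1.0, shape=(4,)),
    "action_space": gym.spaces.Discrete(2),
}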

Note that create_env_on_driver=False is only respected if you have more than just the local worker (i.e. your num_workers must be > 0 for this setting to matter). Otherwise, RLlib would have no env at all to work with. I’ll add a warning for the create_env_on_driver=False and num_workers=0 case.

Great catch, though: for external-env purposes, we should not force the user to have any env on the local worker (ideally, the external env would provide the space information, so even the RandomEnv hack would no longer be necessary). I’ll fix this restriction and add some more useful warnings.


Thanks for the answer @sven1977!

In fact, I am not using an external env for this use case, but I agree that the external env should provide the space information. I used to create a hacky “FakeEnv” exposing only the action and observation spaces to get around this, roughly like the sketch below.
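For reference, a simplified sketch of that workaround (the spaces are placeholders):

import gym

class FakeEnv(gym.Env):
    # Dummy env exposing only the spaces; never meant to be stepped.
    def __init__(self, config=None):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,))
        self.action_space = gym.spaces.Discrete(2)

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        # Terminate immediately; this env only exists for its spaces.
        return self.observation_space.sample(), 0.0, True, {}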

Concerning the config, I had already set the evaluation_num_workers key in my config file; I have something like this:

config["num_workers"] = 0
config["evaluation_num_workers"] = 10
config["evaluation_num_episodes"] = 400
config["num_envs_per_worker"] = 1
config["evaluation_interval"] = 1
config["create_env_on_driver"] = False
config["sample_async"] = True

I understand that the create_env_on_driver key cannot be set to False with num_workers=0, but what about evaluation_num_workers? If we have an evaluation worker set (evaluation_num_workers > 0), I do not see why we could not set create_env_on_driver to False (i.e., the evaluation environments would be created on the eval workers, with no environment on the driver).