[RLlib] Questions about loading checkpoints and asynchronous evaluation workers

Hello @sven1977,

I am using evaluation workers as described in this post in order to track some metrics after training. I want to evaluate my agent using different workers in parallel, with asynchronous execution (to save some time). However, I have two concerns with this:

  1. When using multiple evaluation workers, the local worker creates an environment, even with this config:
config["num_workers"] = 0
config["num_envs_per_worker"] = 1
config["evaluation_interval"] = 1
config["create_env_on_driver"] = False

This is a problem for me because the local worker is loaded first, and with asynchronous evaluation the agent takes steps in the local worker while the other workers are still loading, which I do not want. How can I force-disable this local worker in the config? I tried to kill it manually, but then I cannot load a new checkpoint anymore.

  2. I use a custom evaluation function whose main loop is similar to what the trainer does by default, i.e.:
num_rounds = int(math.ceil(self.config["evaluation_num_episodes"] /
                           self.config["evaluation_num_workers"]))
num_workers = len(self.evaluation_workers.remote_workers())
num_episodes = num_rounds * num_workers
# Each round triggers one sample() call per remote evaluation worker.
for i in range(num_rounds):
    ray.get([w.sample.remote()
             for w in self.evaluation_workers.remote_workers()])

With asynchronous execution, of course, the number of episodes per worker differs, so num_episodes can end up a bit higher than the value set in the config. Is there a way to pause an evaluation worker once it has completed num_rounds episodes, or to stop the evaluation once num_episodes is reached? The idea would be to complete exactly num_episodes episodes in total, regardless of the number of workers used.
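For concreteness, here is a rough sketch of the kind of early stopping I am looking for. It assumes batch_mode="complete_episodes" (so each sample() call returns only whole episodes) and single-agent SampleBatches with an "eps_id" column; the episode-counting scheme is only illustrative:

import ray
from ray.rllib.evaluation.metrics import collect_metrics

def custom_eval_function(trainer, eval_workers):
    # Sketch: keep one sample() call in flight per worker and stop
    # issuing new calls once the target episode count is reached.
    target = trainer.config["evaluation_num_episodes"]
    workers = eval_workers.remote_workers()
    episodes_done = 0
    # One in-flight sample() request per evaluation worker to start.
    pending = {w.sample.remote(): w for w in workers}
    while pending:
        # Wait for whichever worker finishes first (asynchronous).
        [ready], _ = ray.wait(list(pending), num_returns=1)
        worker = pending.pop(ready)
        batch = ray.get(ready)
        # Count the distinct episode ids in the returned batch
        # (assumes a single-agent SampleBatch with an "eps_id" column).
        episodes_done += len(set(batch["eps_id"]))
        if episodes_done < target:
            # Not done yet: give this worker another round.
            pending[worker.sample.remote()] = worker
    return collect_metrics(remote_workers=workers)

This can still overshoot by the episodes contained in batches already in flight, but at least it avoids launching whole extra rounds.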

Thank you in advance for your reply!

Hey @Fabien-Couthouis , great question! :slight_smile:
Could you try it with this config?

evaluation_num_workers: 1  # Will create a separate evaluation worker set w/ 1 remote worker (w/ env).
evaluation_interval: 1
num_workers: 0
create_env_on_driver: False  # This will not work as num_workers=0

But I get your point (you are probably using an external env :wink:): could you try setting your env to ray.rllib.examples.env.random_env.RandomEnv(observation_space=..., action_space=...), using the spaces of your actual external env?
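Something along these lines should do it (rough sketch; RandomEnv reads its spaces from the env_config dict, and the spaces shown here are placeholders):

import gym
from ray.rllib.examples.env.random_env import RandomEnv

config["env"] = RandomEnv
config["env_config"] = {
    # Placeholder spaces -- replace with those of your external env.
    "observation_space": gym.spaces.Box(-1.0, 1.0, shape=(4,)),
    "action_space": gym.spaces.Discrete(2),
}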

Note that create_env_on_driver=False is only respected if you have more than just the local worker (i.e. your num_workers must be > 0 for this setting to matter). Otherwise, RLlib would have no env at all to work with. I’ll add a warning for the create_env_on_driver=False and num_workers=0 case.

Great catch, though: for external-env purposes, we should not force the user to have any env on the local worker (ideally, the external env would provide the space information, so even the RandomEnv hack would no longer be necessary). I’ll fix this restriction and add some more useful warnings.


Thanks for the answer @sven1977!

In fact, I am not using an external env for this use case, but I agree that the external env should provide the space information. I used to create a hacky “FakeEnv” exposing only the action and observation spaces to get around this, roughly like the sketch below.
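For reference, a simplified sketch of that workaround (the spaces are placeholders):

import gym

class FakeEnv(gym.Env):
    # Dummy env exposing only the spaces; never meant to be stepped.
    def __init__(self, config=None):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,))
        self.action_space = gym.spaces.Discrete(2)

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        # Terminate immediately; this env only exists for its spaces.
        return self.observation_space.sample(), 0.0, True, {}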

Concerning the config, I had already set the evaluation_num_workers key in my config file; I have something like this:

config["num_workers"] = 0
config["evaluation_num_workers"] = 10
config["evaluation_num_episodes"] = 400
config["num_envs_per_worker"] = 1
config["evaluation_interval"] = 1
config["create_env_on_driver"] = False
config["sample_async"] = True

I understand that the create_env_on_driver key cannot be set to False with num_workers=0, but what about evaluation_num_workers? If we have an evaluation worker set (evaluation_num_workers > 0), I do not see why we could not set create_env_on_driver to False (i.e., the evaluation environments would be created on the eval workers, with no environment on the driver).