Hello! I’m running 36 episodes per evaluation with 9 eval workers. I assume this means each worker gets 36/9 = 4 episodes. Is there a way to access the episode id (1-4) for each worker in a callback? I’d like to set certain parameters for each episode, depending on its id, before the episode starts.
A workaround would be to use one worker per episode and access the worker id via config.worker_index.
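For reference, a rough sketch of that workaround (MyEvalEnv and my_param are just placeholder names; the env reads the worker index off the EnvContext it is constructed with):

import gym
from gym.spaces import Discrete


class MyEvalEnv(gym.Env):
    """Dummy env whose per-episode parameters depend on the worker index."""

    def __init__(self, config):
        # config is RLlib's EnvContext; worker_index is 1-based on remote workers.
        self.my_param = config.worker_index
        self.observation_space = Discrete(2)
        self.action_space = Discrete(2)

    def reset(self):
        return 0

    def step(self, action):
        # One-step episodes just to keep the sketch minimal.
        return 0, 0.0, True, {}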
Hi there,
Each worker runs ~~36~~ 36/9 times the number of envs per worker. (check @sven1977’s answer)
I checked DefaultCallbacks and the episode it receives only carries a randomly generated episode id, not a sequential index. You could still accomplish what you want with something like the following.
from ray.rllib.agents.callbacks import DefaultCallbacks


class MyCallback(DefaultCallbacks):
    """Counts evaluation episodes so per-episode parameters can be set."""

    epoch = 0  # running episode counter

    def on_episode_step(self, *, worker, base_env, episode, env_index, **kwargs):
        # set/adjust parameters here based on self.epoch
        ...

    def on_episode_end(self, *, worker, base_env, policies, episode, env_index, **kwargs):
        self.epoch = (self.epoch + 1) % 36
It’s a bit hacky because I currently do not know when/if the other callback functions are called in an evaluation step.
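For completeness, registering the callback would look roughly like this (PPO and CartPole are just placeholders for whatever trainer/env you actually use):

import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()
trainer = PPOTrainer(config={
    "env": "CartPole-v0",
    "callbacks": MyCallback,  # the class itself, not an instance
    "evaluation_interval": 1,
    "evaluation_num_workers": 9,
    "evaluation_num_episodes": 36,
})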
Edit: By episode id I assumed you meant the episode number within the evaluation. Otherwise, you could access it via episode.episode_id.
Edit2: Correcting number of rounds. Thanks @sven1977 for the correction. I’m sorry for the confusion this might have caused.
Hey @vineet54 and @Sertingolix , thanks for the question and help with this!
One correction: If you have > 1 eval workers (e.g. 9), the episodes (e.g. 36) are indeed split between the different workers, so that each worker would only run 36/9 = 4 episodes. The code that does this is in trainer.py:
num_rounds = int(
    math.ceil(self.config["evaluation_num_episodes"] /
              self.config["evaluation_num_workers"]))
# num_rounds = 4
num_workers = len(self.evaluation_workers.remote_workers())
# num_workers = 9
num_episodes = num_rounds * num_workers
# num_episodes = 36
# Run 4 episodes (num_rounds) per eval worker:
for i in range(num_rounds):
    logger.info("Running round {} of parallel evaluation "
                "({}/{} episodes)".format(
                    i, (i + 1) * num_workers, num_episodes))
    ray.get([
        w.sample.remote()
        for w in self.evaluation_workers.remote_workers()
    ])
Thanks. I should mention that each worker gets a copy of the callback. So it should be self.epoch = (self.epoch + 1) % 4
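A minimal sketch of that per-worker version (assuming, as above, 4 = 36 episodes / 9 eval workers; stashing the index in episode.user_data is just one option):

from ray.rllib.agents.callbacks import DefaultCallbacks


class MyEvalCallback(DefaultCallbacks):
    """Each eval worker keeps its own local episode counter (0..3)."""

    local_episode = 0

    def on_episode_start(self, *, worker, base_env, policies, episode, env_index, **kwargs):
        # Set per-episode parameters here based on self.local_episode
        # (combine with worker.worker_index if a globally unique id is needed).
        episode.user_data["eval_episode_idx"] = self.local_episode

    def on_episode_end(self, *, worker, base_env, policies, episode, env_index, **kwargs):
        # 4 = evaluation_num_episodes / evaluation_num_workers
        self.local_episode = (self.local_episode + 1) % 4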