Get episode id during evaluation

Hello! I’m running 36 episodes per evaluation with 9 eval workers. I assume this means each worker gets 36/9 = 4 episodes. Is there a way to access the episode id (1-4) for each worker in a callback? I’d like to set certain parameters for every episode, depending on the episode id, before the episode starts.
A workaround is to use 1 worker per episode and access the worker id with config.worker_index.

Hi there,
Each worker runs 36/9 = 4 episodes, times the number of envs per worker. (See @sven1977’s answer.)

I checked DefaultCallbacks, and the episode object passed to it has a random episode id. You could still accomplish what you want with something like the following.

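# Import path for older Ray releases; it differs in newer versions.
from ray.rllib.agents.callbacks import DefaultCallbacks

class MyCallback(DefaultCallbacks):
    """Keeps a running episode counter across evaluation episodes."""

    epoch = 0  # index of the current episode within one evaluation

    def on_episode_step(self, *, worker, base_env, episode, env_index, **kwargs):
        ...

    def on_episode_end(self, *, worker, base_env, policies, episode, env_index, **kwargs):
        # Advance the counter and wrap around after 36 episodes.
        self.epoch = (self.epoch + 1) % 36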

It’s a bit hacky because I currently do not know when/if the other callback functions are called in an evaluation step.

Edit: By episode id I assumed you meant episode number in the evaluation. Otherwise you could access it using episode.episode_id

Edit2: Correcting number of rounds. Thanks @sven1977 for the correction. I’m sorry for the confusion this might have caused.


Hey @vineet54 and @Sertingolix, thanks for the question and help with this!

One correction: If you have > 1 eval workers (e.g. 4), the episodes (e.g. 36) are indeed split between the different workers, so that each worker would only run 36/4 = 9 episodes. The code that does this is in trainer.py:

                num_rounds = int(
                    math.ceil(self.config["evaluation_num_episodes"] /
                              self.config["evaluation_num_workers"]))
                # num_rounds = 9
                num_workers = len(self.evaluation_workers.remote_workers())
                # num_workers = 4
                num_episodes = num_rounds * num_workers
                # num_episodes = 36

                # run 9 episodes (num_rounds) per eval worker:
                for i in range(num_rounds):
                    logger.info("Running round {} of parallel evaluation "
                                "({}/{} episodes)".format(
                                    i, (i + 1) * num_workers, num_episodes))
                    ray.get([
                        w.sample.remote()
                        for w in self.evaluation_workers.remote_workers()
                    ])
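
The rounding in the snippet above can be checked in isolation. This is only a sketch of the arithmetic: `plan_rounds` is a hypothetical helper, not part of RLlib, and it assumes the number of remote eval workers equals `evaluation_num_workers`.

```python
import math


def plan_rounds(num_episodes, num_workers):
    """Return (rounds per worker, total episodes actually run).

    Hypothetical helper mirroring the ceil-based split quoted above.
    """
    num_rounds = int(math.ceil(num_episodes / num_workers))
    return num_rounds, num_rounds * num_workers


# The setup from the question: 36 episodes on 9 eval workers.
print(plan_rounds(36, 9))  # (4, 36)

# An uneven split rounds up, so more episodes run than were requested.
print(plan_rounds(10, 3))  # (4, 12)
```

Because of the `ceil`, the per-worker count only matches `episodes / workers` exactly when the division is even, which matters when choosing the modulus for a per-worker counter.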

Thanks. I should mention that each worker gets its own copy of the callback, so with 9 workers (36/9 = 4 episodes each) it should be self.epoch = (self.epoch + 1) % 4
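
To see why the modulus is the per-worker episode count, here is a small simulation in plain Python. It does not use RLlib: `EpisodeCounter` is a hypothetical stand-in for the callback, and the loop assumes every worker finishes exactly one episode per round.

```python
class EpisodeCounter:
    """Stand-in for a per-worker callback object; not an RLlib class."""

    def __init__(self, episodes_per_worker):
        self.episodes_per_worker = episodes_per_worker
        self.epoch = 0  # index of the episode this worker runs next

    def on_episode_end(self):
        # Same update as in the thread, using the per-worker modulus.
        self.epoch = (self.epoch + 1) % self.episodes_per_worker


NUM_EPISODES = 36
NUM_WORKERS = 9
EPISODES_PER_WORKER = NUM_EPISODES // NUM_WORKERS  # 4

# One independent counter per worker, mirroring "each worker gets a copy".
counters = [EpisodeCounter(EPISODES_PER_WORKER) for _ in range(NUM_WORKERS)]

# Record the episode index worker 0 sees across two evaluations.
seen_by_worker_0 = []
for _evaluation in range(2):
    for _round in range(EPISODES_PER_WORKER):
        seen_by_worker_0.append(counters[0].epoch)
        for counter in counters:
            counter.on_episode_end()

print(seen_by_worker_0)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

The counter runs 0-3 rather than 1-4, so add 1 if the parameters are keyed by a 1-based episode number. With `% 36` instead, the counter would never wrap within an evaluation and would drift from one evaluation to the next.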
