Is env_id (of an episode) just equal to the index of the sub-env?

Hi folks,

RLlib always converts each supported env to a vectorized BaseEnv, regardless of "num_envs_per_worker". Let's assume that "num_envs_per_worker" = 4, i.e. there are 4 copies of the env (sub-envs).
Is my assumption correct that, in general, the env_id (of an episode) is just the index of the corresponding sub-env?
If that's right, then it should always be possible to access the env's full state and data in on_episode_step() via env_running_this_episode = base_env.envs[env_id]. I'm asking because I'm looking for a way to log arbitrary env/episode data.

Hey @klausk55, great question! Episode IDs are different from (sub-)env IDs.
You are right that the sub-env IDs inside the vectorized BaseEnv are (normally) just ints ranging from 0 to num_envs_per_worker - 1.
But episodes have independent and rather random IDs, generated when an episode is created:

From the Episode c'tor:

        ...
        self.episode_id: int = random.randrange(2e9)
        self.env_id = env_id  # <- given by caller of the Episode's c'tor
        ....
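To make the distinction concrete, here is a minimal, Ray-free sketch. `FakeEpisode` is a hypothetical stand-in that mirrors only the two attributes quoted above; it is not RLlib's Episode class, and the "two rounds" of episodes are just an assumption about how episodes follow one another on the same sub-env.

```python
import random


class FakeEpisode:
    """Hypothetical stand-in with only the two attributes quoted above."""

    def __init__(self, env_id: int):
        # int(...) because random.randrange() expects integer arguments.
        self.episode_id: int = random.randrange(int(2e9))
        # Passed in by the caller: the index of the sub-env running this episode.
        self.env_id = env_id


num_envs_per_worker = 4

# One episode per sub-env slot.
first_round = [FakeEpisode(env_id=i) for i in range(num_envs_per_worker)]
# Once those episodes end, new episodes start on the same slots.
second_round = [FakeEpisode(env_id=i) for i in range(num_envs_per_worker)]

env_ids_first = [e.env_id for e in first_round]
env_ids_second = [e.env_id for e in second_round]

for old, new in zip(first_round, second_round):
    # Same env_id (slot index), but each episode drew its own random episode_id.
    print(old.env_id, old.episode_id, new.episode_id)
```

So env_id is what you would use to index into the sub-envs, while episode_id only identifies the episode itself.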

Yeah, you could do this in your implementation of on_episode_step:

def on_episode_step(self,
                    *,
                    worker: "RolloutWorker",
                    base_env: BaseEnv,
                    policies: Optional[Dict[PolicyID, Policy]] = None,
                    episode: Episode,
                    **kwargs) -> None:
    ....
    # `episode.env_id` is the index of the sub-env that runs this episode.
    sub_env_running_this_episode = base_env.get_sub_environments()[episode.env_id]
    # `sub_env_running_this_episode` is (most likely) a gym.Env instance now.
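For the logging use case, the lookup above could be wrapped roughly like this. This is only a sketch that runs without Ray: `StubSubEnv`, `StubBaseEnv`, `StubEpisode`, the `fuel` attribute, and the helper `log_sub_env_state` are all hypothetical names made up for illustration. The only things taken from RLlib are `get_sub_environments()`, `episode.env_id`, and (as far as I know) the `custom_metrics` dict on the episode.

```python
from typing import Any, Dict, List


class StubSubEnv:
    """Hypothetical sub-env with some internal state we want to log."""

    def __init__(self, fuel: float):
        self.fuel = fuel


class StubBaseEnv:
    """Hypothetical stand-in exposing only get_sub_environments()."""

    def __init__(self, sub_envs: List[StubSubEnv]):
        self._sub_envs = sub_envs

    def get_sub_environments(self) -> List[StubSubEnv]:
        return self._sub_envs


class StubEpisode:
    """Hypothetical stand-in with only env_id and custom_metrics."""

    def __init__(self, env_id: int):
        self.env_id = env_id
        self.custom_metrics: Dict[str, Any] = {}


def log_sub_env_state(base_env: StubBaseEnv, episode: StubEpisode) -> None:
    """Copy a value from the sub-env running this episode into its metrics."""
    sub_env = base_env.get_sub_environments()[episode.env_id]
    episode.custom_metrics["fuel"] = sub_env.fuel


# Four sub-envs, as with num_envs_per_worker = 4.
base_env = StubBaseEnv([StubSubEnv(fuel=10.0 * i) for i in range(1, 5)])
episode = StubEpisode(env_id=2)  # episode running on the third sub-env
log_sub_env_state(base_env, episode)
```

In a real callback you would call something like this from inside on_episode_step with the `base_env` and `episode` arguments you are handed there.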

Thanks @sven1977 for your confirmation!