Hi folks,
RLlib always converts each supported env to a vectorized BaseEnv, independent of "num_envs_per_worker". Let's assume that "num_envs_per_worker" = 4, i.e. there are 4 replicates of the env (sub-envs).
Is my assumption correct that, in general, the env_id (of an episode) is just the index of the corresponding sub-env?
If that's right, then it should always be possible to access all of the env's state and data in on_episode_step() via env_running_this_episode = base_env.envs[env_id]. I ask because I'm looking for a way to log any env/episode data I want.
Hey @klausk55, great question! Episode IDs are different from (sub) Env IDs.
You are right that the sub Env IDs inside the vectorized BaseEnv are (normally) just ints ranging from 0 to [num_envs_per_worker - 1].
But episodes have independent and rather random IDs, generated when an episode is created:
From the Episode c'tor:
    ...
    self.episode_id: int = random.randrange(2e9)
    self.env_id = env_id  # <- given by caller of the Episode's c'tor
    ...
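To make the difference concrete, here is a minimal sketch in plain Python. `FakeEpisode` and the string placeholders in `sub_envs` are hypothetical stand-ins, not RLlib classes; they only mirror the two ID fields from the c'tor above, assuming one episode per sub-env.

```python
import random


class FakeEpisode:
    """Hypothetical stand-in for RLlib's Episode; keeps only the two ID fields."""

    def __init__(self, env_id: int):
        # Random identifier, unrelated to the position of the sub-env.
        self.episode_id: int = random.randrange(int(2e9))
        # Index of the sub-env running this episode, supplied by the caller.
        self.env_id = env_id


# Four placeholder sub-envs, as with num_envs_per_worker = 4.
sub_envs = [f"sub_env_{i}" for i in range(4)]

# One episode per sub-env.
episodes = [FakeEpisode(env_id=i) for i in range(len(sub_envs))]

for ep in episodes:
    # env_id works as a list index; episode_id does not.
    print(ep.episode_id, ep.env_id, sub_envs[ep.env_id])
```

So the value to index the sub-env list with is `episode.env_id`, never `episode.episode_id`.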
Yeah, you could do this in your implementation of on_episode_step:
    # Method of your callbacks class.
    def on_episode_step(self,
                        *,
                        worker: "RolloutWorker",
                        base_env: BaseEnv,
                        policies: Optional[Dict[PolicyID, Policy]] = None,
                        episode: Episode,
                        **kwargs) -> None:
        ...
        # Look up the sub-env by the episode's env_id (not its episode_id).
        sub_env_running_this_episode = base_env.get_sub_environments()[episode.env_id]
        # `sub_env_running_this_episode` is (most likely) a gym.Env instance now.
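For the logging part of the question, a rough sketch of how the lookup above could feed per-episode metrics. Assumptions: `battery_level` is a made-up attribute of a custom env, the class would subclass RLlib's `DefaultCallbacks` in real use, and the `SimpleNamespace` objects are duck-typed stand-ins so the snippet runs without Ray installed. It relies on the episode exposing `user_data` and `custom_metrics` dicts.

```python
from types import SimpleNamespace


class EnvDataLoggingCallbacks:
    """Sketch only; in RLlib this would subclass DefaultCallbacks."""

    def on_episode_step(self, *, worker, base_env, policies=None, episode, **kwargs):
        # Look up the sub-env that is running this episode.
        sub_env = base_env.get_sub_environments()[episode.env_id]
        # `battery_level` is a hypothetical attribute of a custom env.
        episode.user_data.setdefault("battery_level", []).append(sub_env.battery_level)

    def on_episode_end(self, *, worker, base_env, policies=None, episode, **kwargs):
        # Reduce the per-step values to one number per episode.
        values = episode.user_data.get("battery_level", [])
        if values:
            episode.custom_metrics["mean_battery_level"] = sum(values) / len(values)


# Duck-typed stand-ins for the BaseEnv and Episode objects (not RLlib classes).
sub_envs = [SimpleNamespace(battery_level=float(i)) for i in range(4)]
fake_base_env = SimpleNamespace(get_sub_environments=lambda: sub_envs)
fake_episode = SimpleNamespace(env_id=2, user_data={}, custom_metrics={})

callbacks = EnvDataLoggingCallbacks()
for _ in range(3):
    callbacks.on_episode_step(worker=None, base_env=fake_base_env, episode=fake_episode)
callbacks.on_episode_end(worker=None, base_env=fake_base_env, episode=fake_episode)
print(fake_episode.custom_metrics)
```

Collecting raw values in `user_data` during the episode and reducing them once in `on_episode_end` keeps the per-step work small; whether a mean is the right reduction depends on what you want to log.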
Thanks @sven1977 for your confirmation!