This is a follow-up question to this post. In my application, my RL agent needs to pass some information to the environment in order for it to calculate rewards. @mannyv suggested a way that works for me: using an on_episode_step callback, I can record the variable inside the model and pass it to the environment.
However, I’m not sure how to scale this to multiple rollout workers. Since the environments share the same model but each has its own state, the model would need to keep separate internal state per env. But I don’t think there is a way to tell the model which environment it is currently interacting with. I might be able to pass this through the info output of the environment, but there doesn’t seem to be a way to assign an ID to the envs?
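For reference, here is roughly the single-env flow that works for me today, with stand-in classes instead of the real RLlib objects (the class and method names below are my own placeholders, not RLlib's API):

```python
# Stand-ins for the pieces involved in the single-env case: a model that
# accumulates some internal state during forward passes, and an env whose
# reward computation needs that state.

class Model:
    """Policy model that exposes an internal value the env needs."""
    def __init__(self):
        self.internal_state = 0.0

    def forward(self, obs):
        # Compute the action; updating internal_state is a side effect.
        self.internal_state = obs * 2.0
        return obs  # placeholder "action"

class Env:
    """Env whose reward depends on a value pushed in from the model."""
    def __init__(self):
        self.model_state = None

def on_episode_step(model, env):
    # The callback reads the variable off the model and hands it to the env.
    env.model_state = model.internal_state

model, env = Model(), Env()
model.forward(3.0)
on_episode_step(model, env)
# env.model_state is now 6.0
```

With multiple envs this breaks down, because `model.internal_state` is a single value and the callback has no obvious way to know which env's state it holds.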
Each rollout_worker has an independent copy of the model. Weights are synced between all of them after updates.
The main time you would have multiple envs using the same model is if you have num_envs_per_worker > 1. In that case the observations arrive as a batch along the first dimension (one row per sub-env), so you can keep the model's internal states batched the same way.
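A sketch of that batched bookkeeping, again with stand-in classes (the `env_index` the loop uses mirrors the per-sub-env index that RLlib's episode callbacks receive, but the classes here are placeholders, not the real API):

```python
# num_envs_per_worker > 1 case: observations come in batched along dim 0,
# one row per sub-env, so the model stores one internal state per row.

class Model:
    def __init__(self, num_envs):
        self.internal_state = [0.0] * num_envs

    def forward(self, obs_batch):
        # obs_batch has one entry per sub-env; row i belongs to sub-env i.
        self.internal_state = [o * 2.0 for o in obs_batch]
        return obs_batch  # placeholder "actions"

class Env:
    def __init__(self):
        self.model_state = None

num_envs = 3
model = Model(num_envs)
envs = [Env() for _ in range(num_envs)]

model.forward([1.0, 2.0, 3.0])

# In the callback, the sub-env index selects the matching row of state:
for env_index, env in enumerate(envs):
    env.model_state = model.internal_state[env_index]
# envs[2].model_state is now 6.0
```

The key point is that as long as the batch ordering matches the sub-env ordering, indexing into the batched state recovers the per-env value without needing any explicit env IDs.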
Thank you, this makes a lot of sense. If I understand you correctly, in cases where there is more than one worker but each worker has only one env, each worker has a copy of the policy that corresponds to a single env, and thus the numbering wouldn’t be a problem. If I do have num_envs_per_worker > 1, will the ordering of the batches be deterministic?
An additional question: I didn’t find any attribute like worker.model — is the model somewhere in worker.policies_to_train? Also, the type annotation for the worker argument of the on_episode_step callback is a single RolloutWorker, not List[RolloutWorker]; does that mean the BaseEnv passed in only contains the env(s) that correspond to that particular worker's policy?
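To make the question concrete, this is roughly the lookup I'm trying to do, with stand-in objects (the attribute names `get_policy` and `policy.model` are my assumptions about where the model lives, not something I've verified against the API):

```python
# Stand-in objects sketching the worker -> policy -> model lookup I'm
# after inside the callback. These are placeholders, not RLlib classes.

class Policy:
    def __init__(self, model):
        self.model = model  # assumption: the policy holds its model here

class Worker:
    def __init__(self, policy_map):
        self.policy_map = policy_map

    def get_policy(self, policy_id="default_policy"):
        # Assumption: a worker can look up its policy by ID.
        return self.policy_map[policy_id]

model = object()  # the model instance I want to reach from the callback
worker = Worker({"default_policy": Policy(model)})

retrieved = worker.get_policy().model
# retrieved is the same object as model
```

If the worker passed to the callback really is a single RolloutWorker, then a lookup like this would hand me exactly the model copy that drives that worker's env(s), which is what I need.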