I want to know how to correctly parallelize learning so that each env learns a fixed scenario. I currently use on_environment_created() in RLlibCallback to assign different scenarios to different env_runners (each runner runs 1 env). The assignment looks like the following: runner1 always runs scenario1, runner2 always runs scenario2, etc., and it assigns based on env_runner.worker_index. Is this the correct way of doing it, or is there a better way, e.g. using a connector?
Yes, assigning different scenarios to each env_runner based on env_runner.worker_index is a correct and recommended approach in RLlib. The config passed to each environment's constructor is an EnvContext, which exposes worker_index, so you can select the scenario for that environment directly inside it. The official RLlib environment guide documents this pattern, showing how to use config.worker_index to customize each environment instance so that each EnvRunner always runs its assigned scenario. A connector is not necessary for this use case; connectors operate on observations, actions, and batches flowing between env and module, not on env construction. The worker_index approach is standard and efficient for scenario assignment in parallelized training setups, including multi-agent PPO.
For example, you can implement this by customizing your environment’s constructor to select the scenario based on config.worker_index, as shown in the RLlib documentation:
import gymnasium as gym

class EnvDependingOnWorkerAndVectorIndex(gym.Env):
    def __init__(self, config):
        # `config` is RLlib's EnvContext, which exposes `worker_index`.
        scenario = choose_scenario_for(config.worker_index)
        # Initialize the env with the selected scenario ...
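The scenario-selection helper itself can be a plain deterministic mapping from worker index to scenario. Here is a minimal sketch (the names SCENARIOS and choose_scenario_for are hypothetical, not RLlib API); it assumes remote EnvRunners have worker_index >= 1, with index 0 reserved for the local worker:

```python
# Hypothetical scenario registry; replace with your own scenario objects.
SCENARIOS = ["scenario1", "scenario2", "scenario3"]

def choose_scenario_for(worker_index: int) -> str:
    """Map an EnvRunner's worker_index to a fixed scenario.

    Remote EnvRunners are numbered from 1; index 0 (the local worker)
    falls back to the first scenario here.
    """
    if worker_index <= 0:
        return SCENARIOS[0]
    # Wrap around so that having more runners than scenarios
    # still yields a valid (repeated) assignment.
    return SCENARIOS[(worker_index - 1) % len(SCENARIOS)]
```

Because the mapping is deterministic, runner1 always gets scenario1, runner2 always gets scenario2, and so on across restarts, which is exactly the fixed assignment you described.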