I am trying to learn RLlib for training a PPO agent in a vectorized custom gym environment (named MNISTExib-v0) where each environment is instantiated with a different configuration.
I am currently able to train PPO in a single or vectorized environment using the same environment configuration:
import yaml

from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig
from ray.tune.registry import get_trainable_cls, register_env

# MNISTExib is my custom gym.Env class (defined in my own module)
def env_creator(env_config):
    return MNISTExib(**env_config)

# Register the custom env under the name 'MNISTExib-v0'
register_env('MNISTExib-v0', env_creator)

# Load a list of environment configurations
with open('env_configs.yaml') as f:
    env_configs = yaml.safe_load(f)

# Configure the PPO algorithm (new API stack)
config = (
    get_trainable_cls('PPO')
    .get_default_config()
    .environment(
        'MNISTExib-v0',
        env_config=env_configs[0],
    )
    .env_runners(
        num_env_runners=0,
        num_envs_per_env_runner=1,
    )
    .rl_module(
        model_config=DefaultModelConfig(
            conv_activation="relu",
            head_fcnet_hiddens=[256],
            vf_share_layers=True,
            conv_filters=[(16, 4, 2), (32, 4, 2)],
        )
    )
)

# Build the PPO algorithm
agent = config.build_algo()

# Run one training iteration
train_res = agent.train()
Ideally, I would like to set num_envs_per_env_runner=8 and pass a list env_configs of size 8, so that training runs in parallel on 8 MNISTExib-v0 environments, each instantiated with a different configuration.
Is this possible somehow? Or is there a workaround that does not require changes to the MNISTExib class?
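For reference, the kind of per-sub-environment selection I have in mind is sketched below, reusing env_configs and register_env from the snippet above. This is only a guess: it assumes the env_config that RLlib passes to env_creator is an EnvContext whose vector_index identifies the sub-environment, and I am not sure that still holds per sub-environment with the new API stack env_runners.

def env_creator(env_config):
    # Hypothetical sketch: pick a per-sub-environment config via vector_index.
    # getattr falls back to 0 if env_config is a plain dict without that attribute.
    idx = getattr(env_config, "vector_index", 0)
    return MNISTExib(**env_configs[idx % len(env_configs)])

register_env('MNISTExib-v0', env_creator)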
Thank you!