How to vary observation space in multi-agent training using tune.run()

I have a custom multi-agent environment class. It supports different observation_space types that I want to compare in a study. For example:


class MyEnvCls(MultiAgentEnv):
  # etc

config = {
  "env": MyEnvCls,
  "env_config": {
    "obs_type" : tune.grid_search(["type1", "type2"]),
  },
}
config["multiagent"] = {
  "policies" : { # (policy_cls, obs_space, act_space, config)
    "agent_{}".format(x): (None, some_observation_space, MyEnvCls.action_space, {}) for x in range(3)
  },
  "policy_mapping_fn": lambda x: "{}".format(x),
}

tune.run(
  "A3C", 
  name="study",
  config=config, 
  stop=stop, 
)

How do I implement some_observation_space so that when tune runs “type1”, it uses a different gym.Space than when it runs “type2”?

Hi @RickLan,

You can find more info on sample_from here: Tune Custom/Conditional Search Spaces. By the time the sample_from lambda is evaluated, the grid_search value for obs_type has already been resolved, so spec.config.env_config.obs_type gives the concrete value for that trial.

Here is one way to do it.

class MyEnvCls(MultiAgentEnv):
    @staticmethod
    def get_observation_space(obs_type):
        ...

config["multiagent"] = {
  "policies" : tune.sample_from( lambda spec: 
     {"agent_{}".format(x): (None, 
         MyEnvCls.get_observation_space(spec.config.env_config.obs_type),
         MyEnvCls.action_space, {}) for x in range(3)}),
      "policy_mapping_fn": lambda x: "{}".format(x),
}
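
For completeness, here is one possible body for get_observation_space. This is only a sketch: the space types and shapes below are placeholders, so swap in whatever “type1” and “type2” actually mean for your env.

import numpy as np
from gym import spaces
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MyEnvCls(MultiAgentEnv):
    @staticmethod
    def get_observation_space(obs_type):
        # Hypothetical example spaces -- replace with the real definitions for your env.
        if obs_type == "type1":
            return spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        if obs_type == "type2":
            return spaces.Discrete(10)
        raise ValueError("Unknown obs_type: {}".format(obs_type))

With something like that in place, each grid_search trial builds its policies with the observation space that matches its obs_type.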

Thank you @mannyv! Let me try it.