How do I get policy-specific parameters?

Let’s say I’m building my own Policy, similar to AlwaysSameHeuristic in rock_paper_scissors_multiagent.py, but instead of choosing an action at random, I want to pass the action in as a parameter. So I change the config to:

config["multiagent"]["policies"]["always_same"] = (AlwaysSameHeuristic, Discrete(3), Discrete(3), {"deterministic_action": 0})

How does AlwaysSameHeuristic access "deterministic_action"? Policy.__init__() is given a TrainerConfigDict, so we could read AlwaysSameHeuristic’s parameters from self.config["multiagent"]["policies"]["always_same"]. But if I create two policies, "always_same1" and "always_same2", with different deterministic actions passed in as parameters, how does each policy know whether to read self.config["multiagent"]["policies"]["always_same1"] or self.config["multiagent"]["policies"]["always_same2"]?

Hey @kane0058: If you have this config:

multiagent:
    policies:
        always_same: (None, [obs-space], [action-space], {"deterministic_action": 0})
        always_same2: (None, [obs-space], [action-space], {"deterministic_action": 1})

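Then your Trainer will have two different policies. Each one receives a config parameter in its constructor, and that config contains the deterministic_action key at the top level, with a different value for each policy (0 and 1). So inside the policy you can read it as config["deterministic_action"], without going through config["multiagent"]["policies"].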
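To make that concrete, here is a rough plain-Python sketch of the idea. It is not RLlib's actual code and does not import Ray: build_policies is a hypothetical helper that only mimics how a per-policy dict gets merged into the overall config before each constructor is called, the class is a simplified stand-in for the one in the example script, and the space strings are placeholders.

```python
# Stand-in sketch, NOT RLlib code: it only mimics how each policy's own
# config dict is merged into the trainer config and passed to its constructor.

class AlwaysSameHeuristic:
    """Simplified stand-in policy that always returns a configured action."""

    def __init__(self, observation_space, action_space, config):
        self.observation_space = observation_space
        self.action_space = action_space
        self.config = config
        # The per-policy key sits at the top level of the config it receives.
        self.action = config["deterministic_action"]

    def compute_actions(self, obs_batch):
        # Simplified return value: one action per observation in the batch.
        return [self.action for _ in obs_batch]


def build_policies(trainer_config):
    """Hypothetical helper: build every policy with its own merged config."""
    policies = {}
    for policy_id, spec in trainer_config["multiagent"]["policies"].items():
        cls, obs_space, act_space, overrides = spec
        # Per-policy overrides win over the shared trainer settings.
        merged = {**trainer_config, **overrides}
        policies[policy_id] = cls(obs_space, act_space, merged)
    return policies


trainer_config = {
    "gamma": 0.99,  # some shared setting, just for illustration
    "multiagent": {
        "policies": {
            # "obs-space" / "action-space" are placeholders for real spaces.
            "always_same": (AlwaysSameHeuristic, "obs-space", "action-space",
                            {"deterministic_action": 0}),
            "always_same2": (AlwaysSameHeuristic, "obs-space", "action-space",
                             {"deterministic_action": 1}),
        },
    },
}

policies = build_policies(trainer_config)
```

Each instance ends up with its own value, so neither policy needs to know its own ID to find its parameters.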

Does this make sense?

The policy that’s picked for an agent is determined by the policy_mapping_fn function that you also specify in your "multiagent" config. It’s a function that takes an agent ID (coming from the env) and returns the policy ID of one of these two policies.
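A mapping function for the two policies above could look something like the sketch below. The agent IDs "player1" and "player2" are assumptions about what your env emits, and the extra *args/**kwargs are only there because the exact arguments RLlib passes to this function differ between versions.

```python
# Sketch only: the agent IDs are assumed, adjust them to what your env emits.
AGENT_TO_POLICY = {
    "player1": "always_same",
    "player2": "always_same2",
}


def policy_mapping_fn(agent_id, *args, **kwargs):
    """Return the policy ID that should act for the given agent ID."""
    return AGENT_TO_POLICY[agent_id]


# This fragment would go into the "multiagent" section of the config,
# next to the "policies" dict.
multiagent_fragment = {"policy_mapping_fn": policy_mapping_fn}
```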

Yes it does. I can’t believe I missed that. Thank you for the help.


Cool! 🙂 Glad to help!