Let’s say I’m building my own Policy similar to AlwaysSameHeuristic from rock_paper_scissors_multiagent.py, but instead of choosing the action randomly, I want to pass the action in as a parameter. So I change the config to:
config["multiagent"]["policies"]["always_same"] = (AlwaysSameHeuristic, Discrete(3), Discrete(3), {"deterministic_action": 0})
How does AlwaysSameHeuristic access "deterministic_action"? Policy.__init__() is given a TrainerConfigDict, which means the policy could read its parameters from self.config["multiagent"]["policies"]["always_same"]. But if I define two policies, "always_same1" and "always_same2", with different deterministic actions passed in as parameters, how does each policy know whether to read self.config["multiagent"]["policies"]["always_same1"] or self.config["multiagent"]["policies"]["always_same2"]?
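To make the question concrete, here is a plain-Python sketch of the behaviour I would hope for. It assumes (I have not confirmed this) that the fourth element of each policy tuple is merged into the config handed to that policy's constructor, so each instance sees its own "deterministic_action" at the top level. AlwaysSameSketch and build_policies are hypothetical stand-ins written for illustration, not RLlib classes or functions.

```python
from copy import deepcopy

# Hypothetical stand-in for the shared trainer config.
trainer_config = {"gamma": 0.99, "multiagent": {"policies": {}}}


class AlwaysSameSketch:
    """Plain-Python stand-in for a Policy subclass (not the real RLlib class)."""

    def __init__(self, observation_space, action_space, config):
        self.config = config
        # Assumption: the per-policy dict was merged into `config`,
        # so the parameter is available at the top level.
        self.action = config["deterministic_action"]

    def compute_actions(self, obs_batch):
        # Return the same fixed action for every observation in the batch.
        return [self.action for _ in obs_batch]


def build_policies(specs, base_config):
    """Hypothetical helper: give each policy its own merged copy of the config."""
    policies = {}
    for policy_id, (cls, obs_space, act_space, overrides) in specs.items():
        merged = deepcopy(base_config)
        merged.update(overrides)  # per-policy values override shared ones
        policies[policy_id] = cls(obs_space, act_space, merged)
    return policies


specs = {
    "always_same1": (AlwaysSameSketch, None, None, {"deterministic_action": 0}),
    "always_same2": (AlwaysSameSketch, None, None, {"deterministic_action": 2}),
}
policies = build_policies(specs, trainer_config)
```

If the real constructor works this way, neither policy would need to know its own ID, because each one would just read self.config["deterministic_action"].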