Finding a solution to the following is critical for project completion.
I have a multi-agent project where each agent is identical and has its own mapped policy. The action space is discrete, and the environment is a custom, shared multi-agent environment.
The step routine receives, as expected, an action_dict that maps agents to actions.
Is there a way to intercept the creation of the action_dict during manual training (via algo.train(), where the algorithm is PPO) such that the actions among agents are unique? I can't locate where in the code the action dictionary is created, and I'm not sure how to override the default behavior.
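To make the goal concrete, here is a toy sketch in plain Python (not actual RLlib code; `deduplicate_actions` and the dict layout are just illustrative) of the kind of post-processing I'd like to hook in somewhere before my env's step() receives the dict. Each policy samples its action independently, so duplicates can occur:

```python
def deduplicate_actions(action_dict, num_actions):
    """Greedily reassign duplicate discrete actions to unused ones.

    action_dict maps agent id -> discrete action, e.g. {"agent_0": 2, ...}.
    This is a naive tie-break: the first agent holding an action keeps it,
    later agents with the same action get the lowest free action instead.
    """
    used = set()
    result = {}
    for agent_id, action in action_dict.items():
        if action in used:
            # Pick the first action not yet taken by another agent.
            action = next(a for a in range(num_actions) if a not in used)
        used.add(action)
        result[agent_id] = action
    return result

# Two agents sampled the same action (1); after post-processing all differ.
sampled = {"agent_0": 1, "agent_1": 1, "agent_2": 3}
unique = deduplicate_actions(sampled, num_actions=5)
print(unique)  # {'agent_0': 1, 'agent_1': 0, 'agent_2': 3}
```

The question is where in the PPO/algo.train() pipeline a hook like this could live, so that the environment only ever sees a conflict-free action_dict.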
TIA
Jim Griffin