MARL modeling issue

sven1977 · March 31, 2021, 12:54pm

I think if you need to keep your agents stepping synchronously (all at the same time), then giving negative rewards for identical actions would be best. The open question is still which agent is allowed to "decide first" and is thereby spared the penalty.
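A rough sketch of what such a penalty could look like, in plain Python and independent of RLlib. The function name `penalize_identical_actions`, the `penalty` value, and the optional `priority` list are all made up for illustration. The `priority` list is just one possible answer to the "who decides first" question: among agents that picked the same action, the one earliest in the list is exempt.

```python
from collections import defaultdict


def penalize_identical_actions(actions, rewards, penalty=1.0, priority=None):
    """Return a copy of `rewards` with `penalty` subtracted for colliding agents.

    actions, rewards: dicts keyed by agent id (hypothetical layout).
    priority: optional list of agent ids. If given, the agent that appears
        earliest in this list among those sharing an action is not penalized.
    """
    # Group agent ids by the action they picked in this step.
    by_action = defaultdict(list)
    for agent_id, action in actions.items():
        by_action[action].append(agent_id)

    shaped = dict(rewards)
    for agent_ids in by_action.values():
        if len(agent_ids) < 2:
            continue  # No collision on this action.
        exempt = None
        if priority is not None:
            exempt = min(agent_ids, key=priority.index)
        for agent_id in agent_ids:
            if agent_id != exempt:
                shaped[agent_id] -= penalty
    return shaped


actions = {"agent0": 2, "agent1": 2, "agent2": 0}
rewards = {"agent0": 0.0, "agent1": 0.0, "agent2": 0.0}

# Everyone involved in a collision is penalized.
symmetric = penalize_identical_actions(actions, rewards)

# agent0 "decides first" and keeps its reward; agent1 takes the penalty.
ordered = penalize_identical_actions(
    actions, rewards, priority=["agent0", "agent1", "agent2"]
)
```

In an actual multi-agent env you would apply something like this to the per-agent reward dict inside `step()` before returning it.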
Otherwise, if you are willing to change your dynamics to be sequential, action masking may also help: agent0 picks a0, then agent1 gets the respective action mask in its observation (provided by the env) and uses it to sample any action except a0. This would be similar to @RickLan's postprocess suggestion. I think each of these approaches has its advantages and disadvantages.
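To make the sequential idea concrete, here is a minimal sketch in plain Python. It is not RLlib code: the helpers `action_mask`, `sample_masked`, and `sequential_step` are hypothetical, and uniform random sampling stands in for the policy. With a real policy you would typically put the mask into the observation and have the model push the logits of masked actions to a very low value before sampling.

```python
import random


def action_mask(num_actions, taken):
    """Build a 0/1 mask: 1 = action still available, 0 = already taken."""
    return [0 if a in taken else 1 for a in range(num_actions)]


def sample_masked(mask, rng):
    """Sample uniformly among allowed actions (stand-in for a real policy)."""
    allowed = [a for a, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("No actions left to choose from.")
    return rng.choice(allowed)


def sequential_step(agent_order, num_actions, rng):
    """Let agents act one after another; each sees a mask of taken actions."""
    taken = set()
    chosen = {}
    for agent_id in agent_order:
        # The env would place this mask into the agent's observation.
        mask = action_mask(num_actions, taken)
        action = sample_masked(mask, rng)
        chosen[agent_id] = action
        taken.add(action)
    return chosen


rng = random.Random(0)
chosen = sequential_step(["agent0", "agent1", "agent2"], num_actions=3, rng=rng)
```

Note that the order in `agent_order` decides who picks first, so the priority question does not go away. It just becomes explicit in the env.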
