I’m not sure how to handle the following issue:
In my MARL use case, several ready agents can choose the identical action at the same timestep, but this action can only be executed by one agent, not by two or more agents at the same time!
So far, I’m not sure how best to handle such situations. Ideas I’ve thought about are:
simply prohibit the agents from taking the identical action at the same time and “reward” these agents with a penalty (hoping the agents will learn to avoid it)
maybe use some kind of conditional action distribution (comparable to this pattern)
alternatively, break up such situations where several agents are ready at the same time and artificially process each ready agent successively in marginal timesteps
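For concreteness, the first and third ideas could be sketched roughly as follows. This is plain Python and not tied to any MARL library. The function names `resolve_with_penalty` and `resolve_sequentially`, the penalty value, and the `policy(agent, mask)` callable are all hypothetical. For the first idea, the sketch assumes that a contested action is dropped for everyone involved and that all of those agents are penalized; letting one agent win would be an equally plausible variant.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def resolve_with_penalty(
    chosen: Dict[str, int], penalty: float = -1.0
) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Idea 1 (assumed variant): drop contested actions, penalize the colliding agents."""
    # Group agents by the action they picked in this timestep.
    by_action: Dict[int, List[str]] = {}
    for agent, action in chosen.items():
        by_action.setdefault(action, []).append(agent)

    executed: Dict[str, int] = {}
    extra_reward: Dict[str, float] = {agent: 0.0 for agent in chosen}
    for action, agents in by_action.items():
        if len(agents) == 1:
            executed[agents[0]] = action  # uncontested, so it runs
        else:
            for agent in agents:  # contested, so nobody runs it
                extra_reward[agent] = penalty
    return executed, extra_reward


def resolve_sequentially(
    ready_agents: Sequence[str],
    policy: Callable[[str, List[bool]], int],
    num_actions: int,
) -> Dict[str, int]:
    """Idea 3: query ready agents one by one, masking actions already taken."""
    taken: Set[int] = set()
    executed: Dict[str, int] = {}
    for agent in ready_agents:
        # mask[a] is True while action a is still free in this timestep.
        mask = [a not in taken for a in range(num_actions)]
        if not any(mask):
            break  # no free action left; remaining agents have to wait
        action = policy(agent, mask)
        if not mask[action]:
            raise ValueError(f"{agent} picked masked action {action}")
        executed[agent] = action
        taken.add(action)
    return executed
```

With `resolve_with_penalty`, the returned `extra_reward` would be added to the environment reward of each agent. With `resolve_sequentially`, the order of `ready_agents` decides who gets first pick, so shuffling it each timestep may be needed to avoid systematically favoring one agent.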
I would be really glad about any suggestions, or please let me know how you handle this.