@RickLan thanks for your thoughts! I’ll give it a try.
Additionally, my most recent idea is to set up some kind of hierarchical RL with an additional agent (a “supervisor”) at an upper (or lower) level that decides which of the ready agents gets to choose its next action first. Because the agents would then decide one after another, no two of them could choose an identical action at the same time.
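A minimal sketch of how such a sequential scheme could look, in plain Python without any RL library. Everything here is hypothetical: `supervisor_order` and `agent_choose` are placeholder policies (random choices standing in for learned ones), and removing an already chosen action from the pool is an assumption about how the conflict would be prevented.

```python
import random

def supervisor_order(ready_agents, rng):
    """Placeholder supervisor policy: a random ordering of the ready agents.
    In a hierarchical setup this would be a learned policy instead."""
    order = list(ready_agents)
    rng.shuffle(order)
    return order

def agent_choose(agent_id, available_actions, rng):
    """Placeholder agent policy: pick uniformly among the actions still free."""
    return rng.choice(sorted(available_actions))

def sequential_step(ready_agents, all_actions, rng):
    """Let the agents decide one after another in the supervisor's order.
    Each chosen action is removed from the pool (assumption), so no later
    agent can pick the same one."""
    available = set(all_actions)
    chosen = {}
    for agent_id in supervisor_order(ready_agents, rng):
        if not available:
            break  # more ready agents than free actions: the rest have to wait
        action = agent_choose(agent_id, available, rng)
        chosen[agent_id] = action
        available.remove(action)
    return chosen

rng = random.Random(0)
assignment = sequential_step(["a1", "a2", "a3"], ["left", "right", "up", "down"], rng)
print(assignment)  # maps each agent to a distinct action
```

The point of the sketch is only the ordering: once the supervisor fixes who goes first, every later agent sees which actions are already taken, so duplicates cannot occur within one step.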
Does anyone have experience with something similar?