Hi, I tried to train my agent, but it wastes many steps on unnecessary actions.
For example, the action space is [Discrete(3) * 3], but [1, 2, 2] and [2, 1, 2] are effectively the same action, so the agent does not need to try both. The agent even gets stuck in local optima because of this redundancy.
Is there any way to add constraints that mask these unnecessary actions? Only the values in the action matter, not the order of its elements.
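
To make the redundancy concrete, here is a minimal sketch in plain Python, not tied to any RL library. It assumes each of the three sub-actions takes a value in {0, 1, 2}; the names `canonical`, `all_actions`, and `mask` are only for illustration. It treats the sorted tuple as the representative of each order-independent action and builds a boolean mask that keeps only those representatives:

```python
from itertools import product

N_SLOTS = 3    # three sub-actions, as in [Discrete(3) * 3]
N_CHOICES = 3  # assumed: each sub-action takes a value in {0, 1, 2}


def canonical(action):
    """Return an order-independent representative (the sorted tuple)."""
    return tuple(sorted(action))


# Every ordered combination the agent can currently pick.
all_actions = list(product(range(N_CHOICES), repeat=N_SLOTS))

# Collapse orderings that contain the same values.
unique_actions = sorted({canonical(a) for a in all_actions})

# Boolean mask over all_actions: True only for the sorted ordering,
# so exactly one ordering per group of equivalent actions stays valid.
mask = [a == canonical(a) for a in all_actions]

print(len(all_actions), len(unique_actions), sum(mask))
```

Under these assumptions, many of the ordered actions are duplicates of each other, and I would like the agent to choose only among the entries where `mask` is True.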
Thank you!