How do I use action masking with MultiAgentEnv? I tried modeling it after the action_mask_env.py example in the Ray repo (ray/rllib/examples/envs/classes/action_mask_env.py). Unfortunately, it does not handle multiple agents. I tried this, but it does not work:
import gymnasium as gym
import numpy as np

# Inside my MultiAgentEnv subclass's __init__():
self.observation_spaces = {
    "player1": gym.spaces.Dict({
        "board": gym.spaces.Box(low=-1.0, high=1.0, shape=(9,), dtype=np.float32),
        "action_mask": gym.spaces.MultiBinary(9),
    }),
    "player2": gym.spaces.Dict({
        "board": gym.spaces.Box(low=-1.0, high=1.0, shape=(9,), dtype=np.float32),
        "action_mask": gym.spaces.MultiBinary(9),
    }),
}
I am setting up a TicTacToe example, so the action space covers the 9 board cells.
Hello and welcome to the Ray community!
How is your action masking set up right now? Each agent should have an observation space that includes both the board state and the action mask. The mask should indicate which moves are currently valid, i.e., it should be recomputed from the board state on every step so that it flags exactly the positions that are still available (see the sketch below).
Here is more documentation on the tic-tac-toe example which might help: Multi-Agent Environments — Ray 2.44.1
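To make that concrete, here is a minimal sketch of a complete two-player TicTacToe MultiAgentEnv with action masking. It assumes the new RLlib API stack, where observation_spaces and action_spaces are plain dicts keyed by agent ID (the same layout as your snippet). The class name MaskedTicTacToe, the _obs helper, and the reward scheme are illustrative choices of mine, not RLlib API:

import gymnasium as gym
import numpy as np

from ray.rllib.env.multi_agent_env import MultiAgentEnv


class MaskedTicTacToe(MultiAgentEnv):
    # Turn-based two-player env: only the agent to move receives an obs.

    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),
                 (0, 4, 8), (2, 4, 6)]

    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["player1", "player2"]
        obs_space = gym.spaces.Dict({
            "board": gym.spaces.Box(-1.0, 1.0, shape=(9,), dtype=np.float32),
            "action_mask": gym.spaces.MultiBinary(9),
        })
        self.observation_spaces = {a: obs_space for a in self.agents}
        self.action_spaces = {a: gym.spaces.Discrete(9) for a in self.agents}
        self.board = np.zeros(9, dtype=np.float32)
        self.current = "player1"

    def _obs(self):
        # Recompute the mask from the board every time: a cell is a
        # legal move exactly when it is still empty (0.0).
        return {self.current: {
            "board": self.board.copy(),
            "action_mask": (self.board == 0.0).astype(np.int8),
        }}

    def reset(self, *, seed=None, options=None):
        self.board[:] = 0.0
        self.current = "player1"
        return self._obs(), {}

    def step(self, action_dict):
        action = action_dict[self.current]
        mark = 1.0 if self.current == "player1" else -1.0
        self.board[action] = mark  # legal by construction if the mask is respected
        won = any(all(self.board[i] == mark for i in line)
                  for line in self.WIN_LINES)
        done = won or not (self.board == 0.0).any()
        other = "player2" if self.current == "player1" else "player1"
        rewards = {self.current: 1.0, other: -1.0} if won else {self.current: 0.0}
        self.current = other
        # Obs for the next player to move (doubles as the final obs when done).
        return self._obs(), rewards, {"__all__": done}, {"__all__": False}, {}

The important part is that the mask is rebuilt from the board on every observation, so it always marks exactly the empty cells. Also note the env only carries the mask; you still need a model/RLModule that actually reads obs["action_mask"] and applies it to the action logits.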
Thank you for the quick response!
Currently I penalize illegal actions with a reward of -5, just like in the example you linked (roughly the snippet below). What I would like to do instead is add action masking to my code. What would the correct syntax be? The code in my question seems to fit your description, but apparently the structure is not quite right. Is there an example of a MultiAgentEnv with action masking?
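For reference, this is roughly the shape of my current approach (simplified; ILLEGAL_MOVE_PENALTY and move_reward are just names for this snippet, not my actual code):

import numpy as np

ILLEGAL_MOVE_PENALTY = -5.0

def move_reward(board: np.ndarray, action: int) -> float:
    # A move into an occupied cell is merely punished, not prevented.
    return ILLEGAL_MOVE_PENALTY if board[action] != 0.0 else 0.0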
What error are you getting specifically? Is there a code snippet you can send so I can reproduce it? Thank you!
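In the meantime, one thing worth checking: the env only carries the mask; the model also has to consume it. The core trick (the same idea as the FLOAT_MIN clamping in RLlib's action-masking model examples) is to bury the logits of invalid actions before sampling. A tiny self-contained numpy sketch (the function name is mine):

import numpy as np

def mask_logits(logits: np.ndarray, action_mask: np.ndarray) -> np.ndarray:
    # Valid actions keep their logits; invalid ones get the float32
    # minimum, so softmax assigns them effectively zero probability.
    return np.where(action_mask.astype(bool), logits, np.finfo(np.float32).min)

logits = np.zeros(9, dtype=np.float32)
mask = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0], dtype=np.int8)
print(mask_logits(logits, mask))  # masked entries sit at about -3.4e38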