Issues Implementing MultiDiscrete Action Space with Action Masks in Custom Environment

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hello everyone,

I’ve been developing an RL training setup for PySC2 (the StarCraft II Learning Environment) using Ray/RLlib, and I ran into several issues when implementing a MultiDiscrete action space with action masks. I’m looking for advice or solutions from the community.

Environment Overview:

My environment is based on PySC2. I defined a MultiDiscrete action space with dimensions [number_of_actions, 85, 85], so that each action consists of an action type plus its x and y coordinates on an 85×85 grid. I also wanted to include action masks that dynamically restrict the available actions at each step.

Issues Encountered:

Inconsistent Shapes with MultiDiscrete and Action Masks: When I tried to use action masks with the MultiDiscrete action space, I got errors pointing to a mismatch in the expected shape and format of the masks. Specifically, RLlib’s environment checker raised errors indicating that the mask structure did not match the format expected for MultiDiscrete spaces.

Error Messages:

  1. AssertionError regarding mask format: When I tried using a tuple for the action masks, I received this error:
AssertionError: Expects the mask to be a tuple for sub_nvec ([15, 85, 85]), actual type: <class 'numpy.ndarray'>

This error suggested that the action mask format did not align with the expected structure for MultiDiscrete spaces, leading me to consider using numpy arrays instead.

  2. ValueError about inhomogeneous shapes: Converting the mask to a NumPy array, or restructuring it in other ways, led to a ValueError about array shapes:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimension. The detected shape was (3,) + inhomogeneous part.

This message implied a mismatch between the expected and provided shapes of the action masks.

  3. Mismatch in expected structure: Further adjustments to match the MultiDiscrete space requirements resulted in errors during environment checking, similar to:
The above error has been found in your environment! ... AssertionError: Expects the mask to be a tuple for sub_nvec ([15, 85, 85]), actual type: <class 'numpy.ndarray'>

And:

Entire first structure: {'screen': ., 'minimap': ., 'action_mask': (., ., .), 'player_info': .}
Entire second structure: OrderedDict([('action_mask', .), ('minimap', .), ('player_info', .), ('screen', .)])

These messages pointed to a fundamental mismatch between the action mask I provided and the structure or format expected by the MultiDiscrete action space, but gave no clear indication of what the correct format is.

Attempts at Resolution:

  • I tried different formats and structures for the action masks, including tuples and NumPy arrays, to match the requirements of the MultiDiscrete action space.
  • I’ve reviewed the RLlib and Gymnasium documentation for guidance on implementing action masks with MultiDiscrete spaces but haven’t found a clear solution that addresses the issues I’m facing.

Questions:

  1. Has anyone successfully implemented action masks with a MultiDiscrete action space in a custom environment and can share insights or examples?
  2. Are there specific requirements or formats for action masks when used with MultiDiscrete action spaces that I might be overlooking?
  3. Any suggestions on debugging strategies or alternative approaches to dynamically restrict actions in a MultiDiscrete space?

I appreciate any guidance, insights, or references to documentation/examples that could help resolve these issues. Thank you in advance for your assistance!

I found a workaround by avoiding MultiDiscrete altogether.

Here is the workaround:

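    import gymnasium as gym

    class SC2Env(gym.Env):  # hypothetical class name; only the space definitions are shown
        def __init__(self, number_of_actions=15):
            self.number_of_actions = number_of_actions
            self.observation_space = gym.spaces.Dict({
                # "screen", "minimap" and "player_info" entries omitted here
                "action_mask": gym.spaces.Dict({
                    "action_type": gym.spaces.MultiBinary(self.number_of_actions),
                    "x": gym.spaces.MultiBinary(85),
                    "y": gym.spaces.MultiBinary(85),
                }),
            })
            # One Discrete sub-space per component instead of a single MultiDiscrete
            self.action_space = gym.spaces.Dict({
                "action_type": gym.spaces.Discrete(self.number_of_actions),
                "x": gym.spaces.Discrete(85),
                "y": gym.spaces.Discrete(85),
            })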

I got the idea from a blog entry on RLlib and gym.spaces, which mentioned problems with MultiBinary and MultiDiscrete. At least with MultiBinary, I haven’t had any problems.
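As far as I know, RLlib does not apply an observation’s `action_mask` automatically; the policy model has to mask the logits of each action head itself, typically by replacing disallowed entries with a very large negative number before the action distribution is built. Below is a framework-agnostic NumPy sketch of that step for the three `Discrete` heads of the workaround. `mask_logits`, `softmax` and the random logits are hypothetical stand-ins for the model’s own code and outputs:

```python
import numpy as np

FLOAT_MIN = np.finfo(np.float32).min  # large negative value used instead of -inf

def mask_logits(logits, mask):
    """Keep logits where mask == 1 and replace the rest with FLOAT_MIN (hypothetical helper)."""
    return np.where(mask.astype(bool), logits, FLOAT_MIN)

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
number_of_actions = 15  # assumption, from nvec [15, 85, 85]

# Stand-ins for the three head outputs of a policy network
logits = {
    "action_type": rng.normal(size=number_of_actions).astype(np.float32),
    "x": rng.normal(size=85).astype(np.float32),
    "y": rng.normal(size=85).astype(np.float32),
}
# Masks as delivered in obs["action_mask"]
action_mask = {
    "action_type": np.zeros(number_of_actions, dtype=np.int8),
    "x": np.zeros(85, dtype=np.int8),
    "y": np.ones(85, dtype=np.int8),
}
action_mask["action_type"][[1, 4]] = 1  # illustrative: types 1 and 4 allowed
action_mask["x"][30:35] = 1             # illustrative: x in [30, 35)

masked = {k: mask_logits(logits[k], action_mask[k]) for k in logits}
greedy = {k: int(np.argmax(v)) for k, v in masked.items()}
type_probs = softmax(masked["action_type"].astype(np.float64))
print(greedy)  # every chosen index is allowed by its mask
```

Using a finite minimum instead of `-inf` avoids NaNs if every entry of a head happens to be masked: the shifted logits are then all zero and the softmax falls back to a uniform distribution.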