How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hello everyone,
I’ve been developing an RL training setup for PySC2 (the StarCraft II Learning Environment) using Ray/RLlib, and I’ve run into several issues when adding action masks to a MultiDiscrete action space. I’m hoping someone in the community can suggest a way to resolve them.
Environment Overview:
My environment is based on PySC2. I define a MultiDiscrete action space with dimensions [number_of_actions, 85, 85] to cover the different action types along with their x and y coordinates on the grid. I also want to use action masks to restrict the available actions dynamically at each step.
Issues Encountered:
Inconsistent Shapes with MultiDiscrete and Action Masks: When I use action masks with the MultiDiscrete action space, I get errors indicating a mismatch in the expected shape and format of the masks. Specifically, the environment checker reports that the mask structure does not match the format expected for MultiDiscrete spaces.
Error Messages:
- AssertionError regarding mask format: When I tried using a tuple for the action masks, I received an error:
AssertionError: Expects the mask to be a tuple for sub_nvec ([15, 85, 85]), actual type: <class 'numpy.ndarray'>
This error suggested that the action mask format did not align with the expected structure for MultiDiscrete spaces, leading me to consider using numpy arrays instead.
- ValueError related to inhomogeneous shapes: Converting the mask to a NumPy array, or trying other structures, led to a ValueError indicating an issue with array shapes:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimension. The detected shape was (3,) + inhomogeneous part.
This message implied a mismatch between the expected and provided shapes of the action masks.
- Mismatch in expected structure: Further adjustments to match the MultiDiscrete space requirements resulted in errors during environment checking, similar to:
The above error has been found in your environment! ... AssertionError: Expects the mask to be a tuple for sub_nvec ([15, 85, 85]), actual type: <class 'numpy.ndarray'>
And:
Entire first structure: {'screen': ., 'minimap': ., 'action_mask': (., ., .), 'player_info': .}
Entire second structure: OrderedDict([('action_mask', .), ('minimap', .), ('player_info', .), ('screen', .)])
These messages point to a mismatch between the action mask I provide and the structure expected for the MultiDiscrete action space, but they give no clear indication of what the correct format or structure is.
Attempts at Resolution:
- I have tried different formats and structures for the action masks, including tuples and NumPy arrays, to match the requirements of the MultiDiscrete action space.
- I’ve reviewed the RLlib and Gymnasium documentation for guidance on implementing action masks with MultiDiscrete spaces, but haven’t found a clear solution to the issues I’m facing.
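On the model side, RLlib's action-masking example (written for a Discrete space) adds log(mask), clamped to a large negative number, to the logits. Below is a minimal NumPy sketch of how that idea might extend to this MultiDiscrete space, assuming the model outputs 185 concatenated logits (15 + 85 + 85) and the observation carries the flat 185-length mask. mask_logits and greedy_action are hypothetical helpers illustrating the technique, not RLlib API:

```python
import numpy as np

NVEC = np.array([15, 85, 85])
FLOAT_MIN = np.finfo(np.float32).min  # large negative stand-in for -inf

def mask_logits(logits, flat_mask):
    """Push logits of masked-out choices to ~FLOAT_MIN; leave valid ones unchanged."""
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
        inf_mask = np.maximum(np.log(flat_mask.astype(np.float32)), FLOAT_MIN)
    return logits + inf_mask

def greedy_action(masked_logits):
    """Pick the highest-scoring valid choice separately for each sub-action."""
    chunks = np.split(masked_logits, np.cumsum(NVEC)[:-1])  # lengths 15, 85, 85
    return np.array([int(np.argmax(chunk)) for chunk in chunks])

# Usage with made-up logits and a made-up mask.
rng = np.random.default_rng(0)
logits = rng.normal(size=int(NVEC.sum())).astype(np.float32)
flat_mask = np.zeros(int(NVEC.sum()), dtype=np.int8)
flat_mask[[0, 3]] = 1            # hypothetical: action types 0 and 3 valid
flat_mask[15 + 10:15 + 20] = 1   # hypothetical: x in 10..19 valid
flat_mask[100:] = 1              # every y coordinate valid
action = greedy_action(mask_logits(logits, flat_mask))
# action[0] is 0 or 3, action[1] is in 10..19
```

Masking each sub-action independently like this cannot express joint constraints (for example, coordinates that are only valid for certain action types); that would require conditioning the coordinate masks on the chosen action type.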
Questions:
- Has anyone successfully implemented action masks with a MultiDiscrete action space in a custom environment, and can share insights or examples?
- Are there specific requirements or formats for action masks used with MultiDiscrete action spaces that I might be overlooking?
- Any suggestions for debugging strategies, or alternative approaches to dynamically restricting actions in a MultiDiscrete space?
I appreciate any guidance, insights, or references to documentation/examples that could help resolve these issues. Thank you in advance for your assistance!