How severe does this issue affect your experience of using Ray?
- High: It blocks me to complete my task.
Hello everyone,
I’ve been developing an approach for RL training on PySC2 (the StarCraft II Learning Environment) using Ray/RLlib, and I have run into several issues when trying to combine a MultiDiscrete action space with action masks. I’m seeking advice or solutions from the community to resolve these problems.
Environment Overview:
My environment is based on the PySC2 environment. I have defined a MultiDiscrete action space with dimensions [number_of_actions, 85, 85] to represent the action type along with its x and y coordinates on a grid. I also want to include action masks that dynamically restrict the available actions at each step.
Issues Encountered:
Inconsistent Shapes with MultiDiscrete and Action Masks: When trying to use action masks with the MultiDiscrete action space, I encountered errors suggesting a mismatch in the expected shapes and formats of the action masks. Specifically, the environment checking module raised errors indicating that the mask structure did not match the expected format for MultiDiscrete spaces.
Error Messages:
- AssertionError regarding mask format: When I tried using a tuple for the action masks, I received an error:
AssertionError: Expects the mask to be a tuple for sub_nvec ([15, 85, 85]), actual type: <class 'numpy.ndarray'>
This error suggested that the action mask format did not align with the expected structure for MultiDiscrete spaces, leading me to consider using numpy arrays instead.
- ValueError related to inhomogeneous shapes: Switching the mask to a numpy array, or trying different structures, led to a ValueError indicating an issue with array shapes:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimension. The detected shape was (3,) + inhomogeneous part.
This message implied a mismatch between the expected and provided shapes of the action masks.
- Mismatch in expected structure: Further adjustments to align with the MultiDiscrete space requirements resulted in errors during environment checking, similar to:
The above error has been found in your environment! ... AssertionError: Expects the mask to be a tuple for sub_nvec ([15, 85, 85]), actual type: <class 'numpy.ndarray'>
And:
Entire first structure: {'screen': ., 'minimap': ., 'action_mask': (., ., .), 'player_info': .}
Entire second structure: OrderedDict([('action_mask', .), ('minimap', .), ('player_info', .), ('screen', .)])
These messages pointed to a fundamental mismatch between the action mask I provided and the structure or format expected by the MultiDiscrete action space, without a clear indication of what the correct format or structure is.
Attempts at Resolution:
- I have tried different formats and structures for the action masks, including tuples and numpy arrays, attempting to align with the requirements of the MultiDiscrete action space.
- I’ve reviewed the RLlib and Gymnasium documentation for guidance on implementing action masks with MultiDiscrete spaces but haven’t found a clear solution that addresses the issues I’m facing.
Questions:
- Has anyone successfully implemented action masks with a MultiDiscrete action space in a custom environment and can share insights or examples?
- Are there specific requirements or formats for action masks when used with MultiDiscrete action spaces that I might be overlooking?
- Any suggestions on debugging strategies or alternative approaches to dynamically restrict actions in a MultiDiscrete space?
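As one candidate for the alternative approach asked about in the last question, here is a sketch of masking at the logit level, written in plain numpy rather than against any RLlib API. It assumes the policy emits a single concatenated logit vector of length 15 + 85 + 85 = 185 and that the mask is a flat 0/1 vector of the same length. The function names are hypothetical.

```python
import numpy as np

NVEC = (15, 85, 85)  # assumption: 15 action types, 85 x 85 grid

def mask_logits(logits, flat_mask):
    """Replace logits of disallowed choices with a very negative number."""
    very_negative = np.finfo(np.float32).min
    return np.where(flat_mask.astype(bool), logits, very_negative)

def sample_action(logits, flat_mask, rng):
    """Sample one index per sub-action from the masked, per-dimension softmax."""
    masked = mask_logits(logits, flat_mask).astype(np.float64)
    action = []
    for part in np.split(masked, np.cumsum(NVEC)[:-1]):
        shifted = part - part.max()  # subtract the max for numerical stability
        probs = np.exp(shifted)      # masked entries underflow to exactly 0
        probs /= probs.sum()
        action.append(int(rng.choice(len(part), p=probs)))
    return np.array(action)

rng = np.random.default_rng(0)
logits = rng.normal(size=sum(NVEC)).astype(np.float32)

# Allow exactly one choice per dimension so the outcome is fully determined.
flat_mask = np.zeros(sum(NVEC), dtype=np.int8)
flat_mask[3] = 1             # action type 3
flat_mask[15 + 10] = 1       # x = 10
flat_mask[15 + 85 + 20] = 1  # y = 20

print(sample_action(logits, flat_mask, rng))
```

Note that this masks each sub-action independently, so it cannot express constraints such as "coordinates are only meaningful for some action types" unless the mask is recomputed after the action type is chosen.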
I appreciate any guidance, insights, or references to documentation/examples that could help resolve these issues. Thank you in advance for your assistance!