Training Action Masked PPO - ValueError: all input arrays must have the same shape

Migrating from the old API stack, I made changes to action_masking_rlm.py to get the example working (action_masking_rl_module.py). However, in my own environment setup I am getting the error from the title: ValueError: all input arrays must have the same shape.

I am not sure what is causing it. I added shape checking to the action-masking RL module to enforce the shapes of the observations and action masks, but it never raises an error.
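For reference, a minimal sketch of the kind of check I added (the "action_mask" and "observations" keys reflect my own Dict observation layout, not necessarily yours):

import numpy as np

def check_batch_shapes(batch_obs, obs_space, action_space):
    """Validate one batch of Dict observations before the forward pass."""
    for obs in batch_obs:
        mask = np.asarray(obs["action_mask"])
        inner = np.asarray(obs["observations"])
        # Every mask must have exactly one entry per discrete action.
        assert mask.shape == (action_space.n,), \
            f"mask shape {mask.shape} != ({action_space.n},)"
        # Every inner observation must match the declared space.
        assert inner.shape == obs_space["observations"].shape, \
            f"obs shape {inner.shape} != {obs_space['observations'].shape}"

Every assertion passes, which makes me suspect the mismatch only appears later, when the sampled episodes are stacked into a batch.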

Some insight on where to look or how this process is meant to work would be much appreciated.

Similar issue here: at runtime it objects to the environment having a discrete action space. This is in a PPO optimization with action masking.
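As I understand it, the masking setup expects a contract along these lines: a Dict observation wrapping the real observation plus a mask over a Discrete action space. A toy sketch (class and key names are my own, not necessarily what the shipped example uses):

import gymnasium as gym
import numpy as np

class MaskedEnv(gym.Env):
    """Toy env showing the observation/action contract for masking."""

    def __init__(self):
        self.action_space = gym.spaces.Discrete(4)
        self.observation_space = gym.spaces.Dict({
            # One mask entry per discrete action, on every step.
            "action_mask": gym.spaces.Box(0.0, 1.0, (4,), np.float32),
            "observations": gym.spaces.Box(-1.0, 1.0, (6,), np.float32),
        })

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self._obs(), {}

    def step(self, action):
        return self._obs(), 0.0, False, False, {}

    def _obs(self):
        return {
            "action_mask": np.ones(4, dtype=np.float32),
            "observations": np.zeros(6, dtype=np.float32),
        }

So a Discrete action space should be supported in itself; the mask length and the space just have to agree.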

Hello, please try to reproduce this with the latest version of the action-masking example for the new API stack. If it is still not working, open an issue on GitHub, including your config and module files.
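For reference, the wiring on the new API stack should look roughly like this (a sketch; the import paths and flag names are my best reading of recent Ray and may differ slightly across versions):

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.rl_module import RLModuleSpec
# Path below is an assumption based on the example layout in recent Ray:
from ray.rllib.examples.rl_modules.classes.action_masking_rlm import (
    ActionMaskingTorchRLModule,
)

config = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    # Your registered env with the Dict observation space described above.
    .environment("my_masked_env")
    .rl_module(
        rl_module_spec=RLModuleSpec(module_class=ActionMaskingTorchRLModule),
    )
)

The key point is that on the new stack the module class travels through an RLModuleSpec, not through the old custom_model key in the model config.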

Well, on a fresh install of ray[default], yielding Ray 2.40.0, try running the following example:

cd ~/anaconda3/lib/python3.12/site-packages/ray/rllib/examples/rl_module
python ./action_masking_rl_module.py --enable-new-api-stack --num-env-runners 2

The error is:
TypeError: ActionMaskingRLModule.__init__() got an unexpected keyword argument 'observation_space'
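Poking at the traceback, this looks like a constructor signature mismatch: the new API stack instantiates modules with keyword arguments (observation_space, action_space, and so on), while the legacy signature took a single config object. A minimal sketch of a constructor that accepts the new-style kwargs (my reading of the 2.40 source, so treat the details as an assumption):

from ray.rllib.core.rl_module.torch.torch_rl_module import TorchRLModule

class MyActionMaskingModule(TorchRLModule):
    def __init__(self, **kwargs):
        # New stack: accept observation_space, action_space, model_config,
        # etc. as keyword arguments and forward them, instead of the
        # legacy single RLModuleConfig positional argument.
        super().__init__(**kwargs)

    def setup(self):
        # Build the networks here; self.observation_space and
        # self.action_space are populated by the base class.
        pass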

I have yet to find an example from a fresh install of Ray 2.40.0 where action masking actually works, which is pretty much a showstopper for me.

For me, this has been a common complaint; I go back to Ray 0.8-something. What I routinely find is that the examples just don't work, because the code base has outpaced them. I can understand that in the daily releases of 3.0.0.dev0, but not in a tagged release like 2.40.0.