This ends in the same error message as described in Training Action Masked PPO - ValueError: all input arrays must have the same shape ok False, but the setting is different: here it is multi-agent, whereas that thread is about single-agent action masking. I recommend opening a GitHub issue, since the RLModule API is under active development at the moment.
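
For the GitHub issue, it may help to show where a message like this typically comes from. The sketch below assumes the error is raised by `numpy.stack`, which produces this exact wording when its inputs differ in shape; whether that is the call failing inside RLlib here is not confirmed. The helper `try_stack` and the per-agent observation arrays are hypothetical, made up only for illustration.

```python
import numpy as np


def try_stack(arrays):
    """Stack arrays along a new axis; return (result, None) or (None, error message)."""
    try:
        return np.stack(arrays), None
    except ValueError as exc:
        return None, str(exc)


# Hypothetical per-agent observations whose shapes differ (assumption for illustration).
obs_agent_0 = np.zeros(4, dtype=np.float32)
obs_agent_1 = np.zeros(6, dtype=np.float32)

# Mismatched shapes cannot be batched together, so stacking fails.
stacked, error = try_stack([obs_agent_0, obs_agent_1])
print(error)

# Arrays with identical shapes stack without problems.
same, error_same = try_stack([obs_agent_0, obs_agent_0])
print(same.shape)
```

If the traceback in your run does end in `np.stack`, printing the shapes of the arrays passed to it for each agent would be a useful detail to include in the issue.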