Official action-mask PPO example doesn't work

mizhou0309 · April 13, 2025, 10:38pm

1. Severity of the issue: (select one)
High: Completely blocks me.

2. Environment:

Ray version: 2.44.1
Python version: 3.12.3
OS: linux
Cloud/Infrastructure:
Other libs/tools (if relevant): None

3. What happened vs. what you expected:

Expected: Action masks are handled correctly with the new API stack.
Actual: The new API stack does not work, but the old API stack works. (Mixing the two like this will cause problems in future work.)

I followed the action mask example to create an action-masked single-agent PPO: https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/action_masking_rl_module.py

However, when I try to use the new API stack, it seems that the worker doesn't get the mask information from batch[Columns.OBS] (this is always an empty dictionary), and the error is: linear(): argument 'input' (position 1) must be Tensor, not dict.
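To make the expected data flow concrete, here is a minimal pure-Python sketch of what an action-masking module has to do with each observation: split the dict into mask and features, pass only the features to the network, then suppress the logits of invalid actions. The key names "action_mask" and "observations" are an assumption about the layout the linked example expects, and split_observation, mask_logits, and softmax are hypothetical helpers, not RLlib functions. The sketch replaces masked logits with a large negative constant, which is a simplification and not necessarily how the example module does it.

```python
import math

# Very large negative stand-in used to suppress invalid actions (assumed value).
FLOAT_MIN = -3.4e38


def split_observation(obs):
    """Separate the mask from the features, failing loudly on a bad dict.

    Assumes the dict keys "action_mask" and "observations" (hypothetical layout).
    """
    if not isinstance(obs, dict) or "action_mask" not in obs or "observations" not in obs:
        raise ValueError(f"Expected dict with 'action_mask' and 'observations', got: {obs!r}")
    return obs["action_mask"], obs["observations"]


def mask_logits(logits, action_mask):
    """Keep logits of valid actions (mask > 0), push the rest to FLOAT_MIN."""
    if len(logits) != len(action_mask):
        raise ValueError("logits and action_mask must have the same length")
    return [l if m > 0 else FLOAT_MIN for l, m in zip(logits, action_mask)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


obs = {"action_mask": [0, 1, 1, 0], "observations": [0.1, 0.2]}
mask, features = split_observation(obs)
# Only `features` would be fed to the network; the raw logits here are made up.
probs = softmax(mask_logits([5.0, 1.0, 1.0, 9.0], mask))
```

With this in place, an empty dict fails in split_observation with a readable message instead of reaching a linear layer, and invalid actions get zero sampling probability.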

When I use the old API stack (with the action-mask module still inheriting from RLModule), the code works. However, this kind of "mixed" stack may cause problems later, so I don't want to use it.

Even when I run the official code, I still get an error.
How can I use an action-masked environment with the new API stack?
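For reference, the warnings in the log below name the switch for the new API stack, config.api_stack(enable_rl_module_and_learner=True). A hedged sketch of a config that turns it on could look like the following. build_masked_ppo_config is a hypothetical helper, the env and module_class arguments are placeholders supplied by the caller, and enable_env_runner_and_connector_v2=True is an assumption about what else the new stack needs on this Ray version.

```python
def build_masked_ppo_config(env, module_class, env_config=None):
    """Return a PPOConfig with the new API stack enabled (sketch, not verified on 2.44.1).

    `env` is an environment class or registered name that emits dict observations
    containing the action mask; `module_class` is the action-masking RLModule class.
    """
    # Imported lazily so that defining this helper does not require Ray to be installed.
    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.rllib.core.rl_module.rl_module import RLModuleSpec

    return (
        PPOConfig()
        # The flag named in the log warning, plus the EnvRunner/ConnectorV2 flag (assumed needed).
        .api_stack(
            enable_rl_module_and_learner=True,
            enable_env_runner_and_connector_v2=True,
        )
        .environment(env=env, env_config=env_config or {})
        # Plug in the custom module through an RLModuleSpec.
        .rl_module(rl_module_spec=RLModuleSpec(module_class=module_class))
    )
```

If the log still shows RolloutWorker processes after this change, the old stack is probably still active somewhere in the config.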

2025-04-13 17:27:23,366 WARNING algorithm_config.py:4674 – You have setup a RLModuleSpec (via calling config.rl_module(...)), but have not enabled the new API stack. To enable it, call config.api_stack(enable_rl_module_and_learner=True).
2025-04-13 17:27:23,367 WARNING algorithm_config.py:4674 – You have setup a RLModuleSpec (via calling config.rl_module(...)), but have not enabled the new API stack. To enable it, call config.api_stack(enable_rl_module_and_learner=True).
(PPO pid=2039335) 2025-04-13 17:27:26,139 WARNING deprecation.py:50 – DeprecationWarning: ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module.PPOTorchRLModule has been deprecated. Use ray.rllib.algorithms.ppo.torch.default_ppo_torch_rl_module.DefaultPPOTorchRLModule instead. This will raise an error in the future!
(PPO pid=2039335) 2025-04-13 17:27:26,143 WARNING algorithm_config.py:4674 – You have setup a RLModuleSpec (via calling config.rl_module(...)), but have not enabled the new API stack. To enable it, call config.api_stack(enable_rl_module_and_learner=True).
(PPO pid=2039335) 2025-04-13 17:27:31,377 WARNING algorithm_config.py:4674 – You have setup a RLModuleSpec (via calling config.rl_module(...)), but have not enabled the new API stack. To enable it, call config.api_stack(enable_rl_module_and_learner=True).
(RolloutWorker pid=2039479) 2025-04-13 17:27:29,520 WARNING deprecation.py:50 – DeprecationWarning: ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module.PPOTorchRLModule has been deprecated. Use ray.rllib.algorithms.ppo.torch.default_ppo_torch_rl_module.DefaultPPOTorchRLModule instead. This will raise an error in the future! [repeated 2x across cluster]
(PPO pid=2039335) Install gputil for GPU system monitoring.
(RolloutWorker pid=2039478) 2025-04-13 17:27:35,011 ERROR actor_manager.py:187 – Worker exception caught during apply(): Invalid action (51) sent to env! valid_actions=[0. 1. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 1. 0. 1. 1. 1. 0. 1. 1.
(RolloutWorker pid=2039478) 0. 1. 1. 0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 1. 1. 1. 1. 1. 1. 1. 0. 1.
(RolloutWorker pid=2039478) 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 1. 1. 1. 0. 0. 0. 1. 0. 1.
(RolloutWorker pid=2039478) 0. 0. 1. 1. 0. 1. 0. 1. 0. 0. 1. 1. 1. 1. 0. 1. 1. 1. 0. 0. 1. 0. 0. 1.
(RolloutWorker pid=2039478) 0. 0. 0. 0.]
(RolloutWorker pid=2039478) Traceback (most recent call last):
(RolloutWorker pid=2039478) File “/project/python01/conda/mi.z/CWOS/lib/python3.12/site-packages/ray/rllib/utils/actor_manager.py”, line 183, in apply
(RolloutWorker pid=2039478) return func(self, *args, **kwargs)
(RolloutWorker pid=2039478) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RolloutWorker pid=2039478) File “/project/python01/conda/mi.z/CWOS/lib/python3.12/site-packages/ray/rllib/execution/rollout_ops.py”, line 108, in <lambda>
(RolloutWorker pid=2039478) (lambda w: w.sample(**random_action_kwargs))
(RolloutWorker pid=2039478) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RolloutWorker pid=2039478) File “/project/python01/conda/mi.z/CWOS/lib/python3.12/site-packages/ray/util/tracing/tracing_helper.py”, line 463, in _resume_span
(RolloutWorker pid=2039478) return method(self, *_args, **_kwargs)
(RolloutWorker pid=2039478) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RolloutWorker pid=2039478) File “/project/python01/conda/mi.z/CWOS/lib/python3.12/site-packages/ray/rllib/evaluation/rollout_worker.py”, line 677, in sample
(RolloutWorker pid=2039478) batches = [self.input_reader.next()]
(RolloutWorker pid=2039478) ^^^^^^^^^^^^^^^^^^^^^^^^
(RolloutWorker pid=2039478) File “/project/python01/conda/mi.z/CWOS/lib/python3.12/site-packages/ray/rllib/evaluation/sampler.py”, line 59, in next
(RolloutWorker pid=2039478) batches = [self.get_data()]
(RolloutWorker pid=2039478) ^^^^^^^^^^^^^^^
(RolloutWorker pid=2039478) File “/project/python01/conda/mi.z/CWOS/lib/python3.12/site-packages/ray/rllib/evaluation/sampler.py”, line 225, in get_data
(RolloutWorker pid=2039478) item = next(self._env_runner)
(RolloutWorker pid=2039478) ^^^^^^^^^^^^^^^^^^^^^^
(RolloutWorker pid=2039478) File “/project/python01/conda/mi.z/CWOS/lib/python3.12/site-packages/ray/rllib/evaluation/env_runner_v2.py”, line 329, in run
(RolloutWorker pid=2039478) outputs = self.step()
(RolloutWorker pid=2039478) ^^^^^^^^^^^
(RolloutWorker pid=2039478) File “/project/python01/conda/mi.z/CWOS/lib/python3.12/site-packages/ray/rllib/evaluation/env_runner_v2.py”, line 385, in step
(RolloutWorker pid=2039478) self._base_env.send_actions(actions_to_send)
(RolloutWorker pid=2039478) File “/project/python01/conda/mi.z/CWOS/lib/python3.12/site-packages/ray/rllib/env/vector_env.py”, line 462, in send_actions
(RolloutWorker pid=2039478) ) = self.vector_env.vector_step(action_vector)
(RolloutWorker pid=2039478) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RolloutWorker pid=2039478) File “/project/python01/conda/mi.z/CWOS/lib/python3.12/site-packages/ray/rllib/env/vector_env.py”, line 358, in vector_step
(RolloutWorker pid=2039478) raise e
(RolloutWorker pid=2039478) File “/project/python01/conda/mi.z/CWOS/lib/python3.12/site-packages/ray/rllib/env/vector_env.py”, line 351, in vector_step
(RolloutWorker pid=2039478) results = self.envs[i].step(actions[i])
(RolloutWorker pid=2039478) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RolloutWorker pid=2039478) File “/home/research/mi.z/CWOSbug/results/action_mask_env.py”, line 31, in step
(RolloutWorker pid=2039478) raise ValueError(
(RolloutWorker pid=2039478) ValueError: Invalid action (51) sent to env! valid_actions=[0. 1. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 1. 0. 1. 1. 1. 0. 1. 1.
(RolloutWorker pid=2039478) 0. 1. 1. 0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 1. 1. 1. 1. 1. 1. 1. 0. 1.
(RolloutWorker pid=2039478) 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 1. 1. 1. 0. 0. 0. 1. 0. 1.
(RolloutWorker pid=2039478) 0. 0. 1. 1. 0. 1. 0. 1. 0. 0. 1. 1. 1. 1. 0. 1. 1. 1. 0. 0. 1. 0. 0. 1.
(RolloutWorker pid=2039478) 0. 0. 0. 0.]
(RolloutWorker pid=2039479) 0. 0. 1. 1. 1. 0. 0. 1. 1. 0. 0. 1. 0. 0. 1. 1. 0. 0. 1. 0. 1. 1. 1. 1.
(RolloutWorker pid=2039479) 0. 1. 0. 1. 0. 0. 0. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 1. 0. 0. 0. 1. 1. 0.
(RolloutWorker pid=2039479) 1. 1. 0. 0. 1. 1. 1. 0. 1. 0. 0. 0. 0. 0. 1. 1. 0. 1. 0. 0. 1. 1. 0. 1.
(RolloutWorker pid=2039479) 0. 0. 1. 0.]
(RolloutWorker pid=2039479) 0. 0. 1. 1. 1. 0. 0. 1. 1. 0. 0. 1. 0. 0. 1. 1. 0. 0. 1. 0. 1. 1. 1. 1.
(RolloutWorker pid=2039479) 0. 1. 0. 1. 0. 0. 0. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 1. 0. 0. 0. 1. 1. 0.
(RolloutWorker pid=2039479) 1. 1. 0. 0. 1. 1. 1. 0. 1. 0. 0. 0. 0. 0. 1. 1. 0. 1. 0. 0. 1. 1. 0. 1.
(RolloutWorker pid=2039479) 0. 0. 1. 0.]
