How severely does this issue affect your experience of using Ray?
High: It blocks me from completing my task.
I recently updated Ray to 2.38.0 to migrate towards the new API stack. I have been using the action masking example to train a PPO agent in a custom environment. While migrating I ran into errors and found that the action masking example itself does not run.
To reproduce the error:
From ray/rllib/examples/rl_modules, run:
python action_masking_rl_module.py --enable-new-api-stack --num-env-runners 2
The error I am getting is:
(SingleAgentEnvRunner pid=84919) module = self.module_class( [repeated 2x across cluster]
(SingleAgentEnvRunner pid=84919) TypeError: ActionMaskingRLModule.__init__() got an unexpected keyword argument 'observation_space' [repeated 2x across cluster]
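To illustrate what I think is happening, outside of RLlib entirely: the example's module class still defines the old config-only constructor, so the new keyword-argument call site trips over it (plain-Python sketch; `OldStyleModule` is made up for illustration):

```python
# A stand-in for a module that still uses the old config-based constructor.
class OldStyleModule:
    def __init__(self, config):
        self.config = config

# The new API stack instead instantiates modules with keyword arguments,
# which the old signature cannot accept.
try:
    OldStyleModule(observation_space=None, action_space=None, model_config={})
except TypeError as e:
    print(e)  # -> __init__() got an unexpected keyword argument 'observation_space'
```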
It seems that in rl_module.py the TypeError is not handled as part of the model_config deprecation. However, even after handling the TypeError locally, the example still errors:
File "<...>/ray/rllib/examples/rl_modules/classes/action_masking_rlm.py", line 131, in _preprocess_batch
action_mask = batch[Columns.OBS].pop("action_mask")
AttributeError: 'Tensor' object has no attribute 'pop'
I checked the return types: the environment correctly returns OrderedDicts as its observations, but somewhere along the pipeline they are turned into a single tensor, which is what causes the crash.
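For debugging I replaced the failing pop with a guarded helper along these lines (a sketch; `split_action_mask` is my own name, mirroring what `_preprocess_batch` does in action_masking_rlm.py):

```python
from ray.rllib.core.columns import Columns


def split_action_mask(batch):
    """Pop the action mask off a dict observation, failing loudly otherwise."""
    obs = batch[Columns.OBS]
    if not isinstance(obs, dict):
        # By the time the batch reaches the module, a connector has apparently
        # flattened the dict observation into a single tensor, so the mask
        # can no longer be split off.
        raise TypeError(
            f"Expected a dict obs with an 'action_mask' key, got {type(obs)}"
        )
    action_mask = obs.pop("action_mask")
    return action_mask, batch
```

With this in place the failure at least reports the tensor/dict mismatch explicitly instead of dying on the AttributeError.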
Some notable warnings that I saw:
From Gymnasium:
<...>/gymnasium/envs/registration.py:693: UserWarning: WARN: Overriding environment rllib-single-agent-env-v0 already in registry.
<...>/gymnasium/utils/passive_env_checker.py:275: UserWarning: WARN: The reward returned by `step()` must be a float, int, np.integer or np.floating, actual type: <class 'numpy.ndarray'>
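(The second warning suggests my environment's step() hands back the reward as a numpy array rather than a scalar; coercing it silences the checker, though it looks unrelated to the crash. Sketch:)

```python
import numpy as np

raw_reward = np.asarray(1.0)  # a 0-d array, like the env apparently returns
reward = float(raw_reward)    # a plain float satisfies Gymnasium's checker
```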
And of course the ones from RLlib:
WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future!
WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., learner_only=.., model_config=..)` instead. This will raise an error in the future!
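Judging purely from the warning text, a constructor matching the new stack would either accept those keyword arguments explicitly or simply pass everything through (a sketch, not verified against the 2.38 sources; the class name is hypothetical):

```python
from ray.rllib.core.rl_module.torch.torch_rl_module import TorchRLModule


class PassthroughActionMaskingRLModule(TorchRLModule):
    # Forward whatever the stack provides -- the deprecated `config` object
    # on the old path, or the keyword arguments named in the warnings
    # (observation_space, action_space, inference_only, model_config, ...)
    # on the new one -- and let the base class handle the deprecation shim.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
```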