ValueError: `RLModule(config=[RLModuleConfig])` has been deprecated - New API Stack

High: Completely blocks me.

Hello,
While training my custom environment in Gymnasium with RLlib (new API stack) I get this error and haven't been able to solve it so far :slightly_frowning_face: I have defined my observations as a Dictionary. It works fine with the old API stack but not with the new one, which completely blocks me from moving forward. To get a better and simpler understanding, I started by training the example from the RLlib GitHub repo in the link below:

Unfortunately I get the same error. For clarification, I define the config as below:

config = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment(
        env="CartPoleWithDictObservationSpace",
    )
    .training(
        lr=agent_params["lr"],  # Learning rate
        entropy_coeff=agent_params["entropy_coeff"],  # Encourage exploration with entropy regularization
    )
    .env_runners(num_env_runners=agent_params["num_env"])  # Number of parallel environments
    .framework("torch")
    .rl_module(
        rl_module_spec=RLModuleSpec(
            module_class=PPOTorchRLModule,
            inference_only=False,
            learner_only=False,
            observation_space=env_instance.observation_space,
            action_space=env_instance.action_space,
            model_config={"hidden_sizes": [128, 128]},  # Optional, passed to RLModule
        )
    )
)

Environment:

  • Ray version: 2.43.0
  • Python version: 3.12.9

Thanks

Hi Ali! Have you had a chance to look at the old → new API migration guide? New API stack migration guide — Ray 2.44.0

I took a look and it seems like someone on GitHub opened a similar issue; is it the same one you're experiencing? RLlib new API stack false deprecation warning / MultiRLModuleSpec · Issue #51630 · ray-project/ray · GitHub :thinking:


Hi Christina,
Firstly, thanks for your reply.
Yes, I've seen it many times. As you can see, I arranged the rl_module section according to the new API's layout and the deprecation warning's advice. I think the new API stack's module cannot handle dictionary observations. I actually couldn't find any resources on training an agent with Dict observations on the new API stack :slightly_frowning_face: . For some reasons I need to define my observations as a dictionary, and I'm eager to train with the new API stack. I would really appreciate any hints or resources that address the issue more directly.
Thanks

Hi. Have you tried to remove that

observation_space=env_instance.observation_space,
action_space=env_instance.action_space,

from the RLModuleSpec() definition? It seems those can be inferred automatically, so the spec doesn't need them passed in explicitly.
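Something like this is roughly what I mean (just a sketch, not tested on my end, and the import paths may differ slightly between Ray versions):

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.rl_module import RLModuleSpec

config = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment(env="CartPoleWithDictObservationSpace")
    .framework("torch")
    .rl_module(
        # No observation_space / action_space here; RLlib fills them in from the env.
        rl_module_spec=RLModuleSpec(
            model_config={"hidden_sizes": [128, 128]},  # taken from your snippet
        )
    )
)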

Hey @Ali_Zargarian , could you provide some minimal reproduction script, so we can debug this problem on our end?
Thanks!

Hi @sven1977 ,
Thanks for the rapid reply :slightly_smiling_face:
As I said, I took the environment from the Ray examples on GitHub, as below:

and my training script is as below:

# Import paths as of Ray 2.4x
from ray import train, tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog
from ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module import PPOTorchRLModule
from ray.rllib.core.rl_module.rl_module import RLModuleSpec
from ray.tune.registry import register_env

from cartpole_with_dict_observation_space import CartPoleWithDictObservationSpace

def env_creator(config):
    return CartPoleWithDictObservationSpace()

# Register the custom environment
register_env("CartPoleWithDictObservationSpace", env_creator)


env_instance = CartPoleWithDictObservationSpace()
config = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment(
        env="CartPoleWithDictObservationSpace",
    )
    .training(
        lr = 0.0001,  # Learning rate
        entropy_coeff= 0.01,  # Encourage exploration with entropy regularization
    )
    .env_runners(num_env_runners = 1)  # Number of parallel environments
    .framework("torch")

)
config = config.rl_module(
    rl_module_spec=RLModuleSpec(
        module_class=PPOTorchRLModule,
        inference_only=False,
        learner_only=False,
        observation_space=env_instance.observation_space,
        action_space=env_instance.action_space,
        model_config={"hidden_sizes": [128, 128]},  # Optional, passed to RLModule
        catalog_class=PPOCatalog,
    )
)

tuner = tune.Tuner(
    "PPO",
    param_space=config,
    run_config=tune.RunConfig(
        stop={"training_iteration": 5},  # Specify when to stop training
        checkpoint_config=train.CheckpointConfig(checkpoint_at_end=True),
    ),
)

# Run the tuner
results = tuner.fit()

# Get the best result based on a particular metric
best_result = results.get_best_result(
    metric="env_runners/episode_return_mean", mode="max"
)

Thanks

Unfortunately, that doesn't work either. Also, according to the deprecation warning those arguments should be there.

Hello @sven1977 ,
any update? :wink:

Hello @sven1977
is there any update? Meanwhile I'm working on another issue, related to masking some actions, something like the example below:

It uses Dict observations again. The example states that it is only suitable for training with the old API stack. Because I don't want to use the old API stack, I'm blocked again!

Thanks in advance

@Ali_Zargarian , this is indeed the old example. The example for the new API stack can be found here: ray/rllib/examples/rl_modules/action_masking_rl_module.py at master · ray-project/ray · GitHub
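Roughly, the relevant wiring in that example looks like this (a sketch from memory, so double-check the file for the exact import paths and env setup; "YourActionMaskEnv" is just a placeholder for your registered env):

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.rl_module import RLModuleSpec
from ray.rllib.examples.rl_modules.classes.action_masking_rlm import (
    ActionMaskingTorchRLModule,
)

config = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    # The env must expose a Dict obs space with "action_mask" and "observations" keys.
    .environment(env="YourActionMaskEnv")
    .rl_module(
        rl_module_spec=RLModuleSpec(module_class=ActionMaskingTorchRLModule),
    )
)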

@sven1977
Ah, nice!
Could you please point me to an example of training (or tuning) with a dictionary observation space (DictObservationSpace) on the new API stack?

Thanks

@Ali_Zargarian , I don't understand what you mean exactly. Could you elaborate? The observation_space in action masking is always a dictionary. Do you mean using a dictionary observation space under the observations key?

Hi @Lars_Simon_Zehnder,
I tried to use a dictionary observation space under the observations key, but I get this error:

(PPO pid=10788) ValueError: This RLModule requires the environment to provide a gym.spaces.Dict observation space of the form: [repeated 2x across cluster]
(PPO pid=10788) {'action_mask': Box(0.0, 1.0, shape=(self.action_space.n,)), 'observation_space': self.observation_space} [repeated 2x across cluster]

even though this is what I get when I print the observation_space's type:
<class 'gymnasium.spaces.dict.Dict'>

The observation_space is something like below:

    import gymnasium as gym
    import numpy as np
    self.observation_space = gym.spaces.Dict({
        "action_mask": gym.spaces.Box(
            low=0,
            high=1,
            shape=(self.action_space.n,),
            dtype=np.float32
        ),
        "observations": gym.spaces.Dict({
            "obs_1": gym.spaces.Box(
                low=-100.0,
                high=100.0,
                shape=(4 * 4,),
                dtype=np.float32
            ),
            "obs_2": gym.spaces.Box(
                low=-100.0,
                high=100.0,
                shape=(3,),
                dtype=np.float32
            )
        })
    })

I think action_masking_rlm.py can't handle a nested "observations" dictionary!
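A possible workaround would be to flatten the nested "observations" part into a single Box inside the env (a rough, untested sketch; mask, obs_1 and obs_2 stand in for my own variables):

import numpy as np
import gymnasium as gym

# Keep the top-level {"action_mask", "observations"} layout the masking RLModule expects,
# but make "observations" a single flat Box instead of a nested Dict.
self.observation_space = gym.spaces.Dict({
    "action_mask": gym.spaces.Box(0.0, 1.0, shape=(self.action_space.n,), dtype=np.float32),
    "observations": gym.spaces.Box(-100.0, 100.0, shape=(4 * 4 + 3,), dtype=np.float32),
})

# ...and in reset()/step(), concatenate the pieces accordingly:
obs = {
    "action_mask": mask,
    "observations": np.concatenate([obs_1.ravel(), obs_2.ravel()]).astype(np.float32),
}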

Hi @Lars_Simon_Zehnder
any update?

I only skimmed through your topic. If it is the same warning message I am aware of, you can safely ignore it. Currently there will always be a deprecation warning for the very first RLModuleConfig created.
There still exists deprecated code that creates a `self.config = RLModuleConfig(...)` from your new-stack options for backwards compatibility; that instantiation is what triggers the warning.

I wrote myself a snippet to run before my code to suppress this misleading warning:

import logging

from ray.rllib.core.rl_module.rl_module import RLModuleConfig
from ray.rllib.utils.deprecation import logger as __deprecation_logger

# This suppresses a deprecation warning from RLModuleConfig
__old_level = __deprecation_logger.getEffectiveLevel()
__deprecation_logger.setLevel(logging.ERROR)
RLModuleConfig()  # trigger the one-time warning while the logger is silenced
__deprecation_logger.setLevel(__old_level)
del __deprecation_logger

Try whether that hides the warning. To be sure, run it with a debugger and find the location where it is emitted; it could be a false positive and unrelated to your code.
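If you want to pin down where it actually comes from, a quick (untested) hack is to print a stack trace whenever that deprecation logger fires:

import traceback

from ray.rllib.utils.deprecation import logger as dep_logger

# Monkey-patch the deprecation logger so every warning also prints the call stack,
# which shows which code path emits the RLModuleConfig message.
_orig_warning = dep_logger.warning

def _warning_with_stack(msg, *args, **kwargs):
    traceback.print_stack()
    return _orig_warning(msg, *args, **kwargs)

dep_logger.warning = _warning_with_stack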
