ValueError: `RLModule(config=[RLModuleConfig])` has been deprecated - New API Stack

High: Completely blocks me.

Hello,
While training my custom environment in Gymnasium with RLlib (new API stack) I get this error and haven't been able to solve it so far :slightly_frowning_face: I have defined my observations as a Dictionary. It works fine with the old API stack but not with the new one, which completely blocks me from moving forward. To get a better and simpler understanding, I started by training the example from the RLlib GitHub repo in the link below:

Unfortunately I get the same error. For clarification, I define the config as below:

config = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment(
        env="CartPoleWithDictObservationSpace",
    )
    .training(
        lr=agent_params["lr"],  # Learning rate
        entropy_coeff=agent_params["entropy_coeff"],  # Encourage exploration with entropy regularization
    )
    .env_runners(num_env_runners=agent_params["num_env"])  # Number of parallel environments
    .framework("torch")
    .rl_module(
        rl_module_spec=RLModuleSpec(
            module_class=PPOTorchRLModule,
            inference_only=False,
            learner_only=False,
            observation_space=env_instance.observation_space,
            action_space=env_instance.action_space,
            model_config={"hidden_sizes": [128, 128]},  # Optional, passed to RLModule
        )
    )
)

Environment:

  • Ray version: 2.43.0
  • Python version: 3.12.9

Thanks

Hi Ali! Have you had a chance to look at the old → new API migration guide? New API stack migration guide — Ray 2.44.0

I took a look and it seems like someone on GitHub opened a similar issue; is it the same one you're experiencing? RLlib new API stack false deprecation warning / MultiRLModuleSpec · Issue #51630 · ray-project/ray · GitHub :thinking:


Hi Christina,
Firstly, thanks for your reply.
Yes, I've seen it many times. As you can see, I arranged the rl_module section according to the new API's layout and the deprecation warning's advice. I think the new API stack's module cannot handle dictionary observations. I actually couldn't find any resources on training an agent with Dict observations on the new API stack :slightly_frowning_face: . For some reasons I need to define my observations as a dictionary, and I'm eager to train with the new API stack. I would really appreciate any hints or resources that address the issue more directly.
Thanks

Hi. Have you tried to remove that

observation_space=env_instance.observation_space,
action_space=env_instance.action_space,

from the RLModuleSpec() definition? It seems those can be inferred automatically, so the spec doesn't need them passed in explicitly.
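Something like this is roughly what I mean (just a sketch, not tested on my end, and the import paths may differ slightly between Ray versions):

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.rl_module import RLModuleSpec

config = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment(env="CartPoleWithDictObservationSpace")
    .framework("torch")
    .rl_module(
        # No observation_space / action_space here; RLlib fills them in from the env.
        rl_module_spec=RLModuleSpec(
            model_config={"hidden_sizes": [128, 128]},  # taken from your snippet
        )
    )
)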

Hey @Ali_Zargarian , could you provide some minimal reproduction script, so we can debug this problem on our end?
Thanks!

Hi @sven1977 ,
Thanks for the rapid reply :slightly_smiling_face:
As I said, I took the environment from the Ray examples on GitHub, as below:

and my training script is as below:

# Import paths as of Ray 2.4x
from ray import train, tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog
from ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module import PPOTorchRLModule
from ray.rllib.core.rl_module.rl_module import RLModuleSpec
from ray.tune.registry import register_env

from cartpole_with_dict_observation_space import CartPoleWithDictObservationSpace

def env_creator(config):
    return CartPoleWithDictObservationSpace()

# Register the custom environment
register_env("CartPoleWithDictObservationSpace", env_creator)


env_instance = CartPoleWithDictObservationSpace()
config = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment(
        env="CartPoleWithDictObservationSpace",
    )
    .training(
        lr = 0.0001,  # Learning rate
        entropy_coeff= 0.01,  # Encourage exploration with entropy regularization
    )
    .env_runners(num_env_runners = 1)  # Number of parallel environments
    .framework("torch")

)
config = config.rl_module(
    rl_module_spec=RLModuleSpec(
        module_class=PPOTorchRLModule,
        inference_only=False,
        learner_only=False,
        observation_space=env_instance.observation_space,
        action_space=env_instance.action_space,
        model_config={"hidden_sizes": [128, 128]},  # Optional, passed to RLModule
        catalog_class=PPOCatalog,
    )
)

tuner = tune.Tuner(
    "PPO",
    param_space=config,
    run_config=tune.RunConfig(
        stop={"training_iteration": 5},  # Specify when to stop training
        checkpoint_config=train.CheckpointConfig(checkpoint_at_end=True),
    ),
)

# Run the tuner
results = tuner.fit()

# Get the best result based on a particular metric
best_result = results.get_best_result(
    metric="env_runners/episode_return_mean", mode="max"
)

Thanks

Unfortunately, that doesn't work either. Also, according to the deprecation warning those arguments should be there.

Hello @sven1977 ,
any update? :wink:

Hello @sven1977
is there any update? Meanwhile I'm working on another issue, related to masking some actions, something like the example below:

It uses Dict observations again. The example states that it is only suitable for training with the old API stack. Because I don't want to use the old API stack, I'm blocked again!

Thanks in advance

@Ali_Zargarian , this is indeed the old example. The example for the new API stack can be found here: ray/rllib/examples/rl_modules/action_masking_rl_module.py at master · ray-project/ray · GitHub
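Roughly, the relevant wiring in that example looks like this (a sketch from memory, so double-check the file for the exact import paths and env setup; "YourActionMaskEnv" is just a placeholder for your registered env):

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.rl_module import RLModuleSpec
from ray.rllib.examples.rl_modules.classes.action_masking_rlm import (
    ActionMaskingTorchRLModule,
)

config = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    # The env must expose a Dict obs space with "action_mask" and "observations" keys.
    .environment(env="YourActionMaskEnv")
    .rl_module(
        rl_module_spec=RLModuleSpec(module_class=ActionMaskingTorchRLModule),
    )
)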

@sven1977
Ah, nice!
Could you please point me to an example of training (or tuning) with a dictionary observation space (DictObservationSpace) on the new API stack?

Thanks

@Ali_Zargarian , I don't understand what you mean exactly. Could you elaborate? The observation_space in action masking is always a dictionary. Do you mean using a dictionary observation space under the observations key?

Hi @Lars_Simon_Zehnder,
I tried to use a dictionary observation space under the observations key, but I get this error:

(PPO pid=10788) ValueError: This RLModule requires the environment to provide a gym.spaces.Dict observation space of the form: [repeated 2x across cluster]
(PPO pid=10788) {'action_mask': Box(0.0, 1.0, shape=(self.action_space.n,)), 'observation_space': self.observation_space} [repeated 2x across cluster]

even though this is what I get when I print the observation_space's type:
<class 'gymnasium.spaces.dict.Dict'>

The observation_space is something like below:

    import gymnasium as gym
    import numpy as np
    self.observation_space = gym.spaces.Dict({
        "action_mask": gym.spaces.Box(
            low=0,
            high=1,
            shape=(self.action_space.n,),
            dtype=np.float32
        ),
        "observations": gym.spaces.Dict({
            "obs_1": gym.spaces.Box(
                low=-100.0,
                high=100.0,
                shape=(4 * 4,),
                dtype=np.float32
            ),
            "obs_2": gym.spaces.Box(
                low=-100.0,
                high=100.0,
                shape=(3,),
                dtype=np.float32
            )
        })
    })

I think action_masking_rlm.py can't handle a nested "observations" dictionary!
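A possible workaround would be to flatten the nested "observations" part into a single Box inside the env (a rough, untested sketch; mask, obs_1 and obs_2 stand in for my own variables):

import numpy as np
import gymnasium as gym

# Keep the top-level {"action_mask", "observations"} layout the masking RLModule expects,
# but make "observations" a single flat Box instead of a nested Dict.
self.observation_space = gym.spaces.Dict({
    "action_mask": gym.spaces.Box(0.0, 1.0, shape=(self.action_space.n,), dtype=np.float32),
    "observations": gym.spaces.Box(-100.0, 100.0, shape=(4 * 4 + 3,), dtype=np.float32),
})

# ...and in reset()/step(), concatenate the pieces accordingly:
obs = {
    "action_mask": mask,
    "observations": np.concatenate([obs_1.ravel(), obs_2.ravel()]).astype(np.float32),
}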

Hi @Lars_Simon_Zehnder
any update?

I only skimmed through your topic. If it is the same warning message I am aware of, you can safely ignore it. Currently there will always be a deprecation warning for the very first RLModuleConfig created.
There still exists deprecated code that creates a `self.config = RLModuleConfig(...)` from your new-stack options for backwards compatibility; that instantiation is what triggers the warning.

I wrote myself a snippet to run before my code to suppress this misleading warning:

import logging

from ray.rllib.core.rl_module.rl_module import RLModuleConfig
from ray.rllib.utils.deprecation import logger as __deprecation_logger

# This suppresses a deprecation warning from RLModuleConfig
__old_level = __deprecation_logger.getEffectiveLevel()
__deprecation_logger.setLevel(logging.ERROR)
RLModuleConfig()  # trigger the one-time warning while the logger is silenced
__deprecation_logger.setLevel(__old_level)
del __deprecation_logger

Try whether that hides the warning. To be sure, run it with a debugger and find the location where it is emitted; it could be a false positive and unrelated to your code.
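If you want to pin down where it actually comes from, a quick (untested) hack is to print a stack trace whenever that deprecation logger fires:

import traceback

from ray.rllib.utils.deprecation import logger as dep_logger

# Monkey-patch the deprecation logger so every warning also prints the call stack,
# which shows which code path emits the RLModuleConfig message.
_orig_warning = dep_logger.warning

def _warning_with_stack(msg, *args, **kwargs):
    traceback.print_stack()
    return _orig_warning(msg, *args, **kwargs)

dep_logger.warning = _warning_with_stack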
