Any usage example of an RLModule for 2D input with a CNN as the encoder?

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hi

I am trying to follow the RL Modules (Alpha) page of the Ray 2.8.0 docs to create a custom RLModule that uses a CNN as an encoder. I am using env='ALE/Assault-v5'. The observation space is (210, 160, 3). The following is the code:
import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.ppo.ppo import PPOConfig
from ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module import PPOTorchRLModule
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec
from ray.rllib.examples.env.random_env import RandomEnv
from ray.rllib.models.torch.torch_distributions import TorchCategorical

from ray.rllib.core.models.configs import ActorCriticEncoderConfig, MLPHeadConfig, ModelConfig, MLPEncoderConfig, RecurrentEncoderConfig, CNNEncoderConfig
from ray.rllib.core.models.base import Encoder, ENCODER_OUT
from ray.rllib.core.models.torch.base import TorchModel
from ray.rllib.utils.framework import try_import_torch

import ray
from ray import air, tune

ACTION_SPACE = 7
H_DIM = 210
W_DIM = 160
C_INPUT = 3
K_DIM = 2
STRIDE = 1

def cnn_dim_calculater(input_dim, kernal_size, stride):
    # Output size of one spatial axis for a convolution without padding.
    return (input_dim - kernal_size) // stride + 1

class SeqTorchPPORLModule(PPOTorchRLModule):
    """A PPO RLModule with a CNN as an encoder.

    The idea behind this model is to demonstrate how we can bypass the catalog to
    take full control over what models and action distribution are being built.
    In this example, we do this to modify an existing RLModule with a custom encoder.
    """

    def setup(self):
        # Define a configuration here to create an encoder.
        encoderConfig = CNNEncoderConfig(
            input_dims=[H_DIM, W_DIM, 3],  # must be a 3D shape (image: H x W x C)
            cnn_filter_specifiers=[
                [16, [2, 2], 1],
            ],
            cnn_activation="relu",
            cnn_use_layernorm=False,
            cnn_use_bias=True,
        )
        # model = config.build(framework="torch")
        num_dim_cnn_output_h = cnn_dim_calculater(H_DIM, K_DIM, STRIDE)
        num_dim_cnn_output_w = cnn_dim_calculater(W_DIM, K_DIM, STRIDE)
        # Flattened size is H_out * W_out * num_filters (not used below).
        dim_cnn = num_dim_cnn_output_h * num_dim_cnn_output_w * 16

        # Since we want to use PPO, which is an actor-critic algorithm, we need to
        # use an ActorCriticEncoderConfig to wrap the base encoder config.
        actor_critic_encoder_config = ActorCriticEncoderConfig(
            base_encoder_config=encoderConfig
        )

        ######################################
        # Need self.encoder, self.pi, self.vf
        ######################################

        self.encoder = actor_critic_encoder_config.build(framework="torch")
        print('pass encoder!')

        # Hard-coded flattened encoder output size.
        output_dims = [int(112896)]
        print('output_dims: ', output_dims)

        pi_config = MLPHeadConfig(
            input_dims=output_dims,
            output_layer_dim=7,
        )

        vf_config = MLPHeadConfig(
            input_dims=output_dims, output_layer_dim=1
        )

        self.pi = pi_config.build(framework="torch")
        self.vf = vf_config.build(framework="torch")

        self.action_dist_cls = TorchCategorical

config = (
    PPOConfig()
    .rl_module(_enable_rl_module_api=True)
    .rl_module(
        rl_module_spec=SingleAgentRLModuleSpec(module_class=SeqTorchPPORLModule)
    )
    .environment(
        env="ALE/Assault-v5",
        # observation_space = spaces.Box(low=0, high=255, shape=(210, INPUT_DIM, 3), dtype=np.uint8),
        # action_space = gym.spaces.Discrete(ACTION_SPACE)
    )
    .training(train_batch_size=156, sgd_minibatch_size=32, num_sgd_iter=30)
)
config = config.rollouts(num_rollout_workers=5)

stop = {
    "training_iteration": 1000,
    # "timesteps_total": 1000,
    # "episode_reward_mean": 1500,
}
tuner = tune.Tuner(
    "PPO",
    run_config=air.RunConfig(
        stop=stop,
    ),
    param_space=config.to_dict(),
)
results = tuner.fit()
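
For reference, the size arithmetic that `cnn_dim_calculater` is meant to support can be sketched in plain Python. This is a minimal sketch assuming a single convolution with no padding; `conv_out_dim` and `flattened_cnn_dim` are hypothetical helper names, and RLlib's own encoder may pad differently, so its real output size may not match these numbers.

```python
def conv_out_dim(input_dim: int, kernel_size: int, stride: int) -> int:
    """Output length of one spatial axis for a convolution without padding."""
    return (input_dim - kernel_size) // stride + 1


def flattened_cnn_dim(
    height: int, width: int, kernel_size: int, stride: int, num_filters: int
) -> int:
    """Flattened feature size after one conv layer: H_out * W_out * num_filters."""
    out_h = conv_out_dim(height, kernel_size, stride)
    out_w = conv_out_dim(width, kernel_size, stride)
    return out_h * out_w * num_filters


# Values taken from the constants in the code above:
# 210 x 160 input, 2 x 2 kernel, stride 1, 16 filters.
print(flattened_cnn_dim(210, 160, 2, 1, 16))  # 209 * 159 * 16 = 531696
```

The flattened size is a product of the output height, output width, and filter count, which is why the head's `input_dims` has to agree with whatever the encoder actually emits.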

I am seeing this error:
(pid=7984) 2023-12-01 12:04:14,705 WARNING __init__.py:10 -- PG has/have been moved to rllib_contrib and will no longer be maintained by the RLlib team. You can still use it/them normally inside RLlib util Ray 2.8, but from Ray 2.9 on, all rllib_contrib algorithms will no longer be part of the core repo, and will therefore have to be installed separately with pinned dependencies for e.g. ray[rllib] and other packages! See https://github.com/ray-project/ray/tree/master/rllib_contrib#rllib-contrib for more information on the RLlib contrib effort.
(RolloutWorker pid=8022) A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7)
(RolloutWorker pid=8022) [Powered by Stella]
(RolloutWorker pid=8022) 2023-12-01 12:04:22,151 WARNING __init__.py:10 -- PG has/have been moved to rllib_contrib and will no longer be maintained by the RLlib team. You can still use it/them normally inside RLlib util Ray 2.8, but from Ray 2.9 on, all rllib_contrib algorithms will no longer be part of the core repo, and will therefore have to be installed separately with pinned dependencies for e.g. ray[rllib] and other packages! See https://github.com/ray-project/ray/tree/master/rllib_contrib#rllib-contrib for more information on the RLlib contrib effort.
(RolloutWorker pid=8024) 2023-12-01 12:04:22,151 WARNING __init__.py:10 -- PG has/have been moved to rllib_contrib and will no longer be maintained by the RLlib team. You can still use it/them normally inside RLlib util Ray 2.8, but from Ray 2.9 on, all rllib_contrib algorithms will no longer be part of the core repo, and will therefore have to be installed separately with pinned dependencies for e.g. ray[rllib] and other packages! See https://github.com/ray-project/ray/tree/master/rllib_contrib#rllib-contrib for more information on the RLlib contrib effort.
(RolloutWorker pid=8022) pass encoder!
(RolloutWorker pid=8022) output_dims: [112896]
(RolloutWorker pid=8022) 2023-12-01 12:04:22,662 ERROR checker.py:258 -- Exception Given groups=1, weight of size [16, 3, 2, 2], expected input[32, 4, 85, 85] to have 3 channels, but got 4 channels instead raised on function call without checkin input specs. RLlib will now attempt to check the spec before calling the function again.
(RolloutWorker pid=8022) Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=8022, ip=127.0.0.1, actor_id=7b476b84cc591accb075914401000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x15f949f30>)
(RolloutWorker pid=8022)   File "/Users/wangmi/opt/miniconda3/envs/ray/lib/python3.10/site-packages/ray/rllib/core/models/specs/specs_base.py", line 243, in validate
(RolloutWorker pid=8022)     raise ValueError(_INVALID_SHAPE.format(self._expected_shape, shape))
(RolloutWorker pid=8022) ValueError: Expected shape ('b', 210, 160, 3) but found (32, 84, 84, 4)
(RolloutWorker pid=8022)
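
Reading the ValueError, the module was built for (210, 160, 3) but the batch it received has shape (32, 84, 84, 4). An 84 x 84 x 4 observation is consistent with common Atari preprocessing (resize to 84 x 84 and stack 4 frames), though whether that is what produces it here is an assumption. A minimal sketch in plain Python, with hypothetical helper names, that lists the mismatched axes and derives the `input_dims` the encoder would need for the observed shape:

```python
from typing import List, Sequence, Tuple

# Shape the module declares via input_dims=[H_DIM, W_DIM, 3] (from the code above).
DECLARED_INPUT_DIMS = (210, 160, 3)
# Batch shape reported in the ValueError (from the log above).
FOUND_BATCH_SHAPE = (32, 84, 84, 4)


def mismatched_axes(
    declared: Sequence[int], batch_shape: Sequence[int]
) -> List[Tuple[str, int, int]]:
    """Return (axis_name, declared, found) for each non-batch axis that differs."""
    names = ("height", "width", "channels")
    found = tuple(batch_shape[1:])  # drop the leading batch axis
    return [(n, d, f) for n, d, f in zip(names, declared, found) if d != f]


def input_dims_from_batch(batch_shape: Sequence[int]) -> List[int]:
    """Per-sample dims (everything after the batch axis) as a list."""
    return list(batch_shape[1:])


for name, declared, found in mismatched_axes(DECLARED_INPUT_DIMS, FOUND_BATCH_SHAPE):
    print(f"{name}: declared {declared}, found {found}")

print(input_dims_from_batch(FOUND_BATCH_SHAPE))  # [84, 84, 4]
```

If the observed shape is the one the module should handle, `input_dims=[84, 84, 4]` would be the value to try in `CNNEncoderConfig`; this is untested against Ray 2.8.0. Incidentally, 84 * 84 * 16 = 112896, which is the value hard-coded in `output_dims`.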

Can anyone help me debug this, or point me to a usage example for 2D input?

Many thanks!