Context
I’m training a PPO agent in RLlib (Ray 2.48.0) with a custom Gymnasium environment whose Dict observation space contains both pixels and vector features:
import gymnasium as gym
import numpy as np
obs_space = gym.spaces.Dict({
    "pixels": gym.spaces.Box(0.0, 1.0, (84, 84, 4), dtype=np.float32),
    "features": gym.spaces.Box(-1.0, 1.0, (9,), dtype=np.float32),
})
step() returns an observation of this form:
{
    "pixels": np.zeros((84, 84, 4), np.float32),
    "features": np.zeros(9, np.float32),
}
Problem
When running PPO with this env:
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env
class DummyEnv(gym.Env):
    def __init__(self, cfg=None):
        self.observation_space = obs_space
        self.action_space = gym.spaces.Discrete(4)

    def reset(self, *, seed=None, options=None):
        return {"pixels": np.zeros((84, 84, 4), np.float32),
                "features": np.zeros(9, np.float32)}, {}

    def step(self, action):
        return self.reset()[0], 0.0, False, False, {}

register_env("dummy", lambda cfg: DummyEnv())

cfg = (PPOConfig()
       .environment("dummy")
       .framework("torch"))
algo = cfg.build()
I get:
ValueError: No default encoder config for obs space=Dict('features': Box(-1.0, 1.0, (9,), float32),
'pixels': Box(0.0, 1.0, (84, 84, 4), float32)), lstm=False found.
Question
- What is the recommended way in Ray 2.48.0 to handle such Dict spaces (CNN for "pixels" and MLP for "features", then concatenate)?
- Do I need to manually define a custom RLModuleSpec/Catalog for this, or is there a built-in default?
- If a manual config is required, could you provide a minimal example (Torch backend, PPO)?
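To make the target architecture concrete, here is a rough sketch in plain PyTorch of the two-branch encoder described above. It is not RLlib API and is not wired into an RLModule or Catalog. The class name TwoBranchEncoder, the layer sizes, and the 256-dimensional output are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class TwoBranchEncoder(nn.Module):
    """Hypothetical encoder: CNN for "pixels", MLP for "features", then concat."""

    def __init__(self, feature_dim: int = 9, embed_dim: int = 256):
        super().__init__()
        # CNN branch for 84x84x4 inputs (layer sizes are an assumption).
        self.cnn = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),   # 84 -> 20
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # 20 -> 9
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # 9 -> 7
            nn.Flatten(),                                           # 64 * 7 * 7 = 3136
        )
        # MLP branch for the 9-dimensional feature vector.
        self.mlp = nn.Sequential(nn.Linear(feature_dim, 64), nn.ReLU())
        # Joint layer applied to the concatenated branch outputs.
        self.head = nn.Sequential(nn.Linear(3136 + 64, embed_dim), nn.ReLU())

    def forward(self, obs: dict) -> torch.Tensor:
        # The env emits channels-last images; Conv2d expects channels-first.
        pixels = obs["pixels"].permute(0, 3, 1, 2)
        joint = torch.cat([self.cnn(pixels), self.mlp(obs["features"])], dim=-1)
        return self.head(joint)


encoder = TwoBranchEncoder()
batch = {
    "pixels": torch.zeros(2, 84, 84, 4),
    "features": torch.zeros(2, 9),
}
embedding = encoder(batch)
print(embedding.shape)
```

What I am unsure about is how to plug something like this into PPO on the new API stack so that the policy and value heads sit on top of the concatenated embedding.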
System Info
- Ray 2.48.0
- Python 3.10
Workaround tested
Flattening the Dict works, but then "pixels" is treated as a flat vector and the CNN processing is lost. Ideally, I’d like RLlib to auto-create a CNN branch for "pixels" and an MLP branch for "features".