How to use PPO with Dict observation space (pixels + features) in Ray 2.48.0?

RunLLM · December 20, 2025, 8:08pm

Yes, both MARWIL and PPO require the same custom RLModule approach for Dict observation spaces. MARWIL does not natively handle Dict spaces with separate encoders, so you must implement a custom RLModule (CNN for “pixels”, MLP for “features”) and use it for both algorithms. The RLModule checkpoint from MARWIL can be loaded into PPO for fine-tuning, as both use the same RLModuleSpec architecture (source, source, source).
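
As a rough illustration of the "CNN for pixels, MLP for features" idea, here is a minimal PyTorch sketch of the encoder part only. It is not RLlib code: the class name `DictObsEncoder`, the layer sizes, and the assumed input layout (channels-first `uint8` images under `"pixels"`, a flat float vector under `"features"`) are all assumptions made for the example. Wiring this into a custom RLModule, including the policy and value heads, is not shown here and should be checked against the RLlib docs for your Ray version.

```python
import torch
import torch.nn as nn


class DictObsEncoder(nn.Module):
    """Hypothetical two-branch encoder for a Dict observation.

    Assumes obs["pixels"] has shape (B, C, H, W) with values in [0, 255]
    and obs["features"] has shape (B, features_dim).
    """

    def __init__(self, pixels_shape=(3, 64, 64), features_dim=8, embed_dim=128):
        super().__init__()
        channels, height, width = pixels_shape

        # CNN branch for the image input.
        self.cnn = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened CNN output size with a dummy forward pass.
        with torch.no_grad():
            cnn_out_dim = self.cnn(torch.zeros(1, channels, height, width)).shape[1]

        # MLP branch for the vector input.
        self.mlp = nn.Sequential(nn.Linear(features_dim, 64), nn.ReLU())

        # Fuse both branches into one embedding.
        self.head = nn.Sequential(nn.Linear(cnn_out_dim + 64, embed_dim), nn.ReLU())

    def forward(self, obs):
        # Scale pixel values to [0, 1] before the convolutions.
        pixels = obs["pixels"].float() / 255.0
        features = obs["features"].float()
        fused = torch.cat([self.cnn(pixels), self.mlp(features)], dim=-1)
        return self.head(fused)


# Usage with a dummy batch of two observations.
encoder = DictObsEncoder()
dummy_obs = {
    "pixels": torch.zeros(2, 3, 64, 64, dtype=torch.uint8),
    "features": torch.zeros(2, 8),
}
embedding = encoder(dummy_obs)  # one embed_dim-sized vector per observation
```

If your environment emits channels-last images (H, W, C), permute them to channels-first before the CNN branch. Sharing one encoder class like this between the MARWIL and PPO module definitions is what keeps the checkpointed weights compatible.
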
DreamerV3 does not natively handle Dict observation spaces with automatic per-key encoder routing. Its built-in World Model expects a single image or vector input, not a Dict, so you would need to implement a custom encoder or modify the model config to handle Dicts (e.g., by concatenating or custom processing). There is currently no built-in way in DreamerV3Config.model() to specify separate encoders for Dict keys (source).
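
For the concatenation workaround mentioned above, a minimal NumPy sketch could look like the following. The helper name `flatten_dict_obs` and the key order are hypothetical, and this is plain preprocessing rather than a DreamerV3 or RLlib API. Note that flattening the image discards its spatial structure, so this is mainly reasonable for small images or as a quick baseline.

```python
import numpy as np


def flatten_dict_obs(obs, keys=("pixels", "features")):
    """Concatenate the entries of a Dict observation into one float32 vector.

    The key order is fixed so the layout of the vector is the same every step.
    """
    parts = [np.asarray(obs[key], dtype=np.float32).reshape(-1) for key in keys]
    return np.concatenate(parts)


# Usage with a tiny dummy observation: a 2x2 RGB image plus two features.
dummy_obs = {
    "pixels": np.zeros((2, 2, 3), dtype=np.uint8),
    "features": np.array([1.0, 2.0], dtype=np.float32),
}
flat = flatten_dict_obs(dummy_obs)  # 12 pixel values followed by 2 features
```

The environment's observation space would also have to be declared as a matching 1-D Box for the algorithm to accept the flattened vector.
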

Would you like a code template for the custom RLModule pattern?

