- Yes, MARWIL and PPO both need the same custom RLModule approach for Dict observation spaces. MARWIL does not route Dict keys to separate encoders on its own, so you must implement a custom RLModule (CNN for “pixels”, MLP for “features”) and use it with both algorithms. Because both algorithms then share the same RLModuleSpec architecture, the RLModule checkpoint from MARWIL can be loaded into PPO for fine-tuning (source, source, source).
- DreamerV3 does not natively handle Dict observation spaces with automatic per-key encoder routing. Its built-in world model expects a single image or vector input, not a Dict, so you would need to implement a custom encoder or modify the model config to handle Dicts (e.g., by concatenating the entries or applying custom processing). DreamerV3Config.model() currently offers no built-in way to specify separate encoders per Dict key (source).
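To illustrate the concatenation option for DreamerV3, here is a minimal NumPy sketch that collapses a Dict observation into one flat vector. The function name `concat_dict_obs`, the keys `"pixels"` and `"features"`, and the shapes are assumptions for illustration, not part of RLlib. Note that flattening discards the spatial structure of the image, so this only shows the mechanics and is not a recommendation.

```python
import numpy as np


def concat_dict_obs(obs):
    """Hypothetical helper: flatten a Dict observation into one float32 vector."""
    # Scale uint8 pixels to [0, 1] and flatten (H, W, C) into a 1-D vector.
    pixels = np.asarray(obs["pixels"], dtype=np.float32).reshape(-1) / 255.0
    # Flatten the low-dimensional features as well.
    features = np.asarray(obs["features"], dtype=np.float32).reshape(-1)
    return np.concatenate([pixels, features])


# Assumed example shapes: a 64x64 RGB image plus 8 scalar features.
obs = {
    "pixels": np.zeros((64, 64, 3), dtype=np.uint8),
    "features": np.ones(8, dtype=np.float32),
}
flat = concat_dict_obs(obs)  # length 64 * 64 * 3 + 8 = 12296
```

In practice this kind of preprocessing would likely live in an environment wrapper (for example a `gymnasium.ObservationWrapper`) so that the algorithm only ever sees the single vector space.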
Would you like a code template for the custom RLModule pattern?
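As a starting point, here is a minimal plain-PyTorch sketch of the encoder part of that pattern: a CNN branch for “pixels” and an MLP branch for “features”, merged into one embedding. The class name `DictObsEncoder`, the layer sizes, and the observation shapes are assumptions for illustration. This is not an RLlib class, and it omits the policy and value heads a full RLModule would need.

```python
import torch
import torch.nn as nn


class DictObsEncoder(nn.Module):
    """Hypothetical encoder: CNN for obs["pixels"], MLP for obs["features"]."""

    def __init__(self, pixel_shape=(64, 64, 3), feature_dim=8, embed_dim=128):
        super().__init__()
        height, width, channels = pixel_shape
        self.cnn = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.mlp = nn.Sequential(nn.Linear(feature_dim, 64), nn.ReLU())
        # Infer the flattened CNN output size with a dummy forward pass.
        with torch.no_grad():
            cnn_out = self.cnn(torch.zeros(1, channels, height, width)).shape[1]
        self.head = nn.Sequential(nn.Linear(cnn_out + 64, embed_dim), nn.ReLU())

    def forward(self, obs):
        # Images arrive as (B, H, W, C) uint8; convert to (B, C, H, W) in [0, 1].
        pixels = obs["pixels"].float().permute(0, 3, 1, 2) / 255.0
        features = obs["features"].float()
        merged = torch.cat([self.cnn(pixels), self.mlp(features)], dim=-1)
        return self.head(merged)


# Assumed example batch: four observations with a 64x64 RGB image and 8 features.
encoder = DictObsEncoder()
batch = {
    "pixels": torch.randint(0, 256, (4, 64, 64, 3), dtype=torch.uint8),
    "features": torch.randn(4, 8),
}
embedding = encoder(batch)  # shape: (4, 128)
```

Inside RLlib, an encoder like this would typically be built in the `setup()` method of a custom `TorchRLModule` subclass, and the same `RLModuleSpec` would then be passed to both the MARWIL and the PPO config via `.rl_module(rl_module_spec=...)`. Check the exact hooks against the docs for your Ray version.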
Sources:
- How to use PPO with Dict observation space (pixels + features) in Ray 2.48.0?
- Using Dict observation space with custom RLModule
- RLlib GitHub Issue: Native Dict observation space support with automatic encoder routing in new API stack (RLModule/Learner)
- https://github.com/ray-project/ray/issues/46631