How to use PPO with Dict observation space (pixels + features) in Ray 2.48.0?

  1. Yes, both MARWIL and PPO require the same custom RLModule approach for Dict observation spaces. MARWIL does not natively route Dict keys to separate encoders, so you must implement a custom RLModule (a CNN for "pixels", an MLP for "features") and use it for both algorithms. Because both algorithms are then configured with the same RLModuleSpec architecture, the RLModule checkpoint from MARWIL can be loaded into PPO for fine-tuning.

  2. DreamerV3 does not natively handle Dict observation spaces with automatic per-key encoder routing. Its built-in World Model expects a single image or vector input, not a Dict, so you would need to implement a custom encoder or change how observations reach the model (e.g., by concatenating the Dict entries or applying custom preprocessing). There is currently no built-in option in DreamerV3Config.model() for specifying separate encoders per Dict key.
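The concatenation workaround from point 2 can be sketched in plain NumPy. This is only an illustration, not DreamerV3's own API: the helper name `flatten_dict_obs`, the key names, the observation shapes, and the assumption that "pixels" is uint8 in [0, 255] are all hypothetical. Note that flattening the image discards its spatial structure, so this is the simplest option rather than the best one.

```python
import numpy as np


def flatten_dict_obs(obs, keys=("pixels", "features")):
    """Concatenate the entries of a Dict observation into one float32 vector.

    Hypothetical helper: key names and the uint8 pixel range are assumptions.
    """
    parts = []
    for key in keys:
        arr = np.asarray(obs[key], dtype=np.float32)
        if key == "pixels":
            arr = arr / 255.0  # assumed uint8 image, scaled to [0, 1]
        parts.append(arr.reshape(-1))  # flatten each entry to 1-D
    return np.concatenate(parts)


# Example observation with assumed shapes: a 64x64 RGB image plus 8 features.
obs = {
    "pixels": np.zeros((64, 64, 3), dtype=np.uint8),
    "features": np.ones(8, dtype=np.float32),
}
flat = flatten_dict_obs(obs)
print(flat.shape)  # (12296,) = 64 * 64 * 3 + 8
```

A function like this would typically run inside an environment wrapper so that the model only ever sees the flat vector; Gymnasium's `FlattenObservation` wrapper performs a similar flattening at the environment level.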

Would you like a code template for the custom RLModule pattern?
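As a starting point, here is a minimal sketch of the per-key encoder that such a custom RLModule could contain, written in plain PyTorch. It is not RLlib code: the class name `DictObsEncoder`, the layer sizes, and the observation shapes are assumptions, and "pixels" is assumed to arrive channels-first as uint8 (an HWC image would need a permute first). In the new API stack you would presumably build this in your module's `setup()` and feed the resulting embedding to the policy and value heads; that wiring is not shown here.

```python
import torch
from torch import nn


class DictObsEncoder(nn.Module):
    """Hypothetical encoder: CNN for "pixels", MLP for "features", fused output."""

    def __init__(self, pixel_shape=(3, 64, 64), feature_dim=8, embed_dim=256):
        super().__init__()
        # CNN branch for the image entry (assumed channels-first: C, H, W).
        self.cnn = nn.Sequential(
            nn.Conv2d(pixel_shape[0], 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # MLP branch for the vector entry.
        self.mlp = nn.Sequential(nn.Linear(feature_dim, 64), nn.ReLU())
        # Infer the CNN output size with a dummy forward pass.
        with torch.no_grad():
            cnn_out = self.cnn(torch.zeros(1, *pixel_shape)).shape[1]
        # Fuse both branches into a single embedding.
        self.fuse = nn.Sequential(nn.Linear(cnn_out + 64, embed_dim), nn.ReLU())

    def forward(self, obs):
        pixels = obs["pixels"].float() / 255.0  # assumed uint8 in [0, 255]
        features = obs["features"].float()
        z = torch.cat([self.cnn(pixels), self.mlp(features)], dim=-1)
        return self.fuse(z)


# Usage with a batch of 4 observations of the assumed shapes.
encoder = DictObsEncoder()
batch = {
    "pixels": torch.randint(0, 256, (4, 3, 64, 64), dtype=torch.uint8),
    "features": torch.randn(4, 8),
}
embedding = encoder(batch)  # shape (4, 256)
```

Keeping the encoder as a standalone `nn.Module` means the same class can back both the MARWIL and the PPO module, which is what makes the checkpoint transferable between them.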
