MARWIL with gymnasium Dict as action Space

The persistent “0.00 row/s” stall in MARWIL with Dict action/observation spaces and a Parquet dataset comes from how Ray Data and Parquet handle nested structures. If your action or observation columns contain dicts holding numpy arrays or other non-JSON-serializable types, Ray Data may silently fail to read or process the rows, so the pipeline stalls with no progress. Fixing the MultiDiscrete.from_jsonable error does not change this: as long as the Parquet files still store numpy arrays or objects inside dicts, the OfflinePreLearner cannot deserialize them, and zero rows are processed (Ray Data nested serialization issue).

To fully resolve this, convert every nested dict value (in both actions and observations) to a native Python type before writing to Parquet: lists for Box, ints for Discrete, and no numpy arrays or arbitrary objects. Only then can Ray Data and RLlib’s offline pipeline read and process the dataset, which allows MARWIL to train. The code provided in my previous answer addresses this by converting all nested values before saving.
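
For reference, here is a minimal sketch of that kind of conversion. It is not the code from the previous answer: `to_native` is a hypothetical helper name, and the sketch assumes that array-like values (such as numpy arrays and numpy scalars) expose a `tolist()` method that returns plain Python lists or scalars.

```python
from typing import Any


def to_native(value: Any) -> Any:
    """Recursively convert a nested sample to plain Python types.

    Hypothetical helper for illustration only. Assumption: array-like
    values (e.g. numpy arrays, numpy scalars) provide ``tolist()``.
    """
    if isinstance(value, dict):
        # Dict spaces: keep the keys, convert each sub-space value.
        return {key: to_native(sub) for key, sub in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples become lists so the column type stays uniform.
        return [to_native(item) for item in value]
    if hasattr(value, "tolist"):
        # Array-likes become nested lists, array scalars become int/float.
        return value.tolist()
    # Already a plain Python value (int, float, str, bool, None).
    return value
```

You would apply it to each row's observation and action dicts before building the table that gets written to Parquet, so that no numpy objects remain inside the nested columns.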
