MARWIL with Gymnasium Dict as action space

To make your offline dataset compatible with a Gymnasium Dict action space (with Discrete and Box subspaces), store each action as a dict whose values are native Python types matching each subspace: an int for a Discrete subspace and a list of floats for a Box subspace. NumPy arrays are not JSON-serializable, so convert them with .tolist() before saving to JSON. The aim is a layout that Gymnasium’s Dict space from_jsonable method can reconstruct into actions.

Here’s a minimal example for generating such a dataset:

import numpy as np
import json

# Example Dict action space: {"rotate": Discrete(3), "thrust": Box(-1, 1, (1,), float32)}
def generate_offline_dataset(num_samples=100):
    data = {
        "obs": [],
        "actions": [],
        "rewards": [],
        "new_obs": [],
        "dones": [],
    }
    for _ in range(num_samples):
        obs = np.random.rand(4).astype(np.float32)
        action = {
            "rotate": int(np.random.randint(0, 3)),  # Discrete as int
            "thrust": [float(np.random.uniform(-1, 1))]  # Box as list of floats
        }
        reward = float(np.random.randn())
        new_obs = np.random.rand(4).astype(np.float32)
        done = bool(np.random.choice([False, True], p=[0.9, 0.1]))

        data["obs"].append(obs.tolist())
        data["actions"].append(action)
        data["rewards"].append(reward)
        data["new_obs"].append(new_obs.tolist())
        data["dones"].append(done)

    # Save as JSON for compatibility
    with open("offline_data.json", "w") as f:
        json.dump(data, f)

generate_offline_dataset()
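To sanity-check the file, it can be read back with the standard json module. This is a minimal sketch, assuming the column layout written above (one list per key). The helper name iter_transitions is hypothetical, and the snippet writes a tiny two-row file of its own so that it runs stand-alone.

```python
import json
import os
import tempfile

# Tiny stand-in for offline_data.json, using the same column layout as above.
sample = {
    "obs": [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]],
    "actions": [
        {"rotate": 0, "thrust": [0.25]},
        {"rotate": 2, "thrust": [-0.5]},
    ],
    "rewards": [1.0, -0.5],
    "new_obs": [[0.5, 0.6, 0.7, 0.8], [0.2, 0.3, 0.4, 0.5]],
    "dones": [False, True],
}


def iter_transitions(path):
    """Yield one dict per transition from a column-oriented JSON file.

    Hypothetical helper for inspection only; it is not part of RLlib.
    """
    with open(path) as f:
        data = json.load(f)
    columns = list(data)
    # zip the columns together so each row holds one transition.
    for row in zip(*(data[c] for c in columns)):
        yield dict(zip(columns, row))


path = os.path.join(tempfile.mkdtemp(), "offline_data.json")
with open(path, "w") as f:
    json.dump(sample, f)

transitions = list(iter_transitions(path))
for t in transitions:
    print(t["actions"], t["rewards"], t["dones"])
```

After the round trip, Discrete values come back as int and Box values as lists of float, which is the layout described above.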

Key points:

  • Use int for Discrete actions, and list of floats for Box actions.
  • Before saving, call .tolist() on NumPy arrays (observations and Box values) so they serialize as plain lists.
  • Avoid numpy arrays inside the action dict; use native Python types.
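The key points above can be turned into a small check that runs before saving. This is a minimal sketch in plain Python for the example space in this answer (Discrete(3) for "rotate", a one-element Box in [-1, 1] for "thrust"). The function is_valid_action and its parameters are hypothetical names, not part of Gymnasium or RLlib.

```python
def is_valid_action(action, n_rotate=3, thrust_low=-1.0, thrust_high=1.0):
    """Return True if `action` matches the example space using only native types."""
    if not isinstance(action, dict) or set(action) != {"rotate", "thrust"}:
        return False
    rotate, thrust = action["rotate"], action["thrust"]
    # Require an exact int: this rejects bool (a subclass of int) and NumPy integers.
    if type(rotate) is not int or not 0 <= rotate < n_rotate:
        return False
    # The Box value must be a plain list with exactly one element.
    if not isinstance(thrust, list) or len(thrust) != 1:
        return False
    # Require exact Python floats inside the bounds.
    return all(type(x) is float and thrust_low <= x <= thrust_high for x in thrust)


print(is_valid_action({"rotate": 1, "thrust": [0.5]}))   # well-formed action
print(is_valid_action({"rotate": 1, "thrust": 0.5}))     # Box value is not a list
print(is_valid_action({"rotate": 3, "thrust": [0.5]}))   # Discrete value out of range
```

The strict type checks are a design choice: they flag NumPy scalars early, before json.dump fails on them or writes something the reader does not expect.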

This format should be compatible with RLlib’s offline pipeline and with Gymnasium’s Dict space expectations, and is meant to avoid the "TypeError: 'int' object is not iterable" error. Let me know if you want a version that saves to Parquet or uses Ray Datasets.
