MARWIL with gymnasium Dict as action Space

1. Severity of the issue:
High: Completely blocks me.

2. Environment:

  • Ray version: 2.48.0
  • Python version: 3.11.4
  • OS: Windows 11

3. What happened vs. what you expected:

Hello RLlib community,

I’m trying to implement the following example:
https://github.com/ray-project/ray/blob/master/rllib/examples/offline_rl/train_w_bc_finetune_w_ppo.py
but with a few small modifications. Specifically, I want to:

  • Use MARWIL instead of BC (which should not be a major change).
  • Work with a custom environment.

The particularity of this custom environment is that it uses a Gymnasium Dict both as the observation space and the action space.

To make the error reproducible, I’ve created a minimal custom environment and a simple dummy dataset generator. You can find them here: https://github.com/IdairaRodYanez/RLlib-experiments. First you should execute create_offline_dataset.py and then train_marwil.py.

The error I’m encountering is the following:

File ".../ray/rllib/offline/offline_prelearner.py", line 213, in __call__
    episodes: List[SingleAgentEpisode] = self._map_to_episodes(  # WHAT DOES THIS LINE DO?
  File ".../ray/rllib/offline/offline_prelearner.py", line 438, in _map_to_episodes
    else convert(batch[schema[Columns.ACTIONS]][i], action_space)
  File ".../ray/rllib/utils/spaces/space_utils.py", line 115, in from_jsonable_if_needed
    return space.from_jsonable(sample)[0]
  File ".../gymnasium/spaces/dict.py", line 226, in from_jsonable
    dict_of_list = {key: space.from_jsonable(sample_n[key]) for key, space in self.spaces.items()}
  File ".../gymnasium/spaces/multi_discrete.py", line 189, in from_jsonable
    return [np.array(sample, dtype=np.int64) for sample in sample_n]
TypeError: 'int' object is not iterable

This is followed by:

ray::MapBatches(OfflinePreLearner).submit()
...
ray.exceptions.UserCodeException: Failed to process the following data block: {
  'obs': array([...], dtype=float32),
  'actions': array([{'rotate': 0, 'thrust': array([-0.17720357], dtype=float32)}, ...], dtype=object),
  'rewards': array([...]),
  'new_obs': array([...], dtype=float32),
  'dones': array([...])
}

I’ve been trying to debug this issue to understand what’s going on, but I can’t step into the PlanExecutor logic once the different RayTasks are launched inside ray.data.dataset (within the _executor_to_iterator method).

I don’t understand why the error indicates that it expects a list of elements in multi_discrete.py, since I never define a MultiDiscrete type in my action space. My intuition is that somewhere in RLlib’s internal logic, the space type might be converted from Discrete to MultiDiscrete, but I haven’t been able to debug deeply enough to confirm this.

Does anyone have advice on how to debug these RayTasks, or any idea what could be causing this issue?

Apologies for the long message, and thank you in advance for your help!

The error occurs because RLlib’s offline data pipeline expects actions in a format compatible with the Gymnasium Dict space’s from_jsonable method, but your dataset provides actions as a list of dicts (with Discrete and Box subspaces), which leads to a type mismatch. Specifically, when RLlib tries to convert the action using from_jsonable_if_needed, it calls space.from_jsonable(sample)[0], but the sample is not in the expected format for nested Dict+Discrete/Box spaces, resulting in the 'int' object is not iterable error. This is a known issue when using composite (Dict) action spaces with nested Discrete or Box subspaces in RLlib’s offline pipeline, as described in this GitHub issue.

To debug Ray tasks, you can use Ray’s postmortem debugging with VS Code by setting the RAY_DEBUG_POST_MORTEM environment variable, which allows you to attach a debugger to paused tasks and inspect their state, as shown in this image-based guide. Would you like more detail on workarounds or how to adapt your dataset/actions to avoid this error?

Sources:

Hint: Mention @RunLLM in the post for followups.

Yes @RunLLM, could you provide the correct code to generate the dataset so that it’s compatible with the Dict action space and avoids this error?

Thank you in advance!

To make your offline dataset compatible with a Gymnasium Dict action space (with Discrete and Box subspaces), you must ensure that each action in your dataset is a dict where each value is a native Python type or a numpy array, matching the expected type for each subspace. Critically, when saving to JSON or numpy, Box subspaces should be lists (not numpy arrays), and Discrete subspaces should be ints. This ensures that Gymnasium’s Dict space from_jsonable method can correctly reconstruct the action.

Here’s a minimal example for generating such a dataset:

import numpy as np
import json

# Example Dict action space: {"rotate": Discrete(3), "thrust": Box(-1, 1, (1,), float32)}
def generate_offline_dataset(num_samples=100):
    data = {
        "obs": [],
        "actions": [],
        "rewards": [],
        "new_obs": [],
        "dones": [],
    }
    for _ in range(num_samples):
        obs = np.random.rand(4).astype(np.float32)
        action = {
            "rotate": int(np.random.randint(0, 3)),  # Discrete as int
            "thrust": [float(np.random.uniform(-1, 1))]  # Box as list of floats
        }
        reward = float(np.random.randn())
        new_obs = np.random.rand(4).astype(np.float32)
        done = bool(np.random.choice([False, True], p=[0.9, 0.1]))

        data["obs"].append(obs.tolist())
        data["actions"].append(action)
        data["rewards"].append(reward)
        data["new_obs"].append(new_obs.tolist())
        data["dones"].append(done)

    # Save as JSON for compatibility
    with open("offline_data.json", "w") as f:
        json.dump(data, f)

generate_offline_dataset()

Key points:

  • Use int for Discrete actions, and list of floats for Box actions.
  • When saving, use .tolist() for numpy arrays and Box values.
  • Avoid numpy arrays inside the action dict; use native Python types.

This format will be compatible with RLlib’s offline pipeline and Gymnasium’s Dict space expectations, avoiding the TypeError: 'int' object is not iterable error. Let me know if you want a version for saving as Parquet or with Ray Datasets.

Sources:

Hint: Mention @RunLLM in the post for followups.

Thanks @RunLLM! Yes, could you provide me the code saving data as Parquet instead of JSON or the corresponding train_marwil.py code using JSON format?

Certainly! Here’s how to generate and save a compatible offline dataset for a Dict action space as Parquet using Ray Datasets, and how to load it for MARWIL training. This approach ensures actions are stored as dicts with native Python types (int for Discrete, list of float for Box), which is required for RLlib’s offline pipeline.

Dataset Generation and Parquet Saving:

import numpy as np
import ray
import pandas as pd

# Example: Dict action space {"rotate": Discrete(3), "thrust": Box(-1, 1, (1,), float32)}
def generate_parquet_dataset(num_samples=100, filename="offline_data.parquet"):
    data = []
    for _ in range(num_samples):
        obs = np.random.rand(4).astype(np.float32).tolist()
        action = {
            "rotate": int(np.random.randint(0, 3)),  # Discrete as int
            "thrust": [float(np.random.uniform(-1, 1))]  # Box as list
        }
        reward = float(np.random.randn())
        new_obs = np.random.rand(4).astype(np.float32).tolist()
        done = bool(np.random.choice([False, True], p=[0.9, 0.1]))
        data.append({
            "obs": obs,
            "actions": action,
            "rewards": reward,
            "new_obs": new_obs,
            "dones": done,
        })
    # Convert to Ray Dataset and write as Parquet
    ray.init(ignore_reinit_error=True)
    ds = ray.data.from_items(data)
    ds.write_parquet(filename)
    ray.shutdown()

generate_parquet_dataset()

Loading Parquet for MARWIL Training:

In your MARWIL config, set:

from ray.rllib.algorithms.marwil import MARWILConfig

config = (
    MARWILConfig()
    .environment(env="your_custom_env")
    .offline_data(
        input_="offline_data.parquet",
        input_read_episodes=False,  # For tabular data
    )
)

If you want to use JSON instead, just save the data as JSON (as in the previous answer) and set input_="offline_data.json".

References:

Would you like a full MARWIL training script example as well?

Sources:

Hint: Mention @RunLLM in the post for followups.

Thank you again @RunLLM

I’ve managed to fix the previous error:

File ".../ray/rllib/offline/offline_prelearner.py", line 213, in __call__
    episodes: List[SingleAgentEpisode] = self._map_to_episodes( 
File ".../ray/rllib/offline/offline_prelearner.py", line 438, in _map_to_episodes
    else convert(batch[schema[Columns.ACTIONS]][i], action_space)
File ".../ray/rllib/utils/spaces/space_utils.py", line 115, in from_jsonable_if_needed
    return space.from_jsonable(sample)[0]
File ".../gymnasium/spaces/dict.py", line 226, in from_jsonable
    dict_of_list = {key: space.from_jsonable(sample_n[key]) for key, space in self.spaces.items()}
File ".../gymnasium/spaces/multi_discrete.py", line 189, in from_jsonable
    return [np.array(sample, dtype=np.int64) for sample in sample_n]
TypeError: 'int' object is not iterable

That issue is now resolved, but I’m still seeing repeated log messages like this:

Running Dataset: dataset_3_0. Active & requested resources: 1/8 CPU, 256.0MB/196.9MB object store: : 0.00 row [00:13, ? row/s]
(MapWorker(MapBatches(OfflinePreLearner)) pid=27308) array([...])
...

It seems like MARWIL is running but no rows are actually being read from the dataset — the progress stays stuck at 0.00 row/s indefinitely.

Could you help me understand what might be causing this?

I’ve updated my full code here:
:link: https://github.com/IdairaRodYanez/RLlib-experiments/blob/main/create_offline_dataset.py
(You can check the latest commit for the most recent version.)

I suspect it could be related to how the Parquet dataset is being written or how nested structures (like my Dict action space) are stored, but I’m not sure which part is preventing Ray from reading the rows properly.

Any insights on what could cause Ray’s OfflinePreLearner to show 0.00 row/s would be really helpful!

This issue—where MARWIL appears to run but the dataset progress stays at 0.00 row/s—often indicates a problem with how the Parquet file encodes nested structures, especially with Dict action spaces. If the action column contains dicts with numpy arrays or non-JSON-serializable types, Ray Data may silently fail to read or parse the rows, resulting in no data being processed. This is a common pitfall when saving nested dicts with numpy arrays directly to Parquet, as Parquet expects flat, serializable data types (Ray Data Parquet limitations).

To resolve this, ensure that before writing to Parquet, all nested values in your action dicts are converted to native Python types (e.g., lists for Box, ints for Discrete), and avoid numpy arrays or objects. You can preprocess your data like this before saving:

def to_serializable_action(action):
    return {
        "rotate": int(action["rotate"]),
        "thrust": [float(x) for x in action["thrust"]]
    }

for row in data:
    row["actions"] = to_serializable_action(row["actions"])

Then write the dataset as before. This ensures Ray Data can read and batch the rows, and MARWIL will process them. Would you like more detail or a code snippet for re-writing your dataset?

Sources:

Hint: Mention @RunLLM in the post for followups.

Yes @RunLLM , could you provide me all the code that I should write in train_marwil.py and in create_dataset.py to solve this problem?

It seems likely that the main issues are (1) ensuring your dataset is written with only native Python types (no numpy arrays) for all nested Dict fields, and (2) reading the dataset in a way compatible with RLlib’s MARWIL offline pipeline. Below is a minimal, end-to-end example for both create_dataset.py and train_marwil.py that should work for a Dict action space with Discrete and Box subspaces.


create_dataset.py

import numpy as np
import ray

def to_serializable_action(action):
    return {
        "rotate": int(action["rotate"]),  # Discrete as int
        "thrust": [float(x) for x in action["thrust"]]  # Box as list of floats
    }

def generate_parquet_dataset(num_samples=100, filename="offline_data.parquet"):
    data = []
    for _ in range(num_samples):
        obs = np.random.rand(4).astype(np.float32).tolist()
        action = {
            "rotate": int(np.random.randint(0, 3)),
            "thrust": [float(np.random.uniform(-1, 1))]
        }
        reward = float(np.random.randn())
        new_obs = np.random.rand(4).astype(np.float32).tolist()
        done = bool(np.random.choice([False, True], p=[0.9, 0.1]))
        data.append({
            "obs": obs,
            "actions": to_serializable_action(action),
            "rewards": reward,
            "new_obs": new_obs,
            "dones": done,
        })
    ray.init(ignore_reinit_error=True)
    ds = ray.data.from_items(data)
    ds.write_parquet(filename)
    ray.shutdown()

if __name__ == "__main__":
    generate_parquet_dataset()

train_marwil.py

import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.marwil import MARWILConfig
from ray.tune.registry import register_env

# Minimal custom env for Dict obs/action space
class DummyDictEnv(gym.Env):
    def __init__(self, config=None):
        self.observation_space = gym.spaces.Box(low=0, high=1, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Dict({
            "rotate": gym.spaces.Discrete(3),
            "thrust": gym.spaces.Box(-1, 1, (1,), dtype=np.float32)
        })
        self.state = self.observation_space.sample()
        self.steps = 0

    def reset(self, *, seed=None, options=None):
        self.state = self.observation_space.sample()
        self.steps = 0
        return self.state, {}

    def step(self, action):
        self.state = self.observation_space.sample()
        reward = float(np.random.randn())
        done = self.steps > 10
        self.steps += 1
        return self.state, reward, done, False, {}

register_env("DummyDictEnv", lambda config: DummyDictEnv(config))

config = (
    MARWILConfig()
    .environment(env="DummyDictEnv")
    .offline_data(
        input_="offline_data.parquet",
        input_read_episodes=False,  # For tabular data
    )
    .training(train_batch_size_per_learner=32)
)

if __name__ == "__main__":
    algo = config.build()
    for i in range(5):
        result = algo.train()
        print(f"Iteration {i}: reward={result['episode_reward_mean']}")
    algo.stop()

This setup should avoid serialization issues and allow MARWIL to process your offline data. If you use a different observation/action space, adjust the env and dataset accordingly.

Would you like more detail on customizing the environment or dataset schema?

Sources:

Hint: Mention @RunLLM in the post for followups.

Thanks a bunch for bringing this up. Please file an issue in the Ray Open Source project.

Done! MARWIL OfflinePreLearner stuck at 0.00 row/s when reading Parquet dataset with Dict action space · Issue #57794 · ray-project/ray · GitHub

@RunLLM , following the discussion I saw an issue opened on GitHub, but I was wondering if your answer could be useful to solve it. I attach the comment below:

”I’m running an offline MARWIL experiment using RLlib and a custom environment with a Dict action space and a Dict observation space generated using gymnasium and a parquet offline dataset generated with Ray Data.
After fixing a previous TypeError related to MultiDiscrete.from_jsonable, MARWIL now starts training — but it gets stuck indefinitely with messages like:

Running Dataset: dataset_3_0. Active & requested resources: 1/8 CPU, 256.0MB/196.9MB object store
0.00 row [00:13, ? row/s]
(MapWorker(MapBatches(OfflinePreLearner)) pid=27308) array([…])

The dataset progress stays at 0.00 row/s, suggesting that MARWIL / OfflinePreLearner is not reading any rows from the dataset

Environment: custom with Dict action and observation space

Dataset format: Parquet written with Ray Data

Repo with full reproduction:
https://github.com/IdairaRodYanez/RLlib-experiments/

Forum discussion (full context):
https://discuss.ray.io/t/marwil-with-gymnasium-dict-as-action-space/23236

Versions / Dependencies

  • Ray version: 2.48.0

  • Python version: 3.11.4

  • OS: Windows 11

  • Environment: Gymnasium 1.0.0”

    I was wondering why the error persisted after the fixes given in this forum.

The persistent “0.00 row/s” issue in MARWIL with Dict action/observation spaces and a Parquet dataset is due to how Ray Data and Parquet handle nested structures: if your action or observation columns contain dicts with numpy arrays or non-JSON-serializable types, Ray Data may silently fail to read/process the rows, causing the pipeline to stall with no progress. Even after fixing the MultiDiscrete.from_jsonable error, if the Parquet files still store numpy arrays or objects inside dicts, the OfflinePreLearner cannot deserialize them, resulting in zero rows processed (Ray Data nested serialization issue).

To fully resolve this, you must ensure that before writing to Parquet, all nested dict values (in both actions and observations) are converted to native Python types: use lists for Box, ints for Discrete, and avoid numpy arrays or objects. Only then will Ray Data and RLlib’s offline pipeline be able to read and process the dataset, allowing MARWIL to train. The code provided in my previous answer addresses this by converting all nested values before saving.

Sources:

Hint: Mention @RunLLM in the post for followups.