MARWIL with gymnasium Dict as action Space

Idaira_Rodriguez_Yan · October 10, 2025, 8:55am

1. Severity of the issue:
High: Completely blocks me.

2. Environment:

Ray version: 2.48.0
Python version: 3.11.4
OS: Windows 11

3. What happened vs. what you expected:

Hello RLlib community,

I’m trying to implement the following example:
https://github.com/ray-project/ray/blob/master/rllib/examples/offline_rl/train_w_bc_finetune_w_ppo.py
but with a few small modifications. Specifically, I want to:

Use MARWIL instead of BC (which should not be a major change).
Work with a custom environment.

The particularity of this custom environment is that it uses a Gymnasium Dict both as the observation space and the action space.

To make the error reproducible, I’ve created a minimal custom environment and a simple dummy dataset generator. You can find them here: https://github.com/IdairaRodYanez/RLlib-experiments. First you should execute create_offline_dataset.py and then train_marwil.py.

The error I’m encountering is the following:

File ".../ray/rllib/offline/offline_prelearner.py", line 213, in __call__
    episodes: List[SingleAgentEpisode] = self._map_to_episodes(  # WHAT DOES THIS LINE DO?
  File ".../ray/rllib/offline/offline_prelearner.py", line 438, in _map_to_episodes
    else convert(batch[schema[Columns.ACTIONS]][i], action_space)
  File ".../ray/rllib/utils/spaces/space_utils.py", line 115, in from_jsonable_if_needed
    return space.from_jsonable(sample)[0]
  File ".../gymnasium/spaces/dict.py", line 226, in from_jsonable
    dict_of_list = {key: space.from_jsonable(sample_n[key]) for key, space in self.spaces.items()}
  File ".../gymnasium/spaces/multi_discrete.py", line 189, in from_jsonable
    return [np.array(sample, dtype=np.int64) for sample in sample_n]
TypeError: 'int' object is not iterable

This is followed by:

ray::MapBatches(OfflinePreLearner).submit()
...
ray.exceptions.UserCodeException: Failed to process the following data block: {
  'obs': array([...], dtype=float32),
  'actions': array([{'rotate': 0, 'thrust': array([-0.17720357], dtype=float32)}, ...], dtype=object),
  'rewards': array([...]),
  'new_obs': array([...], dtype=float32),
  'dones': array([...])
}

I’ve been trying to debug this issue to understand what’s going on, but I can’t step into the PlanExecutor logic once the different RayTasks are launched inside ray.data.dataset (within the _executor_to_iterator method).

I don’t understand why the error indicates that it expects a list of elements in multi_discrete.py, since I never define a MultiDiscrete type in my action space. My intuition is that somewhere in RLlib’s internal logic, the space type might be converted from Discrete to MultiDiscrete, but I haven’t been able to debug deeply enough to confirm this.

Does anyone have advice on how to debug these RayTasks, or any idea what could be causing this issue?

Apologies for the long message, and thank you in advance for your help!

RunLLM · October 10, 2025, 8:56am

The error occurs because RLlib’s offline data pipeline expects actions in a format compatible with the Gymnasium Dict space’s from_jsonable method, but your dataset provides actions as a list of dicts (with Discrete and Box subspaces), which leads to a type mismatch. Specifically, when RLlib tries to convert the action using from_jsonable_if_needed, it calls space.from_jsonable(sample)[0], but the sample is not in the expected format for nested Dict+Discrete/Box spaces, resulting in the 'int' object is not iterable error. This is a known issue when using composite (Dict) action spaces with nested Discrete or Box subspaces in RLlib’s offline pipeline, as described in this GitHub issue.

To debug Ray tasks, you can use Ray’s postmortem debugging with VS Code by setting the RAY_DEBUG_POST_MORTEM environment variable, which allows you to attach a debugger to paused tasks and inspect their state, as shown in this image-based guide. Would you like more detail on workarounds or how to adapt your dataset/actions to avoid this error?

Sources:

Hint: Mention @RunLLM in the post for followups.

Idaira_Rodriguez_Yan · October 10, 2025, 3:08pm

Yes @RunLLM, could you provide the correct code to generate the dataset so that it’s compatible with the Dict action space and avoids this error?

Thank you in advance!

RunLLM · October 10, 2025, 3:08pm

To make your offline dataset compatible with a Gymnasium Dict action space (with Discrete and Box subspaces), you must ensure that each action in your dataset is a dict where each value is a native Python type or a numpy array, matching the expected type for each subspace. Critically, when saving to JSON or numpy, Box subspaces should be lists (not numpy arrays), and Discrete subspaces should be ints. This ensures that Gymnasium’s Dict space from_jsonable method can correctly reconstruct the action.

Here’s a minimal example for generating such a dataset:

import numpy as np
import json

# Example Dict action space: {"rotate": Discrete(3), "thrust": Box(-1, 1, (1,), float32)}
def generate_offline_dataset(num_samples=100):
    data = {
        "obs": [],
        "actions": [],
        "rewards": [],
        "new_obs": [],
        "dones": [],
    }
    for _ in range(num_samples):
        obs = np.random.rand(4).astype(np.float32)
        action = {
            "rotate": int(np.random.randint(0, 3)),  # Discrete as int
            "thrust": [float(np.random.uniform(-1, 1))]  # Box as list of floats
        }
        reward = float(np.random.randn())
        new_obs = np.random.rand(4).astype(np.float32)
        done = bool(np.random.choice([False, True], p=[0.9, 0.1]))

        data["obs"].append(obs.tolist())
        data["actions"].append(action)
        data["rewards"].append(reward)
        data["new_obs"].append(new_obs.tolist())
        data["dones"].append(done)

    # Save as JSON for compatibility
    with open("offline_data.json", "w") as f:
        json.dump(data, f)

generate_offline_dataset()

Key points:

Use int for Discrete actions, and list of floats for Box actions.
When saving, use .tolist() for numpy arrays and Box values.
Avoid numpy arrays inside the action dict; use native Python types.

This format will be compatible with RLlib’s offline pipeline and Gymnasium’s Dict space expectations, avoiding the TypeError: 'int' object is not iterable error. Let me know if you want a version for saving as Parquet or with Ray Datasets.

Sources:

Hint: Mention @RunLLM in the post for followups.

Idaira_Rodriguez_Yan · October 10, 2025, 10:26pm

Thanks @RunLLM! Yes, could you provide me the code saving data as Parquet instead of JSON or the corresponding train_marwil.py code using JSON format?

RunLLM · October 10, 2025, 10:27pm

Certainly! Here’s how to generate and save a compatible offline dataset for a Dict action space as Parquet using Ray Datasets, and how to load it for MARWIL training. This approach ensures actions are stored as dicts with native Python types (int for Discrete, list of float for Box), which is required for RLlib’s offline pipeline.

Dataset Generation and Parquet Saving:

import numpy as np
import ray
import pandas as pd

# Example: Dict action space {"rotate": Discrete(3), "thrust": Box(-1, 1, (1,), float32)}
def generate_parquet_dataset(num_samples=100, filename="offline_data.parquet"):
    data = []
    for _ in range(num_samples):
        obs = np.random.rand(4).astype(np.float32).tolist()
        action = {
            "rotate": int(np.random.randint(0, 3)),  # Discrete as int
            "thrust": [float(np.random.uniform(-1, 1))]  # Box as list
        }
        reward = float(np.random.randn())
        new_obs = np.random.rand(4).astype(np.float32).tolist()
        done = bool(np.random.choice([False, True], p=[0.9, 0.1]))
        data.append({
            "obs": obs,
            "actions": action,
            "rewards": reward,
            "new_obs": new_obs,
            "dones": done,
        })
    # Convert to Ray Dataset and write as Parquet
    ray.init(ignore_reinit_error=True)
    ds = ray.data.from_items(data)
    ds.write_parquet(filename)
    ray.shutdown()

generate_parquet_dataset()

Loading Parquet for MARWIL Training:

In your MARWIL config, set:

from ray.rllib.algorithms.marwil import MARWILConfig

config = (
    MARWILConfig()
    .environment(env="your_custom_env")
    .offline_data(
        input_="offline_data.parquet",
        input_read_episodes=False,  # For tabular data
    )
)

If you want to use JSON instead, just save the data as JSON (as in the previous answer) and set input_="offline_data.json".

References:

Would you like a full MARWIL training script example as well?

Sources:

Hint: Mention @RunLLM in the post for followups.

Idaira_Rodriguez_Yan · October 10, 2025, 10:40pm

Thank you again @RunLLM

I’ve managed to fix the previous error:

File ".../ray/rllib/offline/offline_prelearner.py", line 213, in __call__
    episodes: List[SingleAgentEpisode] = self._map_to_episodes( 
File ".../ray/rllib/offline/offline_prelearner.py", line 438, in _map_to_episodes
    else convert(batch[schema[Columns.ACTIONS]][i], action_space)
File ".../ray/rllib/utils/spaces/space_utils.py", line 115, in from_jsonable_if_needed
    return space.from_jsonable(sample)[0]
File ".../gymnasium/spaces/dict.py", line 226, in from_jsonable
    dict_of_list = {key: space.from_jsonable(sample_n[key]) for key, space in self.spaces.items()}
File ".../gymnasium/spaces/multi_discrete.py", line 189, in from_jsonable
    return [np.array(sample, dtype=np.int64) for sample in sample_n]
TypeError: 'int' object is not iterable

That issue is now resolved, but I’m still seeing repeated log messages like this:

Running Dataset: dataset_3_0. Active & requested resources: 1/8 CPU, 256.0MB/196.9MB object store: : 0.00 row [00:13, ? row/s]
(MapWorker(MapBatches(OfflinePreLearner)) pid=27308) array([...])
...

It seems like MARWIL is running but no rows are actually being read from the dataset — the progress stays stuck at 0.00 row/s indefinitely.

Could you help me understand what might be causing this?

I’ve updated my full code here:
https://github.com/IdairaRodYanez/RLlib-experiments/blob/main/create_offline_dataset.py
(You can check the latest commit for the most recent version.)

I suspect it could be related to how the Parquet dataset is being written or how nested structures (like my Dict action space) are stored, but I’m not sure which part is preventing Ray from reading the rows properly.

Any insights on what could cause Ray’s OfflinePreLearner to show 0.00 row/s would be really helpful!

RunLLM · October 10, 2025, 10:47pm

This issue—where MARWIL appears to run but the dataset progress stays at 0.00 row/s—often indicates a problem with how the Parquet file encodes nested structures, especially with Dict action spaces. If the action column contains dicts with numpy arrays or non-JSON-serializable types, Ray Data may silently fail to read or parse the rows, resulting in no data being processed. This is a common pitfall when saving nested dicts with numpy arrays directly to Parquet, as Parquet expects flat, serializable data types (Ray Data Parquet limitations).

To resolve this, ensure that before writing to Parquet, all nested values in your action dicts are converted to native Python types (e.g., lists for Box, ints for Discrete), and avoid numpy arrays or objects. You can preprocess your data like this before saving:

def to_serializable_action(action):
    return {
        "rotate": int(action["rotate"]),
        "thrust": [float(x) for x in action["thrust"]]
    }

for row in data:
    row["actions"] = to_serializable_action(row["actions"])

Then write the dataset as before. This ensures Ray Data can read and batch the rows, and MARWIL will process them. Would you like more detail or a code snippet for re-writing your dataset?

Sources:

Ray Data: Nested structure serialization

Hint: Mention @RunLLM in the post for followups.

Idaira_Rodriguez_Yan · October 10, 2025, 10:58pm

Yes @RunLLM , could you provide me all the code that I should write in train_marwil.py and in create_dataset.py to solve this problem?

RunLLM · October 10, 2025, 10:59pm

It seems likely that the main issues are (1) ensuring your dataset is written with only native Python types (no numpy arrays) for all nested Dict fields, and (2) reading the dataset in a way compatible with RLlib’s MARWIL offline pipeline. Below is a minimal, end-to-end example for both create_dataset.py and train_marwil.py that should work for a Dict action space with Discrete and Box subspaces.

create_dataset.py

import numpy as np
import ray

def to_serializable_action(action):
    return {
        "rotate": int(action["rotate"]),  # Discrete as int
        "thrust": [float(x) for x in action["thrust"]]  # Box as list of floats
    }

def generate_parquet_dataset(num_samples=100, filename="offline_data.parquet"):
    data = []
    for _ in range(num_samples):
        obs = np.random.rand(4).astype(np.float32).tolist()
        action = {
            "rotate": int(np.random.randint(0, 3)),
            "thrust": [float(np.random.uniform(-1, 1))]
        }
        reward = float(np.random.randn())
        new_obs = np.random.rand(4).astype(np.float32).tolist()
        done = bool(np.random.choice([False, True], p=[0.9, 0.1]))
        data.append({
            "obs": obs,
            "actions": to_serializable_action(action),
            "rewards": reward,
            "new_obs": new_obs,
            "dones": done,
        })
    ray.init(ignore_reinit_error=True)
    ds = ray.data.from_items(data)
    ds.write_parquet(filename)
    ray.shutdown()

if __name__ == "__main__":
    generate_parquet_dataset()

train_marwil.py

import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.marwil import MARWILConfig
from ray.tune.registry import register_env

# Minimal custom env for Dict obs/action space
class DummyDictEnv(gym.Env):
    def __init__(self, config=None):
        self.observation_space = gym.spaces.Box(low=0, high=1, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Dict({
            "rotate": gym.spaces.Discrete(3),
            "thrust": gym.spaces.Box(-1, 1, (1,), dtype=np.float32)
        })
        self.state = self.observation_space.sample()
        self.steps = 0

    def reset(self, *, seed=None, options=None):
        self.state = self.observation_space.sample()
        self.steps = 0
        return self.state, {}

    def step(self, action):
        self.state = self.observation_space.sample()
        reward = float(np.random.randn())
        done = self.steps > 10
        self.steps += 1
        return self.state, reward, done, False, {}

register_env("DummyDictEnv", lambda config: DummyDictEnv(config))

config = (
    MARWILConfig()
    .environment(env="DummyDictEnv")
    .offline_data(
        input_="offline_data.parquet",
        input_read_episodes=False,  # For tabular data
    )
    .training(train_batch_size_per_learner=32)
)

if __name__ == "__main__":
    algo = config.build()
    for i in range(5):
        result = algo.train()
        print(f"Iteration {i}: reward={result['episode_reward_mean']}")
    algo.stop()

This setup should avoid serialization issues and allow MARWIL to process your offline data. If you use a different observation/action space, adjust the env and dataset accordingly.

Would you like more detail on customizing the environment or dataset schema?

Sources:

Ray Data: Nested structure serialization

Hint: Mention @RunLLM in the post for followups.

Topic		Replies	Views
Trainer.compute_action Error with Dict type observation inputs RLlib	4	899	December 12, 2020
RLlib and gym.space RLlib	4	740	November 14, 2021
Initial action for Dict action space RLlib	5	1351	July 23, 2021
Using random action policy with dict action space RLlib	0	279	April 12, 2021
Error: Custom observation Space not treated correctly RLlib	5	1047	July 15, 2021

MARWIL with gymnasium Dict as action Space

Related topics