1. Severity of the issue:
High: Completely blocks me.
2. Environment:
- Ray version: 2.48.0
- Python version: 3.11.4
- OS: Windows 11
3. What happened vs. what you expected:
Hello RLlib community,
I’m trying to implement the following example:
https://github.com/ray-project/ray/blob/master/rllib/examples/offline_rl/train_w_bc_finetune_w_ppo.py
but with a few small modifications. Specifically, I want to:
- Use MARWIL instead of BC (which should not be a major change).
- Work with a custom environment.
What is particular about this custom environment is that both its observation space and its action space are Gymnasium Dict spaces.
To make the error reproducible, I’ve created a minimal custom environment and a simple dummy dataset generator, available here: https://github.com/IdairaRodYanez/RLlib-experiments. To reproduce, run create_offline_dataset.py first and then train_marwil.py.
The error I’m encountering is the following:
  File ".../ray/rllib/offline/offline_prelearner.py", line 213, in __call__
    episodes: List[SingleAgentEpisode] = self._map_to_episodes( # WHAT DOES THIS LINE DO?
  File ".../ray/rllib/offline/offline_prelearner.py", line 438, in _map_to_episodes
    else convert(batch[schema[Columns.ACTIONS]][i], action_space)
  File ".../ray/rllib/utils/spaces/space_utils.py", line 115, in from_jsonable_if_needed
    return space.from_jsonable(sample)[0]
  File ".../gymnasium/spaces/dict.py", line 226, in from_jsonable
    dict_of_list = {key: space.from_jsonable(sample_n[key]) for key, space in self.spaces.items()}
  File ".../gymnasium/spaces/multi_discrete.py", line 189, in from_jsonable
    return [np.array(sample, dtype=np.int64) for sample in sample_n]
TypeError: 'int' object is not iterable
This is followed by:
ray::MapBatches(OfflinePreLearner).submit()
...
ray.exceptions.UserCodeException: Failed to process the following data block: {
    'obs': array([...], dtype=float32),
    'actions': array([{'rotate': 0, 'thrust': array([-0.17720357], dtype=float32)}, ...], dtype=object),
    'rewards': array([...]),
    'new_obs': array([...], dtype=float32),
    'dones': array([...])
}
I’ve tried to debug this to understand what is going on, but I can’t step into the PlanExecutor logic once the Ray tasks are launched inside ray.data.dataset (within the _executor_to_iterator method).
I don’t understand why the traceback ends in multi_discrete.py, where a list of elements is expected, since my action space does not contain a MultiDiscrete space. My intuition is that somewhere in RLlib’s internal logic the space type is converted from Discrete to MultiDiscrete, but I haven’t been able to debug deeply enough to confirm this.
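One more thing I noticed but have not confirmed: the from_jsonable_if_needed frame in the traceback returns space.from_jsonable(sample)[0], which suggests the sample is expected in a batched layout (each key mapped to a list of values). My dataset stores each action as a single dict with a bare int under 'rotate', as the data block shows. The sketch below is plain Python, not RLlib or Gymnasium code. It only illustrates the two layouts; to_batched_jsonable is a hypothetical helper name of mine, and it handles flat dicts only.

```python
from typing import Any, Dict


def to_batched_jsonable(action: Dict[str, Any]) -> Dict[str, list]:
    """Hypothetical helper: wrap one flat action dict as a batch of size one.

    Each value becomes a one-element list, so code that iterates over every
    entry sees a list instead of a bare int.
    """
    batched = {}
    for key, value in action.items():
        # NumPy arrays expose tolist(); plain ints and floats do not.
        plain = value.tolist() if hasattr(value, "tolist") else value
        batched[key] = [plain]
    return batched


# One action as it appears in the failing data block (array shown as a list).
single_action = {"rotate": 0, "thrust": [-0.17720357]}

# Iterating over the bare int reproduces the message from the traceback.
try:
    [x for x in single_action["rotate"]]
    error_message = ""
except TypeError as err:
    error_message = str(err)
print(error_message)

# The same action wrapped as a batch of size one.
batched_action = to_batched_jsonable(single_action)
print(batched_action)
```

If this guess is right, writing the actions in the batched layout when generating the dataset might avoid the error, but I have not verified that RLlib accepts it.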
Does anyone have advice on how to debug these Ray tasks, or any idea what could be causing this issue?
Apologies for the long message, and thank you in advance for your help!