How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hello! I’m trying to use RLlib’s new API stack on a custom multi-agent env (although there is only one agent for now).
My observation is an image with multiple channels, and I managed to train on it using PPO with the default parameters and NN architecture (a CNN encoder with MLP heads for the policy and value function).
However, I would now like to add more information to my observations in the form of a vector (for instance, a one-hot encoding). What I want is something simple, along the lines of what SB3 does for dict observation spaces (sorry, I cannot include a link, as new users are restricted to 2 links per post).
So I updated my observation space to be a dict with two components: one for the image part, to be processed by a CNN, and one for the vector part, to be processed by an MLP. I did not find any existing examples for RLlib’s new API stack, so I tried to write a custom RLModule based on the TinyAtariCNN example.
I wanted to do something like this in the _forward() method of the RLModule:
obs_img = batch[Columns.OBS]["image"]
obs_vec = batch[Columns.OBS]["vector"]
# then process both parts using torch.nn
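To make the intended processing concrete, here is a minimal plain-PyTorch sketch of the two-branch encoder, independent of RLlib. The class name `ImageVectorEncoder`, the layer sizes, the input shapes, and the channels-last image layout are all assumptions made for illustration.

```python
import torch
from torch import nn


class ImageVectorEncoder(nn.Module):
    """Hypothetical encoder: CNN for the image, MLP for the vector, then concatenation."""

    def __init__(self, in_channels: int = 4, vec_dim: int = 10, hidden: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # (B, 32, 1, 1) regardless of image size
            nn.Flatten(),  # (B, 32)
        )
        self.mlp = nn.Sequential(nn.Linear(vec_dim, hidden), nn.ReLU())
        self.out_dim = 32 + hidden

    def forward(self, obs: dict) -> torch.Tensor:
        # Assumes channels-last images (B, H, W, C); Conv2d expects (B, C, H, W).
        img = obs["image"].float().permute(0, 3, 1, 2)
        vec = obs["vector"].float()
        # Concatenate the two feature vectors along the feature dimension.
        return torch.cat([self.cnn(img), self.mlp(vec)], dim=-1)


encoder = ImageVectorEncoder()
fake_obs = {"image": torch.rand(8, 64, 64, 4), "vector": torch.rand(8, 10)}
features = encoder(fake_obs)  # shape (8, encoder.out_dim)
```

Inside the RLModule, `obs` would be `batch[Columns.OBS]`, and `features` would feed the policy and value heads.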
However, an error occurs before anything in the RLModule is called:
(MultiAgentEnvRunner pid=164203) 2024-12-04 15:57:31,156 ERROR actor_manager.py:187 -- Worker exception caught during apply(): all input arrays must have the same shape
(MultiAgentEnvRunner pid=164203) Traceback (most recent call last):
(MultiAgentEnvRunner pid=164203) File "/home/adrien/envs/rllib_env/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 183, in apply
(MultiAgentEnvRunner pid=164203) return func(self, *args, **kwargs)
(MultiAgentEnvRunner pid=164203) File "/home/adrien/envs/rllib_env/lib/python3.10/site-packages/ray/rllib/execution/rollout_ops.py", line 110, in <lambda>
(MultiAgentEnvRunner pid=164203) else (lambda w: (w.sample(**random_action_kwargs), w.get_metrics()))
(MultiAgentEnvRunner pid=164203) File "/home/adrien/envs/rllib_env/lib/python3.10/site-packages/ray/util/tracing/tracing_helper.py", line 467, in _resume_span
(MultiAgentEnvRunner pid=164203) return method(self, *_args, **_kwargs)
(MultiAgentEnvRunner pid=164203) File "/home/adrien/envs/rllib_env/lib/python3.10/site-packages/ray/rllib/env/multi_agent_env_runner.py", line 179, in sample
(MultiAgentEnvRunner pid=164203) samples = self._sample_timesteps(
(MultiAgentEnvRunner pid=164203) File "/home/adrien/envs/rllib_env/lib/python3.10/site-packages/ray/util/tracing/tracing_helper.py", line 467, in _resume_span
(MultiAgentEnvRunner pid=164203) return method(self, *_args, **_kwargs)
(MultiAgentEnvRunner pid=164203) File "/home/adrien/envs/rllib_env/lib/python3.10/site-packages/ray/rllib/env/multi_agent_env_runner.py", line 377, in _sample_timesteps
(MultiAgentEnvRunner pid=164203) self._episode.finalize(drop_zero_len_single_agent_episodes=True)
(MultiAgentEnvRunner pid=164203) File "/home/adrien/envs/rllib_env/lib/python3.10/site-packages/ray/rllib/env/multi_agent_episode.py", line 794, in finalize
(MultiAgentEnvRunner pid=164203) agent_eps.finalize()
(MultiAgentEnvRunner pid=164203) File "/home/adrien/envs/rllib_env/lib/python3.10/site-packages/ray/rllib/env/single_agent_episode.py", line 576, in finalize
(MultiAgentEnvRunner pid=164203) self.observations.finalize()
(MultiAgentEnvRunner pid=164203) File "/home/adrien/envs/rllib_env/lib/python3.10/site-packages/ray/rllib/env/utils/infinite_lookback_buffer.py", line 161, in finalize
(MultiAgentEnvRunner pid=164203) self.data = batch(self.data)
(MultiAgentEnvRunner pid=164203) File "/home/adrien/envs/rllib_env/lib/python3.10/site-packages/ray/rllib/utils/spaces/space_utils.py", line 373, in batch
(MultiAgentEnvRunner pid=164203) ret = tree.map_structure(lambda *s: np_func(s, axis=0), *list_of_structs)
(MultiAgentEnvRunner pid=164203) File "/home/adrien/envs/rllib_env/lib/python3.10/site-packages/tree/__init__.py", line 435, in map_structure
(MultiAgentEnvRunner pid=164203) [func(*args) for args in zip(*map(flatten, structures))])
(MultiAgentEnvRunner pid=164203) File "/home/adrien/envs/rllib_env/lib/python3.10/site-packages/tree/__init__.py", line 435, in <listcomp>
(MultiAgentEnvRunner pid=164203) [func(*args) for args in zip(*map(flatten, structures))])
(MultiAgentEnvRunner pid=164203) File "/home/adrien/envs/rllib_env/lib/python3.10/site-packages/ray/rllib/utils/spaces/space_utils.py", line 373, in <lambda>
(MultiAgentEnvRunner pid=164203) ret = tree.map_structure(lambda *s: np_func(s, axis=0), *list_of_structs)
(MultiAgentEnvRunner pid=164203) File "/home/adrien/envs/rllib_env/lib/python3.10/site-packages/numpy/core/shape_base.py", line 449, in stack
(MultiAgentEnvRunner pid=164203) raise ValueError('all input arrays must have the same shape')
(MultiAgentEnvRunner pid=164203) ValueError: all input arrays must have the same shape
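The last frames show `np.stack` failing inside `batch()` while the episode's observations are being stacked. One possible reading, which I have not confirmed, is that for at least one dict key the observations collected at different timesteps do not all share the same shape. The sketch below reproduces the same NumPy error in isolation and adds a small helper to locate such mismatches; the helper `find_shape_mismatches` and all shapes are hypothetical and not part of RLlib.

```python
import numpy as np

# Two observations for the same key whose shapes disagree (hypothetical shapes).
obs_t0 = np.zeros((10,), dtype=np.float32)
obs_t1 = np.zeros((11,), dtype=np.float32)

try:
    np.stack([obs_t0, obs_t1], axis=0)
    error_message = None
except ValueError as exc:
    error_message = str(exc)


def find_shape_mismatches(observations, expected_shapes):
    """Return (step, key, shape) for every entry that deviates from expected_shapes."""
    mismatches = []
    for step, obs in enumerate(observations):
        for key, expected in expected_shapes.items():
            shape = np.asarray(obs[key]).shape
            if shape != tuple(expected):
                mismatches.append((step, key, shape))
    return mismatches


# Usage: the second step carries a vector of the wrong length.
episode = [
    {"image": np.zeros((64, 64, 4)), "vector": np.zeros(10)},
    {"image": np.zeros((64, 64, 4)), "vector": np.zeros(11)},
]
bad = find_shape_mismatches(episode, {"image": (64, 64, 4), "vector": (10,)})
```

Collecting the observations returned by `reset()` and `step()` for the single agent and passing them through such a check would show whether the env itself produces inconsistent shapes.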
The two closest things I could find are ComplexInputNet, which belongs to the old API stack that I’m not familiar with, and the FlattenObservations connector. The connector seems quite complex, and I don’t want to flatten anything: I just want to pass the dict as-is to my RLModule and process it there.
Does anyone have an idea of how I can handle this? I’m sure this is a common problem and I may have overlooked something important, but at this point I feel like I have read the docs and the relevant source code inside and out, and I’m quite lost… :(
Thanks in advance!