How severe does this issue affect your experience of using Ray?
- High: It blocks me to complete my task.
Hi,
I am following the documentation (Environments — Ray 2.0.0) to build a custom MultiAgentEnv (which inherits from ray.rllib.env.multi_agent_env — Ray 2.0.0).
At each step t, I return obs, rew, done, info, where obs contains the observations of the agents which will need to take an action at t+1 (i.e. it does not contain any keys for agents which are done), rew and done contain the reward and done variables for step t (i.e. it can contain agent keys which at step t became done), and info contains agent keys which are in obs.
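To make the convention above concrete, here is a minimal sketch in plain Python (no Ray import) of how I assemble the four dicts. The helper name build_step_return and its parameters are hypothetical and only for illustration; my real environment builds these dicts inside env.step().

```python
from typing import Any, Dict, Hashable, Set, Tuple

AgentID = Hashable


def build_step_return(
    next_obs: Dict[AgentID, Any],
    step_rewards: Dict[AgentID, float],
    newly_done: Set[AgentID],
    episode_over: bool,
) -> Tuple[dict, dict, dict, dict]:
    """Assemble (obs, rew, done, info) following the convention described above.

    Hypothetical helper for illustration; not part of RLlib.
    """
    # obs: only agents that must act at t+1, so agents that became done at t are dropped.
    obs = {aid: o for aid, o in next_obs.items() if aid not in newly_done}
    # rew and done refer to step t, so they may include agents that just became done.
    rew = dict(step_rewards)
    done = {aid: (aid in newly_done) for aid in step_rewards}
    done["__all__"] = episode_over
    # info: same agent keys as obs.
    info = {aid: {} for aid in obs}
    return obs, rew, done, info


# Terminal step: agent 20 is the last remaining agent and becomes done.
obs, rew, done, info = build_step_return(
    next_obs={}, step_rewards={20: -1}, newly_done={20}, episode_over=True
)
print(obs, rew, done, info)
```

At the terminal step this yields an empty obs and info, while rew and done still carry agent 20.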
However, at the final terminal step in my episode, I am getting the following error from RLlib:
2022-09-22 21:28:15,425 ERROR worker.py:399 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::RolloutWorker.sample() (pid=991951, ip=128.40.41.23, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f707ebe1b20>)
File "/scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 806, in sample
batches = [self.input_reader.next()]
File "/scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
batches = [self.get_data()]
File "/scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/evaluation/sampler.py", line 282, in get_data
item = next(self._env_runner)
File "/scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/evaluation/sampler.py", line 684, in _env_runner
active_envs, to_eval, outputs = _process_observations(
File "/scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/evaluation/sampler.py", line 1041, in _process_observations
ma_sample_batch = sample_collector.postprocess_episode(
File "/scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/evaluation/collectors/simple_list_collector.py", line 435, in postprocess_episode
pre_batch = collector.build_for_training(policy.view_requirements)
File "/scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/ray/rllib/evaluation/collectors/agent_collector.py", line 395, in build_for_training
shifted_data_np = np.stack(shifted_data, 0)
File "<__array_function__ internals>", line 180, in stack
File "/scratch/zciccwf/py36/envs/nmmo/lib/python3.9/site-packages/numpy/core/shape_base.py", line 426, in stack
raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape
This is how the obs, rew, done, info returned by env.step() are structured at the final terminal step, which is the step causing the error:
(RolloutWorker pid=991951) RLlibMultiAgentTeamBasedEnv obs keys: 0 dict_keys([])
(RolloutWorker pid=991951) RLlibMultiAgentTeamBasedEnv rew: 1 {20: -1}
(RolloutWorker pid=991951) RLlibMultiAgentTeamBasedEnv done: 2 {20: True, '__all__': True}
(RolloutWorker pid=991951) RLlibMultiAgentTeamBasedEnv info keys: 0 dict_keys([])
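The mismatch in these dicts can be made explicit with a small check, sketched below in plain Python. The function agents_without_obs is a hypothetical helper, and the values are copied from the log lines above. It only shows which agents have a reward or done flag but no observation; it does not establish whether RLlib requires an observation for such agents, which is what I am asking.

```python
def agents_without_obs(obs, rew, done):
    """Return agent IDs that appear in rew or done but have no entry in obs.

    Hypothetical diagnostic helper; "__all__" is the episode-level done flag,
    not an agent ID, so it is excluded.
    """
    agent_ids = (set(rew) | set(done)) - {"__all__"}
    return sorted(agent_ids - set(obs), key=str)


# Values taken from the terminal-step log output above.
terminal_obs = {}
terminal_rew = {20: -1}
terminal_done = {20: True, "__all__": True}

missing = agents_without_obs(terminal_obs, terminal_rew, terminal_done)
print(missing)  # [20]
```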
My observation_space is (the printout below also includes my action_space):
Dict(Entity:Dict(Continuous:Box(-1048576.0, 1048576.0, (100, 24), float32), Discrete:Box(0, 4096, (100, 5), int32), N:Box(0, 100, (1,), int32)), Item:Dict(Continuous:Box(-1048576.0, 1048576.0, (170, 16), float32), Discrete:Box(0, 4096, (170, 3), int32), N:Box(0, 170, (1,), int32)), Market:Dict(Continuous:Box(-1048576.0, 1048576.0, (170, 16), float32), Discrete:Box(0, 4096, (170, 3), int32), N:Box(0, 170, (1,), int32)), Tile:Dict(Continuous:Box(-1048576.0, 1048576.0, (225, 4), float32), Discrete:Box(0, 4096, (225, 3), int32), N:Box(0, 15, (1,), int32))), 'action_space': Dict(<class 'nmmo.io.action.Attack'>:Dict(<class 'nmmo.io.action.Style'>:Discrete(3), <class 'nmmo.io.action.Target'>:Discrete(100)), <class 'nmmo.io.action.Buy'>:Dict(<class 'nmmo.io.action.Item'>:Discrete(170)), <class 'nmmo.io.action.Comm'>:Dict(<class 'nmmo.io.action.Token'>:Discrete(170)), <class 'nmmo.io.action.Move'>:Dict(<class 'nmmo.io.action.Direction'>:Discrete(4)), <class 'nmmo.io.action.Sell'>:Dict(<class 'nmmo.io.action.Item'>:Discrete(170), <class 'nmmo.io.action.Price'>:Discrete(100)), <class 'nmmo.io.action.Use'>:Dict(<class 'nmmo.io.action.Item'>:Discrete(170)))
My question is: How should obs, rew, done, info be structured at the terminal step to avoid this error? From the documentation, which says the keys in obs can change, I thought what I had done would be fine, but RLlib seems to expect the observation data to be consistently shaped.