I’ve been having some issues getting a custom env and model to run. The observation and action spaces are both Tuples of simpler spaces. The bug is that during learning, some batches have a length of 0. These batches have `seq_len` values that look normal and contain correct sizes, but the data (obs, act, prev_act, …) all have batch sizes of 0. I traced this effect back to this line in the `_slice` method of `SampleBatch`:
```python
data = tree.map_structure_with_path(map_, self)
```
The function `map_` slices `data` with the bounds `start:stop`, and `seq_len` with `start_seq_len:stop_seq_len`. By manually running this a few times, I found that even though `start` and `stop` are never identical, this line sometimes still produces batch sizes of 0. Here is the code I used to print diagnostic information right after that line:
```python
print(start, stop)
print(len(data))
print(data[SampleBatch.OBS][0].shape)
```
And this is one output showing the discrepancy in size:
```
(PPO pid=29078) 4700 4820
(PPO pid=29078) 120
(PPO pid=29078) (120, 210, 160, 3)
(PPO pid=29078) 5520 5660
(PPO pid=29078) 121
(PPO pid=29078) (0, 210, 160, 3)  # Batch size becomes 0
```
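One observation that might explain this (an assumption about the cause, not a confirmed diagnosis): NumPy slicing never raises when the bounds lie past the end of an array; it silently returns an empty array. So if a stored component array is shorter than `stop`, a non-empty `start:stop` range can still produce a batch of size 0:

```python
import numpy as np

# Mimic one flattened component of a Tuple observation space:
# 100 frames of shape (210, 160, 3).
obs = np.zeros((100, 210, 160, 3))

# An in-range slice behaves as expected.
print(obs[40:60].shape)      # (20, 210, 160, 3)

# An out-of-range slice silently returns an empty array instead of
# raising, which would look exactly like the batch-size-0 output above.
print(obs[5520:5660].shape)  # (0, 210, 160, 3)
```

If this is what is happening, the interesting question becomes why `start`/`stop` exceed the length of the stored arrays in the first place, e.g. a mismatch between the summed `seq_len` and the actual row count of the Tuple-space data.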
Could this be a bug in my own code, or a bug inside RLlib’s handling of complex spaces? I’m not sure whether I should keep digging into the `tree` package or into my own code. Any comments/tips would be helpful here, thanks a lot!!
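In case it helps anyone reproduce or narrow this down, here is a sketch of the check I would run just before the `tree.map_structure_with_path` call. The helper name and the plain-dict stand-in for the `SampleBatch` internals are hypothetical; it just reports which leaf arrays are shorter than the requested `stop` bound and would therefore slice to empty:

```python
import numpy as np

def find_short_arrays(batch_dict, stop):
    """Return the keys whose arrays are shorter than `stop`, i.e. the
    leaves that would silently yield an empty slice for start:stop.
    (Hypothetical helper; batch_dict stands in for the SampleBatch data.)"""
    return [key for key, value in batch_dict.items()
            if isinstance(value, np.ndarray) and len(value) < stop]

# Toy batch where 'obs' is too short for the requested slice bounds.
batch = {
    "obs": np.zeros((100, 4)),
    "actions": np.zeros((6000,)),
}
print(find_short_arrays(batch, 5660))  # ['obs']
```

Logging this alongside `start`/`stop` should show immediately whether the empty batches come from out-of-range bounds or from something else.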