Hierarchical training error

Hi,

I'm doing hierarchical training with a high-level and a low-level policy on a custom environment. Only the high-level policy is trainable, and I'm using the R2D2 algorithm. I'm getting this error:

_buffers\multi_agent_replay_buffer.py", line 260, in _add_to_underlying_buffer
    timeslices = timeslice_along_seq_lens_with_overlap(
  File "C:\Users.…\AppData\Local\anaconda3\envs\ray_271\lib\site-packages\ray\rllib\policy\rnn_sequencing.py", line 536, in timeslice_along_seq_lens_with_overlap
    is_last_episode_ids = eps_ids == eps_ids[-1]
IndexError: index -1 is out of bounds for axis 0 with size 0


eps_ids: [174405590013387373 174405590013387373 174405590013387373

eps_ids: [174405590013387373 174405590013387373 174405590013387373

Then eps_ids suddenly becomes an empty array, and the error above occurs.
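For reference, here is a minimal sketch of the failing expression in isolation. It assumes that eps_ids is a NumPy array, as the traceback suggests. Indexing [-1] on a zero-length array raises exactly this IndexError, so the real question is why an empty batch reaches the replay buffer. find_empty_batches and the policy IDs "high_level" and "low_level" are hypothetical names for a process-of-elimination check. They are not part of RLlib.

```python
import numpy as np


def last_episode_mask(eps_ids: np.ndarray) -> np.ndarray:
    # Same expression as the failing line in rnn_sequencing.py:
    # compare every episode ID with the last one in the batch.
    return eps_ids == eps_ids[-1]


def find_empty_batches(eps_ids_by_policy: dict) -> list:
    # Hypothetical debugging helper: given {policy_id: eps_ids array},
    # return the policy IDs whose batch contains no timesteps.
    return [pid for pid, ids in eps_ids_by_policy.items() if len(ids) == 0]


ids = np.array([174405590013387373] * 3, dtype=np.int64)
print(last_episode_mask(ids))  # [ True  True  True]

empty = np.array([], dtype=np.int64)
try:
    last_episode_mask(empty)
except IndexError as err:
    print(err)  # index -1 is out of bounds for axis 0 with size 0

# Hypothetical policy IDs, used only for illustration.
print(find_empty_batches({"high_level": ids, "low_level": empty}))  # ['low_level']
```

Logging which policy's batch is empty just before it is added to the buffer would at least narrow down where the zero-length batch comes from.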

I've run unit tests on the step() logic, and they pass.

Versions:

ray 2.7.1
gymnasium 1.0.0
python 3.9.0

I have gotten hierarchical training to work on a simple windy maze environment. I modified that environment and its training script to make them as similar to my custom environment as possible (R2D2, high-level policy trainable, low-level policy not trainable):

Thanks for any help in understanding what is going on and the root cause of eps_ids being an empty array. I'd also welcome suggestions for anything else I can try to debug this or narrow it down by process of elimination.

@mannyv

@christina

@sven1977