Hierarchical training error

christopher · March 12, 2025, 8:53pm

Hi,

I’m doing hierarchical training with a high-level and a low-level policy on a custom environment. Only the high-level policy is trainable, and I’m using the R2D2 algorithm. I’m getting this error:

_buffers\multi_agent_replay_buffer.py", line 260, in _add_to_underlying_buffer
    timeslices = timeslice_along_seq_lens_with_overlap(
  File "C:\Users.…\AppData\Local\anaconda3\envs\ray_271\lib\site-packages\ray\rllib\policy\rnn_sequencing.py", line 536, in timeslice_along_seq_lens_with_overlap
    is_last_episode_ids = eps_ids == eps_ids[-1]
IndexError: index -1 is out of bounds for axis 0 with size 0

eps_ids: [174405590013387373 174405590013387373 174405590013387373

…

eps_ids: [174405590013387373 174405590013387373 174405590013387373

eps_ids suddenly becomes an empty array, and then the error above occurs.
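For reference, the failing line can be reproduced outside RLlib. This is only a minimal sketch, assuming eps_ids is a NumPy array at that point; the helper name `last_episode_mask` is hypothetical and just mimics the one expression from the traceback, not the rest of `timeslice_along_seq_lens_with_overlap`.

```python
import numpy as np


def last_episode_mask(eps_ids):
    """Mimic the failing expression: flag entries that belong to the last episode."""
    eps_ids = np.asarray(eps_ids)
    # Indexing with -1 raises IndexError when the array has size 0.
    return eps_ids == eps_ids[-1]


# Non-empty input works as expected.
print(last_episode_mask([7, 7, 9]))

# Empty input raises the same kind of IndexError as in the traceback.
try:
    last_episode_mask([])
except IndexError as err:
    print("IndexError:", err)
```

So the expression itself is fine; the open question is why an empty eps_ids reaches it in the first place.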

I’ve run a unit test on the step() logic and it passes.

Versions:

ray 2.7.1
gymnasium 1.0.0
python 3.9.0

I’ve gotten hierarchical training to work for the simple windy maze environment, with modifications to the env and train script to make it as similar to my own custom environment as possible (using R2D2, high-level policy trainable, low-level policy not trainable):

github.com/ray-project/ray

rllib/examples/hierarchical_training.py

ray-2.7.1


      
                  param_space=(
                      PPOConfig()
                      .environment(WindyMazeEnv)
                      .rollouts(num_rollout_workers=0)
                      .framework(args.framework)
                  ).to_dict(),
              ).fit()
          else:
              maze = WindyMazeEnv(None)
          
              def policy_mapping_fn(agent_id, episode, worker, **kwargs):
                  if agent_id.startswith("low_level_"):
                      return "low_level_policy"
                  else:
                      return "high_level_policy"
          
              config = (
                  PPOConfig()
                  .environment(HierarchicalWindyMazeEnv)
                  .framework(args.framework)
                  .rollouts(num_rollout_workers=0)

github.com/ray-project/ray

rllib/examples/env/windy_maze_env.py

ray-2.7.1

import gymnasium as gym
from gymnasium.spaces import Box, Discrete, Tuple
import logging
import random

from ray.rllib.env import MultiAgentEnv

logger = logging.getLogger(__name__)

# Agent has to traverse the maze from the starting position S -> F
# Observation space [x_pos, y_pos, wind_direction]
# Action space: stay still OR move in current wind direction
MAP_DATA = """
#########
#S      #
####### #
      # #
      # #
####### #
#F      #

This file has been truncated.

Thanks for any help understanding what is going on and why eps_ids ends up as an empty array. I welcome suggestions for anything else I can try to debug this or narrow it down by process of elimination.
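One thing I can try in the meantime is a small guard that reports which policy's batch has an empty episode-ID column before it reaches the replay buffer. The sketch below is not RLlib code: it works on plain dicts standing in for per-policy batches, the helper name `find_empty_policy_batches` is hypothetical, and it assumes the episode-ID column is stored under the key "eps_id".

```python
import numpy as np

EPS_ID = "eps_id"  # assumed column name for episode IDs


def find_empty_policy_batches(policy_batches):
    """Return the policy IDs whose batch has a missing or empty eps_id column.

    `policy_batches` is assumed to map policy ID -> mapping of column name
    to array; plain dicts are used here as stand-ins for real batches.
    """
    empty = []
    for policy_id, batch in policy_batches.items():
        eps_ids = np.asarray(batch[EPS_ID]) if EPS_ID in batch else np.array([])
        if eps_ids.size == 0:
            empty.append(policy_id)
    return empty


# Hypothetical example data: the low-level batch has no rows.
fake_batches = {
    "high_level_policy": {"eps_id": np.array([1, 1, 1])},
    "low_level_policy": {"eps_id": np.array([], dtype=np.int64)},
}
print(find_empty_policy_batches(fake_batches))
```

If a check like this flags one policy consistently, that would at least tell me whether the empty batch comes from the high-level or the low-level side.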

@mannyv

@christina

@sven1977
