PER Buffer throws KeyError during training of SAC

1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
[* ] Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.

2. Environment:

  • Ray version: 2.43.0
  • Python version: 3.12
  • OS: Ubuntu 22.04
  • Cloud/Infrastructure: N/A
  • Other libs/tools (if relevant): N/A

3. What happened vs. what you expected:

  • Expected: I am training an SAC agent on a custom environment, using the configuration shown below. I call algo.train() in a for loop for 5000 training epochs and expected all of them to complete without errors.
config = (
    SACConfig()
    # .environment("LunarLanderContinuous-v3")
    .environment(
        env="LiveStreaming",
        env_config=env_config,
        disable_env_checking=True,
    )
    # .env_runners(num_env_runners=5)
    .resources(num_gpus=1)
    .learners(
        num_learners=1,
        num_gpus_per_learner=1,
    )
    .framework("torch")
    .training(
        actor_lr=CONFIG['sac']['actor_lr'],
        critic_lr=CONFIG['sac']['critic_lr'],
        gamma=CONFIG['sac']['gamma'],
        train_batch_size=CONFIG['sac']['batch_size'],
        tau=CONFIG['sac']['tau'],
        initial_alpha=0.2,
        target_entropy='auto',
        twin_q=True,
        num_steps_sampled_before_learning_starts=180 * 500,
    )
    .rl_module(model_config={
        "fcnet_hiddens": [512, 256, 128, 64],
        "fcnet_activation": "tanh",
    })
    .evaluation(
        evaluation_num_workers=1,
        evaluation_interval=CONFIG['training']['evaluation_interval'],
    )
)
  • Actual: After a variable number of epochs, training exits with an error raised in prioritized_episode_buffer.py. The traceback is pasted below. Every run fails with the same KeyError: 2097151, but the error appears after a different number of training epochs each time.
  File "/home/cc/workspace/project/train_rllib.py", line 237, in <module>
    main(CONFIG['general']['total_episodes'])
  File "/home/cc/workspace/project/train_rllib.py", line 223, in main
    train_result = algo.train()
                   ^^^^^^^^^^^^
  File "/home/cc/.pyenv/versions/project/lib/python3.12/site-packages/ray/tune/trainable/trainable.py", line 330, in train
    raise skipped from exception_cause(skipped)
  File "/home/cc/.pyenv/versions/livegabr/lib/python3.12/site-packages/ray/tune/trainable/trainable.py", line 327, in train
    result = self.step()
             ^^^^^^^^^^^
  File "/home/cc/.pyenv/versions/livegabr/lib/python3.12/site-packages/ray/rllib/algorithms/algorithm.py", line 964, in step
    train_results, train_iter_ctx = self._run_one_training_iteration()
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cc/.pyenv/versions/livegabr/lib/python3.12/site-packages/ray/rllib/algorithms/algorithm.py", line 2991, in _run_one_training_iteration
    training_step_return_value = self.training_step()
                                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/cc/.pyenv/versions/livegabr/lib/python3.12/site-packages/ray/rllib/algorithms/dqn/dqn.py", line 630, in training_step
    return self._training_step_new_api_stack()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cc/.pyenv/versions/livegabr/lib/python3.12/site-packages/ray/rllib/algorithms/dqn/dqn.py", line 674, in _training_step_new_api_stack
    episodes = self.local_replay_buffer.sample(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cc/.pyenv/versions/livegabr/lib/python3.12/site-packages/ray/rllib/utils/replay_buffers/prioritized_episode_buffer.py", line 477, in sample
    index_triple = self._indices[self._tree_idx_to_sample_idx[idx]]
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
KeyError: 2097151
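
One observation that might help narrow this down: 2097151 equals 2**21 - 1, which would be the last leaf of a sum tree with 2**21 leaves. The sketch below is NOT RLlib's implementation. It is a toy sum tree (the names MinimalSumTree and leaf_to_sample are hypothetical) that shows one way such a failure could arise: if the sampled prefix sum is not strictly smaller than the tree's total, the search walks to the last leaf, which was never filled and so has no entry in the index mapping. Whether this is what actually happens inside the prioritized buffer is only a guess on my part.

```python
class MinimalSumTree:
    """Toy sum tree over `capacity` leaves (capacity must be a power of two)."""

    def __init__(self, capacity):
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.capacity = capacity
        # Node 1 is the root; leaves live at indices capacity .. 2 * capacity - 1.
        self.tree = [0.0] * (2 * capacity)

    def set(self, leaf, priority):
        """Write a leaf priority and refresh the sums on the path to the root."""
        idx = leaf + self.capacity
        self.tree[idx] = priority
        idx //= 2
        while idx >= 1:
            self.tree[idx] = self.tree[2 * idx] + self.tree[2 * idx + 1]
            idx //= 2

    def total(self):
        return self.tree[1]

    def find_prefixsum_idx(self, prefix):
        """Return the first leaf whose cumulative priority exceeds `prefix`."""
        idx = 1
        while idx < self.capacity:
            left = 2 * idx
            if self.tree[left] > prefix:
                idx = left
            else:
                prefix -= self.tree[left]
                idx = left + 1
        return idx - self.capacity


tree = MinimalSumTree(capacity=8)
leaf_to_sample = {}  # hypothetical stand-in for a tree-index -> sample mapping
for leaf in range(3):  # only 3 of the 8 leaves are ever filled
    tree.set(leaf, 1.0)
    leaf_to_sample[leaf] = f"sample-{leaf}"

# A prefix sum strictly below the total lands on a filled leaf.
in_range = tree.find_prefixsum_idx(2.5)

# A prefix sum equal to the total walks right at every level and ends on the
# last leaf, which has no entry in the mapping.
out_of_range = tree.find_prefixsum_idx(tree.total())

try:
    leaf_to_sample[out_of_range]
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

In this toy version the bad index is always capacity - 1, regardless of how many leaves are filled, which would match seeing the same key in every run while the failing epoch varies.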
Traceback (most recent call last):