1. Severity of the issue: Medium (significantly affects my productivity, but I can find a workaround).
2. Environment:
- Ray version: 2.43.0
- Python version: 3.12
- OS: Ubuntu 22.04
- Cloud/Infrastructure: N/A
- Other libs/tools (if relevant): N/A
3. What happened vs. what you expected:
- Expected: I am training an SAC agent on a custom environment, configured as in the snippet below, and call `algo.train()` in a for loop for 5000 epochs (one `algo.train()` call per epoch). I expect training to run through all 5000 epochs without errors.
```python
from ray.rllib.algorithms.sac import SACConfig

# `env_config` and `CONFIG` are defined elsewhere in train_rllib.py (not shown).
config = (
    SACConfig()
    # .environment("LunarLanderContinuous-v3")
    .environment(
        env="LiveStreaming",
        env_config=env_config,
        disable_env_checking=True,
    )
    # .env_runners(num_env_runners=5)
    .resources(num_gpus=1)
    .learners(
        num_learners=1,
        num_gpus_per_learner=1,
    )
    .framework("torch")
    .training(
        actor_lr=CONFIG["sac"]["actor_lr"],
        critic_lr=CONFIG["sac"]["critic_lr"],
        gamma=CONFIG["sac"]["gamma"],
        train_batch_size=CONFIG["sac"]["batch_size"],
        tau=CONFIG["sac"]["tau"],
        initial_alpha=0.2,
        target_entropy="auto",
        twin_q=True,
        num_steps_sampled_before_learning_starts=180 * 500,
    )
    .rl_module(model_config={
        "fcnet_hiddens": [512, 256, 128, 64],
        "fcnet_activation": "tanh",
    })
    .evaluation(
        evaluation_num_workers=1,
        evaluation_interval=CONFIG["training"]["evaluation_interval"],
    )
)
```
- Actual: After a variable number of epochs, training exits with an error raised in `prioritized_episode_buffer.py`. The traceback is pasted below. Every run of the training code fails with the same `KeyError: 2097151`, but it appears after a different number of training epochs each time.
```
Traceback (most recent call last):
  File "/home/cc/workspace/project/train_rllib.py", line 237, in <module>
    main(CONFIG['general']['total_episodes'])
  File "/home/cc/workspace/project/train_rllib.py", line 223, in main
    train_result = algo.train()
                   ^^^^^^^^^^^^
  File "/home/cc/.pyenv/versions/project/lib/python3.12/site-packages/ray/tune/trainable/trainable.py", line 330, in train
    raise skipped from exception_cause(skipped)
  File "/home/cc/.pyenv/versions/livegabr/lib/python3.12/site-packages/ray/tune/trainable/trainable.py", line 327, in train
    result = self.step()
             ^^^^^^^^^^^
  File "/home/cc/.pyenv/versions/livegabr/lib/python3.12/site-packages/ray/rllib/algorithms/algorithm.py", line 964, in step
    train_results, train_iter_ctx = self._run_one_training_iteration()
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cc/.pyenv/versions/livegabr/lib/python3.12/site-packages/ray/rllib/algorithms/algorithm.py", line 2991, in _run_one_training_iteration
    training_step_return_value = self.training_step()
                                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/cc/.pyenv/versions/livegabr/lib/python3.12/site-packages/ray/rllib/algorithms/dqn/dqn.py", line 630, in training_step
    return self._training_step_new_api_stack()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cc/.pyenv/versions/livegabr/lib/python3.12/site-packages/ray/rllib/algorithms/dqn/dqn.py", line 674, in _training_step_new_api_stack
    episodes = self.local_replay_buffer.sample(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cc/.pyenv/versions/livegabr/lib/python3.12/site-packages/ray/rllib/utils/replay_buffers/prioritized_episode_buffer.py", line 477, in sample
    index_triple = self._indices[self._tree_idx_to_sample_idx[idx]]
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
KeyError: 2097151
```
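One possible, unconfirmed reading of the error: 2097151 is 2**21 - 1, i.e. the last index of a power-of-two-sized structure with 2**21 slots. The toy sketch below assumes the buffer samples proportionally with a standard sum-tree prefix-sum descent; it is not RLlib's implementation, and `SumTree` and `tree_idx_to_sample_idx` are hypothetical names. In such a tree, a query mass equal to the stored total goes right at every level and ends on the last leaf, which can be an empty slot with no entry in the index mapping, so the lookup raises `KeyError`. Since a uniform draw would only reach the total through floating-point rounding, a failure of this kind would be rare and would show up after a varying number of iterations.

```python
# Hypothetical illustration only: a toy sum tree using the common prefix-sum
# descent for proportional prioritized replay. NOT RLlib's code; `SumTree`
# and `tree_idx_to_sample_idx` are made-up names for this example.


class SumTree:
    """Binary sum tree over `capacity` leaves (capacity must be a power of 2)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a positive power of 2")
        self.capacity = capacity
        # Node 1 is the root; leaves occupy indices [capacity, 2 * capacity).
        self.nodes = [0.0] * (2 * capacity)

    def set(self, leaf: int, priority: float) -> None:
        i = leaf + self.capacity
        self.nodes[i] = priority
        i //= 2
        # Recompute every ancestor sum on the path up to the root.
        while i >= 1:
            self.nodes[i] = self.nodes[2 * i] + self.nodes[2 * i + 1]
            i //= 2

    def total(self) -> float:
        return self.nodes[1]

    def find_prefixsum_idx(self, prefixsum: float) -> int:
        """Return the leaf whose cumulative-priority interval contains `prefixsum`."""
        i = 1
        while i < self.capacity:
            left = 2 * i
            if self.nodes[left] > prefixsum:
                i = left  # target mass lies in the left subtree
            else:
                prefixsum -= self.nodes[left]  # skip the left subtree's mass
                i = left + 1
        return i - self.capacity


tree = SumTree(capacity=8)
tree_idx_to_sample_idx = {}
# Only slots 0..2 hold data; slots 3..7 have zero priority and no mapping entry.
for slot in range(3):
    tree.set(slot, 1.0)
    tree_idx_to_sample_idx[slot] = slot

print(tree.find_prefixsum_idx(1.5))           # interior query -> slot 1
print(tree.find_prefixsum_idx(tree.total()))  # query == total -> last leaf, 7

try:
    tree_idx_to_sample_idx[tree.find_prefixsum_idx(tree.total())]
except KeyError as err:
    print("KeyError:", err)  # analogous to `KeyError: 2097151` if capacity were 2**21
```

With 8 leaves the bad index is 7 (2**3 - 1); with 2**21 leaves the same descent would return 2097151.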