Error occurring repeatedly during training at random time steps

@Lars_Simon_Zehnder Thank you very much for your response.

Unfortunately, I am not able to share a reproducible example, since it would contain other code that is part of ongoing research.

However, I tried to train a model without using Tune (i.e., only calling algo.train() in a loop, sketched after the traceback below) and got a similar error during training:

C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\utils\replay_buffers\prioritized_replay_buffer.py:140: RuntimeWarning: divide by zero encountered in scalar power
  weight = (p_sample * len(self)) ** (-beta)
Traceback (most recent call last):
  File "c:\Users\ahmi_ke\Git\gym-pfc-ks\gym_pfc_ks\envs\pfc_ks_v2\pfc_ks_v2_train.py", line 50, in <module>
    result = algo.train()
  File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\tune\trainable\trainable.py", line 331, in train
    raise skipped from exception_cause(skipped)
  File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\tune\trainable\trainable.py", line 328, in train
    result = self.step()
  File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 949, in step
    train_results, train_iter_ctx = self._run_one_training_iteration()
  File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 3598, in _run_one_training_iteration
    training_step_results = self.training_step()
  File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\algorithms\sac\sac.py", line 578, in training_step
    return self._training_step_old_and_hybrid_api_stack()
  File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\algorithms\dqn\dqn.py", line 847, in _training_step_old_and_hybrid_api_stack
    train_batch = sample_min_n_steps_from_buffer(
  File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\utils\replay_buffers\utils.py", line 193, in sample_min_n_steps_from_buffer
    batch = replay_buffer.sample(num_items=1)
  File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\utils\replay_buffers\multi_agent_replay_buffer.py", line 331, in sample
    samples[policy_id] = replay_buffer.sample(num_items, **kwargs)
  File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\utils\replay_buffers\prioritized_replay_buffer.py", line 141, in sample
    count = self._storage[idx].count
IndexError: list index out of range
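
For reference, the training loop without Tune is essentially the following (a simplified sketch; the actual config and my custom environment are omitted, and the environment name below is just a placeholder):

from ray.rllib.algorithms.sac import SACConfig

config = SACConfig().environment("Pendulum-v1")  # placeholder; my actual env is custom
algo = config.build()

for i in range(1000):
    result = algo.train()  # fails with the IndexError above after approx. 50,000 env steps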

The first two lines of the output now also show a RuntimeWarning: divide by zero encountered in scalar power, raised at weight = (p_sample * len(self)) ** (-beta). Since the exponent -beta is negative, the base p_sample * len(self) must be zero, i.e. either p_sample is zero or len(self) returns zero. In which cases does this happen?
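
The warning itself can be reproduced in isolation: with NumPy scalars, raising an exact zero to a negative power emits this RuntimeWarning and yields inf instead of raising an exception. A minimal sketch, assuming p_sample comes out of the sum-tree as a NumPy float (the values below are purely illustrative):

import numpy as np

p_sample = np.float64(0.0)            # a sampled priority of exactly zero
beta = 0.4                            # illustrative PER beta value
weight = (p_sample * 100) ** (-beta)  # RuntimeWarning: divide by zero encountered in scalar power
print(weight)                         # inf

So the infinite weight by itself does not crash anything; the IndexError is raised one line later, at count = self._storage[idx].count, which suggests the sampled idx also lies outside the storage list.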

I also upgraded to RLlib 2.40 and tried the new API stack as you suggested (i.e., with enable_rl_module_and_learner=True and enable_env_runner_and_connector_v2=True; the config sketch follows the traceback below). I get the same kind of RuntimeWarning (divide by zero encountered in scalar power, now raised at weight = (p_sample * self.get_num_timesteps()) ** (-beta)) and additionally a KeyError:

C:\ProgramData\Miniforge3\envs\rllib-2-40\lib\site-packages\ray\rllib\utils\replay_buffers\prioritized_episode_buffer.py:413: RuntimeWarning: divide by zero encountered in scalar power
  weight = (p_sample * self.get_num_timesteps()) ** (-beta)
Traceback (most recent call last):
  File "c:\Users\ahmi_ke\Git\gym-pfc-ks\gym_pfc_ks\envs\pfc_ks_v2\pfc_ks_v2_train.py", line 49, in <module>
    result = algo.train()
  File "C:\ProgramData\Miniforge3\envs\rllib-2-40\lib\site-packages\ray\tune\trainable\trainable.py", line 331, in train
    raise skipped from exception_cause(skipped)
  File "C:\ProgramData\Miniforge3\envs\rllib-2-40\lib\site-packages\ray\tune\trainable\trainable.py", line 328, in train
    result = self.step()
  File "C:\ProgramData\Miniforge3\envs\rllib-2-40\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 936, in step
    train_results, train_iter_ctx = self._run_one_training_iteration()
  File "C:\ProgramData\Miniforge3\envs\rllib-2-40\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 3201, in _run_one_training_iteration
    training_step_return_value = self.training_step()
  File "C:\ProgramData\Miniforge3\envs\rllib-2-40\lib\site-packages\ray\rllib\algorithms\sac\sac.py", line 609, in training_step
    return self._training_step_new_api_stack(with_noise_reset=False)
  File "C:\ProgramData\Miniforge3\envs\rllib-2-40\lib\site-packages\ray\rllib\algorithms\dqn\dqn.py", line 707, in _training_step_new_api_stack
    episodes = self.local_replay_buffer.sample(
  File "C:\ProgramData\Miniforge3\envs\rllib-2-40\lib\site-packages\ray\rllib\utils\replay_buffers\prioritized_episode_buffer.py", line 415, in sample
    index_triple = self._indices[self._tree_idx_to_sample_idx[idx]]
KeyError: 2097151
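
For completeness, this is roughly how I enabled the new API stack in the config (a sketch; all other options are omitted):

from ray.rllib.algorithms.sac import SACConfig

config = (
    SACConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
)

One observation: 2097151 is 2**21 - 1, i.e. the index of the very last leaf of a binary sum-tree built over 2**20 slots, so it might be that the prioritized sampling walks off the end of the tree, though that is only a guess on my part.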

I would like to emphasize that training runs fine for a while (approx. 50,000 environment steps) and that the checkpoints saved up to that point achieve good to very good performance in the environment. However, I would like to keep training to improve performance further, so this error is quite frustrating, and I would be very grateful for any help.