How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
During training, the following error occurs at random points, mostly between approximately 20,000 and 100,000 environment timesteps:

    File "python\ray\_raylet.pyx", line 1859, in ray._raylet.execute_task
    File "python\ray\_raylet.pyx", line 1800, in ray._raylet.execute_task.function_executor
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\_private\function_manager.py", line 696, in actor_method_executor
        return method(__ray_actor, *args, **kwargs)
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
        return method(self, *_args, **_kwargs)
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\tune\trainable\trainable.py", line 331, in train
        raise skipped from exception_cause(skipped)
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\tune\trainable\trainable.py", line 328, in train
        result = self.step()
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
        return method(self, *_args, **_kwargs)
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 949, in step
        train_results, train_iter_ctx = self._run_one_training_iteration()
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
        return method(self, *_args, **_kwargs)
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 3598, in _run_one_training_iteration
        training_step_results = self.training_step()
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
        return method(self, *_args, **_kwargs)
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\algorithms\sac\sac.py", line 578, in training_step
        return self._training_step_old_and_hybrid_api_stack()
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
        return method(self, *_args, **_kwargs)
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\algorithms\dqn\dqn.py", line 847, in _training_step_old_and_hybrid_api_stack
        train_batch = sample_min_n_steps_from_buffer(
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\utils\replay_buffers\utils.py", line 193, in sample_min_n_steps_from_buffer
        batch = replay_buffer.sample(num_items=1)
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\utils\replay_buffers\multi_agent_replay_buffer.py", line 331, in sample
        samples[policy_id] = replay_buffer.sample(num_items, **kwargs)
    File "C:\ProgramData\Miniforge3\envs\rllib-2-37-nn\lib\site-packages\ray\rllib\utils\replay_buffers\prioritized_replay_buffer.py", line 141, in sample
        count = self._storage[idx].count
    IndexError: list index out of range

Unfortunately, I don't understand why this error occurs, and I would greatly appreciate any help.
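For context, the failing line (`count = self._storage[idx].count`) reads a stored item at an index chosen by priority-proportional sampling. Prioritized replay buffers generally do this with a sum tree: draw `mass = u * total_priority` with `u` uniform in [0, 1), then walk down from the root, going left when the left subtree's sum exceeds the remaining mass. The sketch below is a simplified, hypothetical illustration of that idea; `SumTree`, `sample_one` and `checked_priority` are my own names, not RLlib's source. It assumes that a priority can become non-finite (for example `inf` from an exploding TD error). In that case the comparisons in the walk never succeed, the walk always goes right, and it ends at the last leaf. That leaf lies past the stored items whenever the buffer is not yet full, which reproduces this exact `IndexError`. This is one plausible mechanism, not a confirmed diagnosis.

```python
import math
import random


class SumTree:
    """Hypothetical minimal sum tree over `capacity` leaves (a power of two).

    Simplified illustration of proportional sampling, not RLlib's class.
    """

    def __init__(self, capacity: int) -> None:
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.capacity = capacity
        # Node 1 is the root; leaves live at indices [capacity, 2 * capacity).
        self.value = [0.0] * (2 * capacity)

    def set(self, idx: int, priority: float) -> None:
        # Write the leaf, then recompute the sums of all its ancestors.
        pos = idx + self.capacity
        self.value[pos] = priority
        pos //= 2
        while pos >= 1:
            self.value[pos] = self.value[2 * pos] + self.value[2 * pos + 1]
            pos //= 2

    def total(self) -> float:
        return self.value[1]

    def find_leaf(self, mass: float) -> int:
        # Go left if the left subtree holds more than the remaining mass,
        # otherwise subtract its sum and go right. If mass is inf or NaN,
        # the comparison is always False, so the walk always goes right.
        node = 1
        while node < self.capacity:
            left = 2 * node
            if self.value[left] > mass:
                node = left
            else:
                mass -= self.value[left]
                node = left + 1
        return node - self.capacity


def sample_one(tree: SumTree, storage: list, rng: random.Random):
    # Proportional sampling: mass is uniform over the total priority.
    mass = rng.random() * tree.total()
    idx = tree.find_leaf(mass)
    # A plain list raises "IndexError: list index out of range" if idx
    # points at an empty leaf beyond the stored items.
    return storage[idx]


def checked_priority(td_error: float, eps: float = 1e-6) -> float:
    # Hypothetical guard: a common priority formula is |td_error| + eps.
    priority = abs(td_error) + eps
    if not math.isfinite(priority):
        raise ValueError(f"non-finite priority from td_error={td_error!r}")
    return priority


rng = random.Random(0)
tree = SumTree(8)  # capacity 8, but only 3 items stored so far
storage = ["a", "b", "c"]
for i in range(len(storage)):
    tree.set(i, 1.0)
print(sample_one(tree, storage, rng))  # always one of the stored items

tree.set(1, math.inf)  # simulate one exploded priority
try:
    sample_one(tree, storage, rng)
except IndexError as err:
    print("IndexError:", err)
```

With finite priorities every sample lands on a stored item. After one priority is set to `inf`, the walk ends at leaf 7 of 8 while only 3 items exist, and the list lookup fails with the same message as in the traceback. If this is what happens in my run, a check like `checked_priority` on the TD errors would surface the bad value before it reaches the tree.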
I start training with the following Tuner configuration:
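
    from ray import train, tune
    from ray.rllib.algorithms.sac import SACConfig

    # my_env (the registered env name), myEnv, run_id and storage_path
    # are defined elsewhere in my script.
    tune.register_env(my_env, lambda config: myEnv())

    config = (
        SACConfig()
        .environment(my_env)
        .training(
            optimization_config={
                # Grid search over the actor learning rate only.
                "actor_learning_rate": tune.grid_search(
                    [0.0003, 0.0001, 0.00008, 0.00006, 0.00004]
                ),
            }
        )
    )

    tuner = tune.Tuner(
        "SAC",
        param_space=config,
        run_config=train.RunConfig(
            name=run_id,
            storage_path=storage_path,
            stop={"timesteps_total": 200000},
        ),
    )
    tuner.fit()
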
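One way to narrow this down, offered as a hedged diagnostic rather than a confirmed fix: the traceback shows SAC sampling from a prioritized buffer (`prioritized_replay_buffer.py`), so the sketch below swaps in RLlib's uniform `MultiAgentReplayBuffer` through `SACConfig.training(replay_buffer_config=...)`. If the `IndexError` disappears, the problem most likely lies in prioritized sampling. Assumptions: `"Pendulum-v1"` stands in for `my_env`, and the explicit `capacity` of 1,000,000 is my own choice. I set it explicitly because, as far as I know, changing the buffer `type` can replace the default buffer settings instead of merging into them.

```python
from ray.rllib.algorithms.sac import SACConfig

# Diagnostic sketch, not a confirmed fix: use uniform replay instead of
# the prioritized buffer that appears in the traceback.
config = (
    SACConfig()
    # Assumption: "Pendulum-v1" stands in for the registered env `my_env`.
    .environment("Pendulum-v1")
    .training(
        replay_buffer_config={
            "type": "MultiAgentReplayBuffer",
            # Set explicitly, since changing "type" may drop the defaults.
            "capacity": 1_000_000,
        },
    )
)
```

The rest of the Tuner setup (grid search, `RunConfig`, stop criterion) would stay as above, so only the buffer type differs between the two runs.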
I am using RLlib 2.37.0 with Python 3.9.20. I also tried other RLlib versions, but the same error still occurs.