Error when setting done=True: eval_data[i].env_id raises IndexError: list index out of range

Hello,

I have upgraded ray from 0.8.0 to 2.0.0.dev and am now getting this error while training in my multi-agent environment:

  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 1468, in _process_policy_eval_results
    env_id: int = eval_data[i].env_id
IndexError: list index out of range
Full stack trace:
2021-02-12 23:14:51,675 ERROR trial_runner.py:708 -- Trial PPO_0_train_and_sgd_batch_sizes=1000: Error processing event.
Traceback (most recent call last):
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 678, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 597, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
    return func(*args, **kwargs)
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/worker.py", line 1458, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(IndexError): ray::PPO.train_buffered() (pid=81810, ip=172.20.10.3)
  File "python/ray/_raylet.pyx", line 486, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/tune/trainable.py", line 167, in train_buffered
    result = self.train()
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 535, in train
    raise e
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 524, in train
    result = Trainable.train(self)
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/tune/trainable.py", line 226, in train
    result = self.step()
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 148, in step
    res = next(self.train_exec_impl)
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 756, in __next__
    return next(self.built_iterator)
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  [Previous line repeated 1 more time]
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 876, in apply_flatten
    for item in it:
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 828, in add_wait_hooks
    item = next(it)
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  [Previous line repeated 1 more time]
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 471, in base_iterator
    yield ray.get(futures, timeout=timeout)
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
    return func(*args, **kwargs)
ray.exceptions.RayTaskError(IndexError): ray::RolloutWorker.par_iter_next() (pid=81809, ip=172.20.10.3)
  File "python/ray/_raylet.pyx", line 486, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/util/iter.py", line 1152, in par_iter_next
    return next(self.local_it)
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 327, in gen_rollouts
    yield self.sample()
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 678, in sample
    batches = [self.input_reader.next()]
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 98, in next
    batches = [self.get_data()]
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 232, in get_data
    item = next(self.rollout_provider)
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 694, in _env_runner
    sample_collector=sample_collector,
  File "/Users/nathan/opt/anaconda3/envs/cc/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 1468, in _process_policy_eval_results
    env_id: int = eval_data[i].env_id
IndexError: list index out of range

Changing the value of no_done_at_end doesn't seem to make a difference. The error happens when I set dones[rl_id] = True for an agent. It fails around here, in _process_policy_eval_results:

    actions: List[EnvActionType] = unbatch(actions)
    # type: int, EnvActionType
    for i, action in enumerate(actions):
        # Clip if necessary.
        if clip_actions:
            clipped_action = clip_action(action,
                                         policy.action_space_struct)
        else:
            clipped_action = action

        env_id: int = eval_data[i].env_id
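        # ^ This line raises the IndexError when i >= len(eval_data).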

It seems that the code has actions even for agents that are done, i.e. len(actions) = len(eval_data) + n_dones, where n_dones is the number of dones[rl_id] that I set to True in that iteration, which leads to the index error.
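To illustrate the mismatch, here is a minimal hypothetical sketch (plain Python, not RLlib code): with one extra action returned for a done agent, the loop indexes past the end of eval_data.

    from collections import namedtuple

    # Hypothetical stand-in for RLlib's eval_data entries; only env_id matters here.
    EvalData = namedtuple("EvalData", ["env_id"])

    eval_data = [EvalData(env_id=0), EvalData(env_id=0)]  # n = 2 pending agents
    actions = [1, 0, 1]  # n + n_dones = 3 actions came back from the policy

    for i, action in enumerate(actions):
        env_id = eval_data[i].env_id  # IndexError once i reaches 2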

I have already spent quite some time trying to debug this, so I figured I would ask here in case it is something trivial that changed when upgrading Ray.

Thanks!

Edit: I ended up setting config['no_done_at_end'] = True and removing all the dones[rl_id] = True assignments from my code. The error still happens, but much less frequently and not systematically: my environment sometimes gets through several episodes/resets before it occurs. It still always happens within the first 10 minutes of training, though, and always right after a reset.
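For reference, a minimal sketch of that workaround (the env name is a hypothetical placeholder and the rest of my config is omitted):

    import ray
    from ray import tune

    ray.init()

    config = {
        "env": "my_multi_agent_env",  # hypothetical registered env name
        "no_done_at_end": True,  # don't add done=True at the end of episodes
    }

    tune.run("PPO", config=config)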

Alright, I finally figured it out.

If a new agent enters the environment at the very last step of the episode, it doesn't have a last observation, so the sample collector adds an initial observation for it (line 1099 of rllib/evaluation/sampler.py). The _add_to_next_inference_call function is thus called right before the reset and appends one element to self.forward_pass_agent_keys[pid] (rllib/evaluation/collectors/simple_list_collector.py, line ~775). Then the reset is called and an initial observation is given to all the new agents, which results in self.forward_pass_agent_keys[pid] having length n+1 when there are only n agents in the environment, and that propagates into the error.

I'm not sure whether I did something wrong with the agents' dones or whether it's a bug, though. As a temporary fix, I'm not adding the agent id to the states/rewards/dones/info data if we're at the last step of the episode and it's a new agent.
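That temporary fix looks roughly like this in my env's step() (a sketch only; the helpers and bookkeeping attributes below are illustrative, not my actual code):

    from ray.rllib.env.multi_agent_env import MultiAgentEnv

    class MyEnv(MultiAgentEnv):  # hypothetical env; only step() is sketched
        def step(self, action_dict):
            obs, rewards, dones, infos = {}, {}, {}, {}
            episode_done = self._is_last_step()  # hypothetical helper

            for agent_id in self._active_agents:  # hypothetical bookkeeping
                # Temporary fix: if the episode terminates on this very step,
                # skip agents that just entered the env so they never receive
                # an initial observation right before the reset.
                if episode_done and agent_id in self._new_agents:
                    continue
                obs[agent_id] = self._get_obs(agent_id)
                rewards[agent_id] = self._get_reward(agent_id)
                dones[agent_id] = episode_done
                infos[agent_id] = {}

            dones["__all__"] = episode_done
            return obs, rewards, dones, infos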


Hey @nathanlct, yeah, there was a similar bug in RLlib that was fixed here:

I think this should fix your problem as well. Yes, it happened when a new(!) agent entered the episode and the episode terminated at that same time step, so the agent had an initial obs but no action had to be calculated for it.