Assert agent_key not in self.agent_collectors

GattiPinheiro · March 31, 2021, 12:35pm

I’m using ray/rllib 1.2.0 and I’m hitting this error after 100 training iterations.

Failure # 1 (occurred at 2021-03-31_11-10-22)
Traceback (most recent call last):
  File "/opt/miniconda/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 586, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/opt/miniconda/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 609, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/opt/miniconda/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
    return func(*args, **kwargs)
  File "/opt/miniconda/lib/python3.7/site-packages/ray/worker.py", line 1456, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AssertionError): ray::PPO.train_buffered() (pid=264, ip=10.1.0.8)
  File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
  File "/opt/miniconda/lib/python3.7/site-packages/ray/tune/trainable.py", line 167, in train_buffered
    result = self.train()
  File "/opt/miniconda/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 526, in train
    raise e
  File "/opt/miniconda/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 515, in train
    result = Trainable.train(self)
  File "/opt/miniconda/lib/python3.7/site-packages/ray/tune/trainable.py", line 226, in train
    result = self.step()
  File "/opt/miniconda/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 157, in step
    evaluation_metrics = self._evaluate()
  File "/opt/miniconda/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 778, in _evaluate
    for w in self.evaluation_workers.remote_workers()
  File "/opt/miniconda/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
    return func(*args, **kwargs)
ray.exceptions.RayTaskError(AssertionError): ray::RolloutWorker.sample() (pid=375, ip=10.1.0.8)
  File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
  File "/opt/miniconda/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 662, in sample
    batches = [self.input_reader.next()]
  File "/opt/miniconda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 95, in next
    batches = [self.get_data()]
  File "/opt/miniconda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 224, in get_data
    item = next(self.rollout_provider)
  File "/opt/miniconda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 620, in _env_runner
    sample_collector=sample_collector,
  File "/opt/miniconda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 1198, in _process_observations_w_trajectory_view_api
    new_episode.length - 1, filtered_obs)
  File "/opt/miniconda/lib/python3.7/site-packages/ray/rllib/evaluation/collectors/simple_list_collector.py", line 487, in add_init_obs
    assert agent_key not in self.agent_collectors
AssertionError

There is nothing unusual going on in my code, apart from a few customizations (model, env, action distribution). I've checked memory usage and it mostly stays below 20%. Does anyone know what could be causing this? Should I open a GitHub issue?
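To make the failing check more concrete, here is a toy model of the invariant that `assert agent_key not in self.agent_collectors` appears to enforce. It is a sketch under assumptions not confirmed in this thread: that the collector keys per-agent state by an `(episode_id, agent_id)` pair, and that an initial observation may only be registered once per key until that episode's trajectories are cleared. `ToyCollector`, `register_initial_obs`, `add_step`, `finish_episode` and `registers_twice_fails` are hypothetical names for illustration, not RLlib's API.

```python
from typing import Any, Dict, Hashable, List, Tuple

# Hypothetical key type: (episode_id, agent_id). This mirrors the *idea* of
# an agent_key, but is an assumption made for this sketch only.
AgentKey = Tuple[int, Hashable]


class ToyCollector:
    """Toy stand-in for a per-agent trajectory collector (not RLlib code)."""

    def __init__(self) -> None:
        self.agent_collectors: Dict[AgentKey, List[Any]] = {}

    def register_initial_obs(self, episode_id: int, agent_id: Hashable, obs: Any) -> None:
        agent_key = (episode_id, agent_id)
        # Same shape of check as in the traceback: an agent may not start a
        # new trajectory while one for the same key is still open.
        assert agent_key not in self.agent_collectors
        self.agent_collectors[agent_key] = [obs]

    def add_step(self, episode_id: int, agent_id: Hashable, obs: Any) -> None:
        # Append a follow-up observation to an already open trajectory.
        self.agent_collectors[(episode_id, agent_id)].append(obs)

    def finish_episode(self, episode_id: int) -> None:
        # Drop every open trajectory that belongs to this episode.
        for key in [k for k in self.agent_collectors if k[0] == episode_id]:
            del self.agent_collectors[key]


def registers_twice_fails() -> bool:
    """Return True if a second initial obs for the same key trips the assert."""
    collector = ToyCollector()
    collector.register_initial_obs(episode_id=0, agent_id="agent_0", obs=0.0)
    try:
        collector.register_initial_obs(episode_id=0, agent_id="agent_0", obs=1.0)
    except AssertionError:
        return True
    return False
```

In this toy model the assertion fires only when the same agent is reported as starting again in an episode the collector still considers open; after `finish_episode` the same key can be registered again. Whether that is what actually happens in the reported runs is not established here. Note that plain `assert` statements are skipped when Python runs with `-O`.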

sven1977 · March 31, 2021, 12:48pm

Hey @GattiPinheiro, could you try this on the latest master? A fix for this was merged yesterday.
PR: [RLlib] Issue: Agent_id -> Policy_id mapping should not need to be fixed between episodes. by sven1977 · Pull Request #15020 · ray-project/ray · GitHub

GattiPinheiro · April 1, 2021, 7:00am

Hello,

thanks for the feedback! I'm trying to install the nightly build following the instructions on the Installation page, i.e.

$ python -m pip install https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-2.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl

but I can't get it to work. The job stays in the PENDING state indefinitely without raising any errors. The only feedback I get is the following warning:

/opt/miniconda/lib/python3.7/site-packages/ray/autoscaler/_private/cli_logger.py:61: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via pip install 'ray[full]'. Please update your install command.
  "update your install command.", FutureWarning)

What am I missing?

sven1977 · April 9, 2021, 7:04pm

I think this is a warning you can ignore. If the dashboard is not installed, you can always (re)start Ray on your server via:
ray start --head --include-dashboard false

GattiPinheiro · April 14, 2021, 11:13am

Hello @sven1977, I finally had some time to do the testing. As suggested, I installed Ray from today's nightly build, but the error is still raised after a couple of training iterations. I will try downgrading to version 1.1.0 to check whether that helps.

GattiPinheiro · April 14, 2021, 1:29pm

I created the issue at [rllib] Assert agent_key not in self.agent_collectors · Issue #15297 · ray-project/ray · GitHub. The good news is that I can reproduce the issue using only standard optimization algorithms and environment implementations.

stefanbschneider · June 4, 2021, 9:35am

I also ran into the same error message. I added some comments, observations, and questions in the issue.

ekblad · October 7, 2021, 10:22pm

I am also running into this. I added a comment in the issue.
