How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I have an RNN model, inheriting from ModelV2, that worked well with Ray 2.2.
In Ray 2.9, I set the option as required to disable the new API stack:
config.experimental(_enable_new_api_stack=False).build()
The configuration builds without errors, but the problem occurs when I call:
tuner = tune.Tuner("PPO", param_space=config, run_config=run_config)
The full error output is attached below. I tried to look into the functions it mentions and noticed that, while running "ray/rllib/evaluation/env_runner_v2.py", "sample_batches_by_policy" did not contain "state_out_1". When the next function, build_for_inference, was then called at line 326 of ray/rllib/connectors/agent/view_requirement.py, self.view_requirements created "state_in_1" with an empty list, which ultimately caused the IndexError.
Viewed in debug mode, self.view_requirements['state_in_1'] looks like this:
ViewRequirement(data_col='state_out_1', space=Box(-1.0, 1.0, (256,), float32), shift=-1, index=None, batch_repeat_value=20, used_for_compute_actions=True, used_for_training=True, shift_arr=array([-1]))
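For context on the entry above: a view requirement with data_col='state_out_1' and shift=-1 means that the input "state_in_1" at timestep t is read from the "state_out_1" column at timestep t - 1. The following plain-Python sketch models only that lookup. It is not RLlib's implementation; the function name state_in_for_step and the buffer layout (slot 0 holding an initial state that stands in for timestep -1) are assumptions made purely for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def state_in_for_step(state_out_buffer: Sequence[Vector], t: int, shift: int = -1) -> Vector:
    """Return what a requirement on a "state_out_*" column with `shift` reads at step t.

    Assumed layout (illustrative only, not RLlib's internal layout):
    state_out_buffer[0] is the model's initial state, standing in for the
    output of timestep -1, and state_out_buffer[k + 1] is the state_out
    produced at timestep k.
    """
    # t + shift is the timestep whose output is read; +1 turns it into a
    # slot index because slot 0 is reserved for the initial state.
    pos = t + shift + 1
    if not 0 <= pos < len(state_out_buffer):
        raise IndexError(f"no state_out recorded for timestep {t + shift}")
    return state_out_buffer[pos]


initial_state = [0.0, 0.0]
buffer = [initial_state, [0.1, 0.2], [0.3, 0.4]]  # made-up example values
print(state_in_for_step(buffer, 0))  # [0.0, 0.0]: step 0 sees the initial state
print(state_in_for_step(buffer, 2))  # [0.3, 0.4]: step 2 sees step 1's output
```

Under this toy model, an empty buffer already fails at step 0, because there is not even an initial state to read.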
Please advise how to proceed! I am willing to provide more information.
2024-02-05 05:07:41,742 ERROR tune_controller.py:1374 -- Trial task failed for trial PPO_MultiAgentArena_v3_85c05_00000
Traceback (most recent call last):
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
    result = ray.get(future)
             ^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/_private/worker.py", line 2624, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(IndexError): ray::PPO.train() (pid=987409, ip=10.47.57.189, actor_id=da518257234fa0c302d5fd4d01000000, repr=PPO)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/tune/trainable/trainable.py", line 342, in train
    raise skipped from exception_cause(skipped)
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/tune/trainable/trainable.py", line 339, in train
    result = self.step()
             ^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/algorithms/algorithm.py", line 852, in step
    results, train_iter_ctx = self._run_one_training_iteration()
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/algorithms/algorithm.py", line 3042, in _run_one_training_iteration
    results = self.training_step()
              ^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 407, in training_step
    train_batch = synchronous_parallel_sample(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/execution/rollout_ops.py", line 83, in synchronous_parallel_sample
    sample_batches = worker_set.foreach_worker(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/evaluation/worker_set.py", line 705, in foreach_worker
    handle_remote_call_result_errors(remote_results, self._ignore_worker_failures)
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/evaluation/worker_set.py", line 78, in handle_remote_call_result_errors
    raise r.get()
ray.exceptions.RayTaskError(IndexError): ray::RolloutWorker.apply() (pid=987409, ip=10.47.57.189, actor_id=d64b201bd95cea973cd5da4701000000, repr=<ray.rllib.evaluation.rollout_worker._modify_class.<locals>.Class object at 0x7fd832842e10>)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/utils/actor_manager.py", line 189, in apply
    raise e
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/utils/actor_manager.py", line 178, in apply
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/execution/rollout_ops.py", line 84, in <lambda>
    lambda w: w.sample(), local_worker=False, healthy_only=True
              ^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/evaluation/rollout_worker.py", line 694, in sample
    batches = [self.input_reader.next()]
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/evaluation/sampler.py", line 91, in next
    batches = [self.get_data()]
               ^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/evaluation/sampler.py", line 276, in get_data
    item = next(self._env_runner)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 344, in run
    outputs = self.step()
              ^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 370, in step
    active_envs, to_eval, outputs = self._process_observations(
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 637, in _process_observations
    processed = policy.agent_connectors(acd_list)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/connectors/agent/pipeline.py", line 41, in __call__
    ret = c(ret)
          ^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/connectors/connector.py", line 265, in __call__
    return [self.transform(d) for d in acd_list]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/connectors/connector.py", line 265, in <listcomp>
    return [self.transform(d) for d in acd_list]
            ^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/connectors/agent/view_requirement.py", line 118, in transform
    sample_batch = agent_collector.build_for_inference()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/evaluation/collectors/agent_collector.py", line 366, in build_for_inference
    self._cache_in_np(np_data, data_col)
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/evaluation/collectors/agent_collector.py", line 613, in _cache_in_np
    cache_dict[key] = [_to_float_np_array(d) for d in self.buffers[key]]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/evaluation/collectors/agent_collector.py", line 613, in <listcomp>
    cache_dict[key] = [_to_float_np_array(d) for d in self.buffers[key]]
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lime/miniconda3/envs/ray29/lib/python3.11/site-packages/ray/rllib/evaluation/collectors/agent_collector.py", line 32, in _to_float_np_array
    if torch and torch.is_tensor(v[0]):
                                 ~^^^
IndexError: list index out of range
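The last frames of the traceback can be reduced to plain Python. _cache_in_np converts each entry d of self.buffers[key], and the line shown for _to_float_np_array evaluates v[0]; indexing [0] only raises "list index out of range" when the list is empty, so the crash means the list being cached contains no recorded values at all. The sketch below mimics that pattern. The names to_float_rows_sketch and cache_column_sketch are hypothetical, and the nesting of buffers (a list of per-component lists of per-timestep values) is an assumption inferred from the list comprehension at agent_collector.py line 613, not taken from RLlib's source.

```python
from typing import Any, Dict, List


def to_float_rows_sketch(v: List[Any]) -> List[Any]:
    # Hypothetical stand-in for _to_float_np_array: like the line in the
    # traceback, it inspects v[0] before converting, so an empty v raises
    # IndexError right here.
    first = v[0]
    if isinstance(first, (list, tuple)):
        return [[float(x) for x in row] for row in v]
    return [float(x) for x in v]


def cache_column_sketch(buffers: Dict[str, List[List[Any]]], key: str) -> List[Any]:
    # Mirrors `[_to_float_np_array(d) for d in self.buffers[key]]`:
    # every per-component list d is converted on its own.
    return [to_float_rows_sketch(d) for d in buffers[key]]


# Assumed example data: state_out_0 has two recorded timesteps, while
# state_out_1 has a component list with nothing recorded in it.
buffers = {
    "state_out_0": [[[0.5, -0.5], [0.25, 0.75]]],
    "state_out_1": [[]],
}

print(cache_column_sketch(buffers, "state_out_0"))  # [[[0.5, -0.5], [0.25, 0.75]]]
try:
    cache_column_sketch(buffers, "state_out_1")
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```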