How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am running into the following error, which seems to stem from a function call inside Ray itself (package version: ray 2.8.1, installed from PyPI), and I am not sure what I need to do to fix it. Any suggestions would be greatly appreciated.
```
(PPO pid=14214) 2023-12-03 19:57:56,079 INFO algorithm_config.py:3679 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(PPO pid=14214) Trainable.setup took 17.078 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(PPO pid=14214) Install gputil for GPU system monitoring.
(PPO pid=14214) 2023-12-03 19:57:54,368 INFO dynamic_tf_policy_v2.py:710 -- Adding extra-action-fetch `action_prob` to view-reqs. [repeated 2x across cluster]
(PPO pid=14214) 2023-12-03 19:57:54,368 INFO dynamic_tf_policy_v2.py:710 -- Adding extra-action-fetch `action_logp` to view-reqs. [repeated 2x across cluster]
(PPO pid=14214) 2023-12-03 19:57:54,369 INFO dynamic_tf_policy_v2.py:710 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs. [repeated 2x across cluster]
(PPO pid=14214) 2023-12-03 19:57:54,369 INFO dynamic_tf_policy_v2.py:710 -- Adding extra-action-fetch `vf_preds` to view-reqs. [repeated 2x across cluster]
(PPO pid=14214) 2023-12-03 19:57:54,369 INFO dynamic_tf_policy_v2.py:722 -- Testing `postprocess_trajectory` w/ dummy batch. [repeated 2x across cluster]
(RolloutWorker pid=14263) 2023-12-03 19:57:56,310 INFO rollout_worker.py:690 -- Generating sample batch of size 100
2023-12-03 19:57:56,446 ERROR tune_controller.py:1383 -- Trial task failed for trial PPO_meltingpot_32fc7_00000
Traceback (most recent call last):
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
result = ray.get(future)
^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/_private/worker.py", line 2563, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::PPO.train() (pid=14214, ip=192.168.1.15, actor_id=17553ad788b3f5db04b48e7d01000000, repr=PPO)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/tune/trainable/trainable.py", line 342, in train
raise skipped from exception_cause(skipped)
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/tune/trainable/trainable.py", line 339, in train
result = self.step()
^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/algorithms/algorithm.py", line 853, in step
results, train_iter_ctx = self._run_one_training_iteration()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/algorithms/algorithm.py", line 2854, in _run_one_training_iteration
results = self.training_step()
^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 429, in training_step
train_batch = synchronous_parallel_sample(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/execution/rollout_ops.py", line 85, in synchronous_parallel_sample
sample_batches = worker_set.foreach_worker(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/evaluation/worker_set.py", line 680, in foreach_worker
handle_remote_call_result_errors(remote_results, self._ignore_worker_failures)
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/evaluation/worker_set.py", line 76, in handle_remote_call_result_errors
raise r.get()
ray.exceptions.RayTaskError(TypeError): ray::RolloutWorker.apply() (pid=14263, ip=192.168.1.15, actor_id=c07aa3e62e7bae9ab5f2e48301000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f3cc999c550>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/utils/actor_manager.py", line 185, in apply
raise e
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/utils/actor_manager.py", line 176, in apply
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in <lambda>
lambda w: w.sample(), local_worker=False, healthy_only=True
^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/evaluation/rollout_worker.py", line 696, in sample
batches = [self.input_reader.next()]
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
batches = [self.get_data()]
^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/evaluation/sampler.py", line 277, in get_data
item = next(self._env_runner)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 344, in run
outputs = self.step()
^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 370, in step
active_envs, to_eval, outputs = self._process_observations(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 536, in _process_observations
policy_id: PolicyID = episode.policy_for(agent_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/anaconda3/envs/mp/lib/python3.11/site-packages/ray/rllib/evaluation/episode_v2.py", line 120, in policy_for
policy_id = self._agent_to_policy[agent_id] = self.policy_mapping_fn(
^^^^^^^^^^^^^^^^^^^^^^^
TypeError: get_config.<locals>.policy_mapping_fn() takes 1 positional argument but 2 were given
(PPO pid=14214) 2023-12-03 19:57:56,441 ERROR actor_manager.py:500 -- Ray error, taking actor 1 out of service. ray::RolloutWorker.apply() (pid=14263, ip=192.168.1.15, actor_id=c07aa3e62e7bae9ab5f2e48301000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f3cc999c550>)
```
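For reference, the final `TypeError` suggests that my `policy_mapping_fn` (defined inside `get_config`) only accepts `agent_id`, while RLlib 2.x calls it with extra arguments (the episode positionally, plus the worker as a keyword). Below is a minimal sketch of the signature I believe is expected; the returned policy id `"default_policy"` is just a placeholder, not my actual mapping logic:

```python
# Sketch only: in Ray 2.x, RLlib calls policy_mapping_fn(agent_id, episode, worker=...),
# so a one-argument function like `def policy_mapping_fn(agent_id):` raises
# "takes 1 positional argument but 2 were given".
def policy_mapping_fn(agent_id, episode, worker=None, **kwargs):
    # Placeholder: return whatever policy id your multi-agent config defines
    # for this agent (e.g. a per-agent policy name).
    return "default_policy"

# It would then be registered on the config, e.g.:
# config = config.multi_agent(policies=..., policy_mapping_fn=policy_mapping_fn)
```

Is updating the mapping function to this signature the right fix here, or is something else going on?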