I’m trying to run the AlphaZero examples shown here: https://docs.ray.io/en/latest/rllib/rllib-algorithms.html#alphazero, and when I execute them I get the following error:
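In case it helps, this is roughly what my RLlib_test.py does — a minimal sketch reconstructed from the docs example, so the exact stop condition and learning-rate grid are approximate (the grid search is why two trials show up in the output below):

```python
from ray import air, tune
from ray.rllib.algorithms.alpha_zero import AlphaZeroConfig

# Approximate reproduction of the docs example: AlphaZero on CartPole-v1, run through Tune.
config = (
    AlphaZeroConfig()
    .environment("CartPole-v1")
    .rollouts(num_rollout_workers=2)
    .training(lr=tune.grid_search([0.001, 0.0001]))  # grid search -> two trials
)

tuner = tune.Tuner(
    "AlphaZero",
    param_space=config.to_dict(),
    run_config=air.RunConfig(stop={"training_iteration": 5}),
)
tuner.fit()
```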
runfile('/home/joaquin/TFM/Doom_RL/RLlib_test.py', wdir='/home/joaquin/TFM/Doom_RL')
True
2023-05-16 01:35:36,559 INFO worker.py:1616 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
2023-05-16 01:35:40,942 INFO tune.py:218 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
<IPython.core.display.HTML object>
(raylet) E0516 01:35:40.985175255 199150 fork_posix.cc:76] Other threads are currently calling into gRPC, skipping fork() handlers
(AlphaZero pid=199655) 2023-05-16 01:35:44,420 WARNING algorithm_config.py:635 -- Cannot create AlphaZeroConfig from given `config_dict`! Property __stdout_file__ not supported.
(AlphaZero pid=199655) 2023-05-16 01:35:44,667 INFO algorithm.py:527 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(AlphaZero pid=199655) 2023-05-16 01:35:49,448 ERROR actor_manager.py:507 -- Ray error, taking actor 1 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199798, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f0da1f57d30>)
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 738, in __init__
(AlphaZero pid=199655) self._update_policy_map(policy_dict=self.policy_dict)
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map
(AlphaZero pid=199655) self._build_policy_map(
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map
(AlphaZero pid=199655) new_policy = create_policy_for_framework(
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
(AlphaZero pid=199655) return policy_class(observation_space, action_space, merged_config)
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 323, in __init__
(AlphaZero pid=199655) super().__init__(
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero_policy.py", line 38, in __init__
(AlphaZero pid=199655) self.env = self.env_creator()
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 313, in _env_creator
(AlphaZero pid=199655) return env_cls(config["env_config"])
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 43, in __init__
(AlphaZero pid=199655) self._initialize_buffer(r2_config["num_init_rewards"])
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer
(AlphaZero pid=199655) mask = obs["action_mask"]
(AlphaZero pid=199655) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199655) 2023-05-16 01:35:49,449 ERROR actor_manager.py:507 -- Ray error, taking actor 2 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199799, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7fcd0d1a0df0>)
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 738, in __init__
(AlphaZero pid=199655) self._update_policy_map(policy_dict=self.policy_dict)
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map
(AlphaZero pid=199655) self._build_policy_map(
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map
(AlphaZero pid=199655) new_policy = create_policy_for_framework(
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
(AlphaZero pid=199655) return policy_class(observation_space, action_space, merged_config)
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 323, in __init__
(AlphaZero pid=199655) super().__init__(
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero_policy.py", line 38, in __init__
(AlphaZero pid=199655) self.env = self.env_creator()
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 313, in _env_creator
(AlphaZero pid=199655) return env_cls(config["env_config"])
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 43, in __init__
(AlphaZero pid=199655) self._initialize_buffer(r2_config["num_init_rewards"])
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer
(AlphaZero pid=199655) mask = obs["action_mask"]
(AlphaZero pid=199655) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199655) 2023-05-16 01:35:49,451 ERROR worker.py:844 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::AlphaZero.__init__() (pid=199655, ip=192.168.18.9, repr=AlphaZero)
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 242, in _setup
(AlphaZero pid=199655) self.add_workers(
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 635, in add_workers
(AlphaZero pid=199655) raise result.get()
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/actor_manager.py", line 488, in __fetch_result
(AlphaZero pid=199655) result = ray.get(r)
(AlphaZero pid=199655) ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199798, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f0da1f57d30>)
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 738, in __init__
(AlphaZero pid=199655) self._update_policy_map(policy_dict=self.policy_dict)
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map
(AlphaZero pid=199655) self._build_policy_map(
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map
(AlphaZero pid=199655) new_policy = create_policy_for_framework(
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
(AlphaZero pid=199655) return policy_class(observation_space, action_space, merged_config)
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 323, in __init__
(AlphaZero pid=199655) super().__init__(
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero_policy.py", line 38, in __init__
(AlphaZero pid=199655) self.env = self.env_creator()
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 313, in _env_creator
(AlphaZero pid=199655) return env_cls(config["env_config"])
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 43, in __init__
(AlphaZero pid=199655) self._initialize_buffer(r2_config["num_init_rewards"])
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer
(AlphaZero pid=199655) mask = obs["action_mask"]
(AlphaZero pid=199655) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199655)
(AlphaZero pid=199655) During handling of the above exception, another exception occurred:
(AlphaZero pid=199655)
(AlphaZero pid=199655) ray::AlphaZero.__init__() (pid=199655, ip=192.168.18.9, repr=AlphaZero)
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 466, in __init__
(AlphaZero pid=199655) super().__init__(
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/tune/trainable/trainable.py", line 169, in __init__
(AlphaZero pid=199655) self.setup(copy.deepcopy(self.config))
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 592, in setup
(AlphaZero pid=199655) self.workers = WorkerSet(
(AlphaZero pid=199655) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 194, in __init__
(AlphaZero pid=199655) raise e.args[0].args[2]
(AlphaZero pid=199655) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(RolloutWorker pid=199798) 2023-05-16 01:35:49,424 ERROR worker.py:844 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199798, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f0da1f57d30>)
(AlphaZero pid=199916) 2023-05-16 01:35:53,752 WARNING algorithm_config.py:635 -- Cannot create AlphaZeroConfig from given `config_dict`! Property __stdout_file__ not supported.
(AlphaZero pid=199916) 2023-05-16 01:35:54,004 INFO algorithm.py:527 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
2023-05-16 01:35:58,712 ERROR trial_runner.py:1450 -- Trial AlphaZero_CartPole-v1_34201_00001: Error happened when processing _ExecutorEventType.TRAINING_RESULT.
ray.tune.error._TuneNoNextExecutorEventError: Traceback (most recent call last):
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/tune/execution/ray_trial_executor.py", line 1231, in get_next_executor_event
future_result = ray.get(ready_future)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/_private/worker.py", line 2523, in get
raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::AlphaZero.__init__() (pid=199916, ip=192.168.18.9, repr=AlphaZero)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 242, in _setup
self.add_workers(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 635, in add_workers
raise result.get()
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/actor_manager.py", line 488, in __fetch_result
result = ray.get(r)
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199970, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f988c1c4c40>)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 738, in __init__
self._update_policy_map(policy_dict=self.policy_dict)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map
self._build_policy_map(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map
new_policy = create_policy_for_framework(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
return policy_class(observation_space, action_space, merged_config)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 323, in __init__
super().__init__(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero_policy.py", line 38, in __init__
self.env = self.env_creator()
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 313, in _env_creator
return env_cls(config["env_config"])
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 43, in __init__
self._initialize_buffer(r2_config["num_init_rewards"])
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer
mask = obs["action_mask"]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
During handling of the above exception, another exception occurred:
ray::AlphaZero.__init__() (pid=199916, ip=192.168.18.9, repr=AlphaZero)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 466, in __init__
super().__init__(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/tune/trainable/trainable.py", line 169, in __init__
self.setup(copy.deepcopy(self.config))
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 592, in setup
self.workers = WorkerSet(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 194, in __init__
raise e.args[0].args[2]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
<IPython.core.display.HTML object>
(AlphaZero pid=199916) 2023-05-16 01:35:58,696 ERROR actor_manager.py:507 -- Ray error, taking actor 2 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199971, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7fccc9d5ea60>) [repeated 2x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(AlphaZero pid=199916) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 194, in __init__ [repeated 23x across cluster]
(AlphaZero pid=199916) self._update_policy_map(policy_dict=self.policy_dict) [repeated 5x across cluster]
(AlphaZero pid=199916) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map [repeated 5x across cluster]
(AlphaZero pid=199916) self._build_policy_map( [repeated 5x across cluster]
(AlphaZero pid=199916) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map [repeated 5x across cluster]
(AlphaZero pid=199916) new_policy = create_policy_for_framework( [repeated 5x across cluster]
(AlphaZero pid=199916) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework [repeated 5x across cluster]
(AlphaZero pid=199916) return policy_class(observation_space, action_space, merged_config) [repeated 5x across cluster]
(AlphaZero pid=199916) super().__init__( [repeated 6x across cluster]
(AlphaZero pid=199916) self.env = self.env_creator() [repeated 5x across cluster]
(AlphaZero pid=199916) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 313, in _env_creator [repeated 5x across cluster]
(AlphaZero pid=199916) return env_cls(config["env_config"]) [repeated 5x across cluster]
(AlphaZero pid=199916) [repeated 7x across cluster]
(AlphaZero pid=199916) File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer [repeated 5x across cluster]
(AlphaZero pid=199916) mask = obs["action_mask"] [repeated 5x across cluster]
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices [repeated 6x across cluster]
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(RolloutWorker pid=199799) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(RolloutWorker pid=199970) 2023-05-16 01:35:58,656 ERROR worker.py:844 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199970, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f988c1c4c40>)
2023-05-16 01:35:58,725 ERROR trial_runner.py:1450 -- Trial AlphaZero_CartPole-v1_34201_00000: Error happened when processing _ExecutorEventType.TRAINING_RESULT.
ray.tune.error._TuneNoNextExecutorEventError: Traceback (most recent call last):
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/tune/execution/ray_trial_executor.py", line 1231, in get_next_executor_event
future_result = ray.get(ready_future)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/_private/worker.py", line 2523, in get
raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::AlphaZero.__init__() (pid=199655, ip=192.168.18.9, repr=AlphaZero)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 242, in _setup
self.add_workers(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 635, in add_workers
raise result.get()
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/actor_manager.py", line 488, in __fetch_result
result = ray.get(r)
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199798, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f0da1f57d30>)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 738, in __init__
self._update_policy_map(policy_dict=self.policy_dict)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map
self._build_policy_map(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map
new_policy = create_policy_for_framework(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
return policy_class(observation_space, action_space, merged_config)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 323, in __init__
super().__init__(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero_policy.py", line 38, in __init__
self.env = self.env_creator()
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 313, in _env_creator
return env_cls(config["env_config"])
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 43, in __init__
self._initialize_buffer(r2_config["num_init_rewards"])
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer
mask = obs["action_mask"]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
During handling of the above exception, another exception occurred:
ray::AlphaZero.__init__() (pid=199655, ip=192.168.18.9, repr=AlphaZero)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 466, in __init__
super().__init__(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/tune/trainable/trainable.py", line 169, in __init__
self.setup(copy.deepcopy(self.config))
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 592, in setup
self.workers = WorkerSet(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 194, in __init__
raise e.args[0].args[2]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
2023-05-16 01:35:58,733 ERROR ray_trial_executor.py:883 -- An exception occurred when trying to stop the Ray actor:Traceback (most recent call last):
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/tune/execution/ray_trial_executor.py", line 874, in _resolve_stop_event
ray.get(future, timeout=timeout)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/_private/worker.py", line 2523, in get
raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::AlphaZero.__init__() (pid=199655, ip=192.168.18.9, repr=AlphaZero)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 242, in _setup
self.add_workers(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 635, in add_workers
raise result.get()
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/actor_manager.py", line 488, in __fetch_result
result = ray.get(r)
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199798, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f0da1f57d30>)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 738, in __init__
self._update_policy_map(policy_dict=self.policy_dict)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map
self._build_policy_map(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map
new_policy = create_policy_for_framework(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
return policy_class(observation_space, action_space, merged_config)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 323, in __init__
super().__init__(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero_policy.py", line 38, in __init__
self.env = self.env_creator()
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 313, in _env_creator
return env_cls(config["env_config"])
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 43, in __init__
self._initialize_buffer(r2_config["num_init_rewards"])
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer
mask = obs["action_mask"]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
During handling of the above exception, another exception occurred:
ray::AlphaZero.__init__() (pid=199655, ip=192.168.18.9, repr=AlphaZero)
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 466, in __init__
super().__init__(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/tune/trainable/trainable.py", line 169, in __init__
self.setup(copy.deepcopy(self.config))
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 592, in setup
self.workers = WorkerSet(
File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 194, in __init__
raise e.args[0].args[2]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
2023-05-16 01:35:58,735 ERROR tune.py:941 -- Trials did not complete: [AlphaZero_CartPole-v1_34201_00000, AlphaZero_CartPole-v1_34201_00001]
2023-05-16 01:35:58,735 INFO tune.py:945 -- Total run time: 17.79 seconds (17.76 seconds for the tuning loop).
I’m using:
- Ray 2.4.0 (installed the default, air, tune, rllib and serve extras via pip, following the instructions here, without any errors)
- Gymnasium 0.28.1 (I need this version for compatibility with VizDoom 1.2.0)
- Spyder IDE running in Anaconda, with Python 3.9
Things I have tried:
- Using a different Gymnasium version
- Using the gym library instead of Gymnasium
- Modifying the environment: importing Gymnasium and creating the same environment with env.make (see the sketch after this list)
- Using other algorithms; PPO, for example, seems to work fine.
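For the environment modification attempt, this is roughly what I did — a minimal sketch, where the env name "my_cartpole" and the creator function are just illustrative:

```python
import gymnasium as gym
from ray.tune.registry import register_env

# Illustrative only: re-create CartPole-v1 through Gymnasium and register it with Ray,
# to rule out the env-creation path as the problem. "my_cartpole" is an arbitrary name.
def cartpole_creator(env_config):
    return gym.make("CartPole-v1")

register_env("my_cartpole", cartpole_creator)
# I then pointed the AlphaZero config at "my_cartpole" instead of "CartPole-v1",
# but the error did not change.
```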
Why does the AlphaZero algorithm fail like this when PPO works? I need AlphaZero for my thesis. Thanks in advance.