Error in AlphaZero algorithm: The actor died because of an error raised in its creation task

I’m trying to run the AlphaZero examples shown here: https://docs.ray.io/en/latest/rllib/rllib-algorithms.html#alphazero but when I execute them I get the following error:

runfile('/home/joaquin/TFM/Doom_RL/RLlib_test.py', wdir='/home/joaquin/TFM/Doom_RL')
True
2023-05-16 01:35:36,559	INFO worker.py:1616 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
2023-05-16 01:35:40,942	INFO tune.py:218 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
(raylet) E0516 01:35:40.985175255  199150 fork_posix.cc:76]           Other threads are currently calling into gRPC, skipping fork() handlers
(AlphaZero pid=199655) 2023-05-16 01:35:44,420	WARNING algorithm_config.py:635 -- Cannot create AlphaZeroConfig from given `config_dict`! Property __stdout_file__ not supported.
(AlphaZero pid=199655) 2023-05-16 01:35:44,667	INFO algorithm.py:527 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(AlphaZero pid=199655) 2023-05-16 01:35:49,448	ERROR actor_manager.py:507 -- Ray error, taking actor 1 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199798, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f0da1f57d30>)
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 738, in __init__
(AlphaZero pid=199655)     self._update_policy_map(policy_dict=self.policy_dict)
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map
(AlphaZero pid=199655)     self._build_policy_map(
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map
(AlphaZero pid=199655)     new_policy = create_policy_for_framework(
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
(AlphaZero pid=199655)     return policy_class(observation_space, action_space, merged_config)
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 323, in __init__
(AlphaZero pid=199655)     super().__init__(
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero_policy.py", line 38, in __init__
(AlphaZero pid=199655)     self.env = self.env_creator()
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 313, in _env_creator
(AlphaZero pid=199655)     return env_cls(config["env_config"])
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 43, in __init__
(AlphaZero pid=199655)     self._initialize_buffer(r2_config["num_init_rewards"])
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer
(AlphaZero pid=199655)     mask = obs["action_mask"]
(AlphaZero pid=199655) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199655) 2023-05-16 01:35:49,449	ERROR actor_manager.py:507 -- Ray error, taking actor 2 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199799, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7fcd0d1a0df0>)
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 738, in __init__
(AlphaZero pid=199655)     self._update_policy_map(policy_dict=self.policy_dict)
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map
(AlphaZero pid=199655)     self._build_policy_map(
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map
(AlphaZero pid=199655)     new_policy = create_policy_for_framework(
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
(AlphaZero pid=199655)     return policy_class(observation_space, action_space, merged_config)
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 323, in __init__
(AlphaZero pid=199655)     super().__init__(
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero_policy.py", line 38, in __init__
(AlphaZero pid=199655)     self.env = self.env_creator()
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 313, in _env_creator
(AlphaZero pid=199655)     return env_cls(config["env_config"])
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 43, in __init__
(AlphaZero pid=199655)     self._initialize_buffer(r2_config["num_init_rewards"])
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer
(AlphaZero pid=199655)     mask = obs["action_mask"]
(AlphaZero pid=199655) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199655) 2023-05-16 01:35:49,451	ERROR worker.py:844 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::AlphaZero.__init__() (pid=199655, ip=192.168.18.9, repr=AlphaZero)
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 242, in _setup
(AlphaZero pid=199655)     self.add_workers(
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 635, in add_workers
(AlphaZero pid=199655)     raise result.get()
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/actor_manager.py", line 488, in __fetch_result
(AlphaZero pid=199655)     result = ray.get(r)
(AlphaZero pid=199655) ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199798, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f0da1f57d30>)
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 738, in __init__
(AlphaZero pid=199655)     self._update_policy_map(policy_dict=self.policy_dict)
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map
(AlphaZero pid=199655)     self._build_policy_map(
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map
(AlphaZero pid=199655)     new_policy = create_policy_for_framework(
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
(AlphaZero pid=199655)     return policy_class(observation_space, action_space, merged_config)
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 323, in __init__
(AlphaZero pid=199655)     super().__init__(
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero_policy.py", line 38, in __init__
(AlphaZero pid=199655)     self.env = self.env_creator()
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 313, in _env_creator
(AlphaZero pid=199655)     return env_cls(config["env_config"])
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 43, in __init__
(AlphaZero pid=199655)     self._initialize_buffer(r2_config["num_init_rewards"])
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer
(AlphaZero pid=199655)     mask = obs["action_mask"]
(AlphaZero pid=199655) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(AlphaZero pid=199655) 
(AlphaZero pid=199655) During handling of the above exception, another exception occurred:
(AlphaZero pid=199655) 
(AlphaZero pid=199655) ray::AlphaZero.__init__() (pid=199655, ip=192.168.18.9, repr=AlphaZero)
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 466, in __init__
(AlphaZero pid=199655)     super().__init__(
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/tune/trainable/trainable.py", line 169, in __init__
(AlphaZero pid=199655)     self.setup(copy.deepcopy(self.config))
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 592, in setup
(AlphaZero pid=199655)     self.workers = WorkerSet(
(AlphaZero pid=199655)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 194, in __init__
(AlphaZero pid=199655)     raise e.args[0].args[2]
(AlphaZero pid=199655) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(RolloutWorker pid=199798) 2023-05-16 01:35:49,424	ERROR worker.py:844 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199798, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f0da1f57d30>)
(AlphaZero pid=199916) 2023-05-16 01:35:53,752	WARNING algorithm_config.py:635 -- Cannot create AlphaZeroConfig from given `config_dict`! Property __stdout_file__ not supported.
(AlphaZero pid=199916) 2023-05-16 01:35:54,004	INFO algorithm.py:527 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
2023-05-16 01:35:58,712	ERROR trial_runner.py:1450 -- Trial AlphaZero_CartPole-v1_34201_00001: Error happened when processing _ExecutorEventType.TRAINING_RESULT.
ray.tune.error._TuneNoNextExecutorEventError: Traceback (most recent call last):
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/tune/execution/ray_trial_executor.py", line 1231, in get_next_executor_event
    future_result = ray.get(ready_future)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/_private/worker.py", line 2523, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::AlphaZero.__init__() (pid=199916, ip=192.168.18.9, repr=AlphaZero)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 242, in _setup
    self.add_workers(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 635, in add_workers
    raise result.get()
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/actor_manager.py", line 488, in __fetch_result
    result = ray.get(r)
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199970, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f988c1c4c40>)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 738, in __init__
    self._update_policy_map(policy_dict=self.policy_dict)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map
    self._build_policy_map(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map
    new_policy = create_policy_for_framework(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 323, in __init__
    super().__init__(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero_policy.py", line 38, in __init__
    self.env = self.env_creator()
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 313, in _env_creator
    return env_cls(config["env_config"])
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 43, in __init__
    self._initialize_buffer(r2_config["num_init_rewards"])
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer
    mask = obs["action_mask"]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

During handling of the above exception, another exception occurred:

ray::AlphaZero.__init__() (pid=199916, ip=192.168.18.9, repr=AlphaZero)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 466, in __init__
    super().__init__(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/tune/trainable/trainable.py", line 169, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 592, in setup
    self.workers = WorkerSet(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 194, in __init__
    raise e.args[0].args[2]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

(AlphaZero pid=199916) 2023-05-16 01:35:58,696	ERROR actor_manager.py:507 -- Ray error, taking actor 2 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199971, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7fccc9d5ea60>) [repeated 2x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(AlphaZero pid=199916)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 194, in __init__ [repeated 23x across cluster]
(AlphaZero pid=199916)     self._update_policy_map(policy_dict=self.policy_dict) [repeated 5x across cluster]
(AlphaZero pid=199916)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map [repeated 5x across cluster]
(AlphaZero pid=199916)     self._build_policy_map( [repeated 5x across cluster]
(AlphaZero pid=199916)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map [repeated 5x across cluster]
(AlphaZero pid=199916)     new_policy = create_policy_for_framework( [repeated 5x across cluster]
(AlphaZero pid=199916)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework [repeated 5x across cluster]
(AlphaZero pid=199916)     return policy_class(observation_space, action_space, merged_config) [repeated 5x across cluster]
(AlphaZero pid=199916)     super().__init__( [repeated 6x across cluster]
(AlphaZero pid=199916)     self.env = self.env_creator() [repeated 5x across cluster]
(AlphaZero pid=199916)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 313, in _env_creator [repeated 5x across cluster]
(AlphaZero pid=199916)     return env_cls(config["env_config"]) [repeated 5x across cluster]
(AlphaZero pid=199916)  [repeated 7x across cluster]
(AlphaZero pid=199916)   File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer [repeated 5x across cluster]
(AlphaZero pid=199916)     mask = obs["action_mask"] [repeated 5x across cluster]
(AlphaZero pid=199916) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices [repeated 6x across cluster]
(RolloutWorker pid=199799) IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
(RolloutWorker pid=199970) 2023-05-16 01:35:58,656	ERROR worker.py:844 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199970, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f988c1c4c40>)
2023-05-16 01:35:58,725	ERROR trial_runner.py:1450 -- Trial AlphaZero_CartPole-v1_34201_00000: Error happened when processing _ExecutorEventType.TRAINING_RESULT.
ray.tune.error._TuneNoNextExecutorEventError: Traceback (most recent call last):
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/tune/execution/ray_trial_executor.py", line 1231, in get_next_executor_event
    future_result = ray.get(ready_future)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/_private/worker.py", line 2523, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::AlphaZero.__init__() (pid=199655, ip=192.168.18.9, repr=AlphaZero)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 242, in _setup
    self.add_workers(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 635, in add_workers
    raise result.get()
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/actor_manager.py", line 488, in __fetch_result
    result = ray.get(r)
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199798, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f0da1f57d30>)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 738, in __init__
    self._update_policy_map(policy_dict=self.policy_dict)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map
    self._build_policy_map(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map
    new_policy = create_policy_for_framework(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 323, in __init__
    super().__init__(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero_policy.py", line 38, in __init__
    self.env = self.env_creator()
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 313, in _env_creator
    return env_cls(config["env_config"])
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 43, in __init__
    self._initialize_buffer(r2_config["num_init_rewards"])
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer
    mask = obs["action_mask"]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

During handling of the above exception, another exception occurred:

ray::AlphaZero.__init__() (pid=199655, ip=192.168.18.9, repr=AlphaZero)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 466, in __init__
    super().__init__(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/tune/trainable/trainable.py", line 169, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 592, in setup
    self.workers = WorkerSet(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 194, in __init__
    raise e.args[0].args[2]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

2023-05-16 01:35:58,733	ERROR ray_trial_executor.py:883 -- An exception occurred when trying to stop the Ray actor:Traceback (most recent call last):
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/tune/execution/ray_trial_executor.py", line 874, in _resolve_stop_event
    ray.get(future, timeout=timeout)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/_private/worker.py", line 2523, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::AlphaZero.__init__() (pid=199655, ip=192.168.18.9, repr=AlphaZero)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 242, in _setup
    self.add_workers(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 635, in add_workers
    raise result.get()
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/actor_manager.py", line 488, in __fetch_result
    result = ray.get(r)
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=199798, ip=192.168.18.9, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f0da1f57d30>)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 738, in __init__
    self._update_policy_map(policy_dict=self.policy_dict)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map
    self._build_policy_map(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map
    new_policy = create_policy_for_framework(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 323, in __init__
    super().__init__(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero_policy.py", line 38, in __init__
    self.env = self.env_creator()
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/alpha_zero.py", line 313, in _env_creator
    return env_cls(config["env_config"])
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 43, in __init__
    self._initialize_buffer(r2_config["num_init_rewards"])
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/alpha_zero/ranked_rewards.py", line 51, in _initialize_buffer
    mask = obs["action_mask"]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

During handling of the above exception, another exception occurred:

ray::AlphaZero.__init__() (pid=199655, ip=192.168.18.9, repr=AlphaZero)
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 466, in __init__
    super().__init__(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/tune/trainable/trainable.py", line 169, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 592, in setup
    self.workers = WorkerSet(
  File "/home/joaquin/anaconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 194, in __init__
    raise e.args[0].args[2]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

2023-05-16 01:35:58,735	ERROR tune.py:941 -- Trials did not complete: [AlphaZero_CartPole-v1_34201_00000, AlphaZero_CartPole-v1_34201_00001]
2023-05-16 01:35:58,735	INFO tune.py:945 -- Total run time: 17.79 seconds (17.76 seconds for the tuning loop).
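Every traceback above ends at `mask = obs["action_mask"]` in `ranked_rewards.py`. The `IndexError` text is the message NumPy raises when a plain array is indexed with a string key. That suggests the environment returned a flat array observation instead of a dict with an `action_mask` entry. The trial name `AlphaZero_CartPole-v1` points the same way, but this is an inference from the log and the thread does not confirm it. The snippet below reproduces the behavior in isolation. `read_mask` is a hypothetical helper that mimics the failing line, and the 4-element array stands in for a CartPole-style observation.

```python
import numpy as np

# Assumption: a flat observation similar to the 4-element array CartPole-v1 returns.
flat_obs = np.zeros(4, dtype=np.float32)

# Dict layout with an explicit action mask (key names taken from the traceback).
dict_obs = {"obs": flat_obs, "action_mask": np.ones(2, dtype=np.float32)}


def read_mask(obs):
    """Hypothetical helper mimicking ranked_rewards.py: mask = obs["action_mask"]."""
    return obs["action_mask"]


try:
    read_mask(flat_obs)
    error_message = None
except IndexError as err:
    # NumPy rejects string keys on a non-structured array with this IndexError.
    error_message = str(err)

print(error_message)        # same IndexError text as in the log
print(read_mask(dict_obs))  # a dict observation works: [1. 1.]
```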

I’m using:

  • Ray 2.4.0 (installed the default, air, tune, rllib and serve extras via pip, following the instructions here, without any errors)

  • Gymnasium 0.28.1 (I need it for compatibility reasons with VizDoom 1.2.0)

  • Spyder IDE running in Anaconda, using Python 3.9
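Spyder under Anaconda can run a different interpreter from the one `pip` installed into. This is a common pitfall, not something the thread confirms. It may therefore help to print the package versions from inside the same session that runs the script. Below is a minimal sketch that uses only the standard library. `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: installed version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Run this from the same Spyder console that executes RLlib_test.py.
    for pkg, ver in installed_versions(["ray", "gymnasium", "gym", "numpy"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```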

Things I have tried:

  • Using a different gymnasium version
  • Using the gym library instead of gymnasium
  • Modifying the environment (importing gymnasium and creating the same environment with env.make)
  • Using other algorithms: PPO seems to work fine.
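RLlib's AlphaZero appears to expect three things from the environment: a dict observation with `obs` and `action_mask` entries, and `get_state`/`set_state` methods so MCTS can snapshot and restore the simulator. That reading comes from the traceback above and from RLlib's own sparse-reward CartPole example env. The sketch below illustrates that contract in plain Python, under those assumptions. `ToyCounterEnv` and `MaskedSparseRewardEnv` are hypothetical names. Neither class subclasses `gymnasium.Env` or defines `observation_space`/`action_space`, which a real RLlib environment would also need.

```python
from copy import deepcopy

import numpy as np


class ToyCounterEnv:
    """Hypothetical stand-in for CartPole-v1: gymnasium-style reset/step, flat array obs."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self, *, seed=None, options=None):
        self.t = 0
        return np.array([self.t], dtype=np.float32), {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        return np.array([self.t], dtype=np.float32), 1.0, terminated, False, {}


class MaskedSparseRewardEnv:
    """Hypothetical wrapper: dict obs with an action mask, sparse reward, get/set state."""

    def __init__(self, inner_env, num_actions):
        self.env = inner_env
        self.num_actions = num_actions
        self.running_reward = 0.0
        self._last_obs = None

    def _wrap(self, obs):
        # Every action is legal in this toy setting, so the mask is all ones.
        return {"obs": obs, "action_mask": np.ones(self.num_actions, dtype=np.float32)}

    def reset(self, *, seed=None, options=None):
        self.running_reward = 0.0
        obs, info = self.env.reset(seed=seed, options=options)
        self._last_obs = obs
        return self._wrap(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.running_reward += reward
        self._last_obs = obs
        # Sparse reward: the accumulated return is only paid out when the episode ends.
        score = self.running_reward if (terminated or truncated) else 0.0
        return self._wrap(obs), score, terminated, truncated, info

    def get_state(self):
        # Snapshot everything needed to resume from this exact point.
        return deepcopy(self.env), self.running_reward, deepcopy(self._last_obs)

    def set_state(self, state):
        # Restore a snapshot and return the matching observation.
        self.env = deepcopy(state[0])
        self.running_reward = state[1]
        self._last_obs = deepcopy(state[2])
        return self._wrap(self._last_obs)


if __name__ == "__main__":
    env = MaskedSparseRewardEnv(ToyCounterEnv(horizon=3), num_actions=2)
    obs, _ = env.reset()
    print(obs["action_mask"])  # the key the ranked-rewards wrapper looks up
```

Deep-copying the inner environment is the simplest way to make state snapshots work for an arbitrary simulator. It can be slow for large environments, so a real implementation may prefer to copy only the minimal state.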

What’s happening with the AlphaZero algorithm? I need it for my thesis. Thanks in advance.

Hi, we’re currently upgrading to Gymnasium 0.28.1 in this PR: [RLlib]: bump Gymnasium to 0.28.1 by Rohan138 · Pull Request #35698 · ray-project/ray · GitHub

Once that’s merged, could you try again using the Ray nightly wheel? It should be a few days. Thanks in advance.