[RLlib] Assertion error on connect four with action masking

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hello,
I am using the Connect Four environment from PettingZoo and I am trying to implement action masking.
I am getting the following error:

File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 194, in __init__
    raise e.args[0].args[2]
AssertionError

I am using ray 2.4.0, torch 1.13.1, pettingzoo 1.22.3 and supersuit 3.7.1.

My code is:


from ray import air, tune
from ray.tune.registry import register_env
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from ray.rllib.examples.models.action_mask_model import TorchActionMaskModel
from ray.rllib.models import ModelCatalog
from pettingzoo.classic import connect_four_v3
import gymnasium
import supersuit
import numpy as np

if __name__ == "__main__":

    def convert_box(convert_obs_fn, old_box):
        new_low = convert_obs_fn(old_box.low)
        new_high = convert_obs_fn(old_box.high)
        return gymnasium.spaces.Box(low=new_low, high=new_high, dtype=new_low.dtype)

    # Changes observation space from (6,7,2) to (84,)
    def flattened_pettingzoo_env():
        my_env = connect_four_v3.env()
        shape = my_env.observation_space(my_env.possible_agents[0])["observation"].shape
        newshape = np.product(shape).reshape(1)

        def change_obs_space_fn(obs_space):
            obs_space["observation"] = convert_box(
                lambda obs: obs.reshape(newshape), old_box=obs_space["observation"]
            )
            return obs_space

        def change_observation_fn(observation, old_obs_space):
            observation["observation"] = observation["observation"].reshape(newshape)
            return observation

        my_env = supersuit.lambda_wrappers.observation_lambda_v0(
            my_env,
            change_obs_space_fn=change_obs_space_fn,
            change_observation_fn=change_observation_fn,
        )
        return my_env


    def env_creator(args):
        return PettingZooEnv(flattened_pettingzoo_env())


    env = env_creator({})
    register_env("connect_four", env_creator)

    ModelCatalog.register_custom_model(
        "am_model", TorchActionMaskModel)

    obs_space = env.observation_space
    act_spc = env.action_space

    policies = {"shared_policy_1": (None, obs_space, act_spc, {}),
                "shared_policy_2": (None, obs_space, act_spc, {})
                }

    policy_ids = list(policies.keys())


    def policy_mapping_fn(agent_id, episode, worker, **kwargs):
        if agent_id == "player_0":
            return "shared_policy_1"
        else:
            return "shared_policy_2"


    config = (
        PPOConfig()
            .environment("connect_four")
            .resources(num_gpus=0, num_cpus_for_local_worker=2)
            .rollouts(num_rollout_workers=4)  # default = 2 (I should try it)
            .framework("torch")
            .training(model={
                "custom_model": "am_model"}, )
            .multi_agent(
            policies=policies,
            policy_mapping_fn=policy_mapping_fn,
        )
    )

    tune.Tuner(
        "PPO",
        run_config=air.RunConfig(
            name="c4 baseline masked ppo trial 1",
            stop={"training_iteration": 1500},
            checkpoint_config=air.CheckpointConfig(
                checkpoint_frequency=100,
            ),
        ),
        param_space=config.to_dict(),
    ).fit()

I am following the CartPole example. Please let me know if I am doing something wrong, because I cannot understand why this error is happening.
Thanks in advance.
The full error is:

C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\python.exe C:/Users/s315635/PycharmProjects/ray_2_4_0_tCuda_older/connect4_masked.py
2023-06-16 17:42:27,461	INFO worker.py:1616 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
2023-06-16 17:42:30,901	INFO tune.py:218 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
(pid=17856) 
(PPO pid=14776) 2023-06-16 17:42:37,424	WARNING algorithm_config.py:635 -- Cannot create PPOConfig from given `config_dict`! Property __stdout_file__ not supported.
(PPO pid=14776) 2023-06-16 17:42:37,530	INFO algorithm.py:527 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=22120) 
(pid=30408) 
(RolloutWorker pid=22904) 2023-06-16 17:42:45,814	WARNING env.py:285 -- Your MultiAgentEnv <PettingZooEnv instance> does not have some or all of the needed base-class attributes! Make sure you call `super().__init__()` from within your MutiAgentEnv's constructor. This will raise an error in the future.
(RolloutWorker pid=31224) 2023-06-16 17:42:46,559	ERROR worker.py:844 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=31224, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000001546E1C6940>)
(RolloutWorker pid=31224)   File "python\ray\_raylet.pyx", line 877, in ray._raylet.execute_task
(RolloutWorker pid=31224)   File "python\ray\_raylet.pyx", line 881, in ray._raylet.execute_task
(RolloutWorker pid=31224)   File "python\ray\_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\function_manager.py", line 670, in actor_method_executor
(RolloutWorker pid=31224)     return method(__ray_actor, *args, **kwargs)
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
(RolloutWorker pid=31224)     return method(self, *_args, **_kwargs)
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 738, in __init__
(RolloutWorker pid=31224)     self._update_policy_map(policy_dict=self.policy_dict)
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
(RolloutWorker pid=31224)     return method(self, *_args, **_kwargs)
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1985, in _update_policy_map
(RolloutWorker pid=31224)     self._build_policy_map(
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
(RolloutWorker pid=31224)     return method(self, *_args, **_kwargs)
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 2097, in _build_policy_map
(RolloutWorker pid=31224)     new_policy = create_policy_for_framework(
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\utils\policy.py", line 142, in create_policy_for_framework
(RolloutWorker pid=31224)     return policy_class(observation_space, action_space, merged_config)
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\ppo\ppo_torch_policy.py", line 51, in __init__
(RolloutWorker pid=31224)     TorchPolicyV2.__init__(
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 93, in __init__
(RolloutWorker pid=31224)     model, dist_class = self._init_model_and_dist_class()
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 484, in _init_model_and_dist_class
(RolloutWorker pid=31224)     model = ModelCatalog.get_model_v2(
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\models\catalog.py", line 606, in get_model_v2
(RolloutWorker pid=31224)     instance = model_cls(
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\examples\models\action_mask_model.py", line 82, in __init__
(RolloutWorker pid=31224)     assert (
(RolloutWorker pid=31224) AssertionError
(PPO pid=14776) 2023-06-16 17:42:46,579	ERROR actor_manager.py:507 -- Ray error, taking actor 1 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=22904, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000023B2BBE6940>)
(PPO pid=14776) 2023-06-16 17:42:46,580	ERROR actor_manager.py:507 -- Ray error, taking actor 2 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=29288, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000002A752C36940>)
(PPO pid=14776) 2023-06-16 17:42:46,581	ERROR actor_manager.py:507 -- Ray error, taking actor 3 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=31224, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000001546E1C6940>)
(PPO pid=14776) 2023-06-16 17:42:46,582	ERROR actor_manager.py:507 -- Ray error, taking actor 4 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=30144, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000021701A968B0>)
== Status ==
Current time: 2023-06-16 17:42:46 (running for 00:00:15.67)
Using FIFO scheduling algorithm.
Logical resource usage: 6.0/12 CPUs, 0/1 GPUs
Result logdir: C:\Users\s315635\ray_results\c4 baseline masked ppo trial 1
Number of trials: 1/1 (1 RUNNING)
+------------------------------+----------+-------+
| Trial name                   | status   | loc   |
|------------------------------+----------+-------|
| PPO_connect_four_c954d_00000 | RUNNING  |       |
+------------------------------+----------+-------+


(PPO pid=14776) 2023-06-16 17:42:46,591	ERROR worker.py:844 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=14776, ip=127.0.0.1, repr=PPO)
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 242, in _setup
(PPO pid=14776)     self.add_workers(
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 635, in add_workers
(PPO pid=14776)     raise result.get()
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\utils\actor_manager.py", line 488, in __fetch_result
(PPO pid=14776)     result = ray.get(r)
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
(PPO pid=14776)     return func(*args, **kwargs)
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\worker.py", line 2523, in get
(PPO pid=14776)     raise value
(PPO pid=14776) ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=22904, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000023B2BBE6940>)
(PPO pid=14776) During handling of the above exception, another exception occurred:
(PPO pid=14776) ray::PPO.__init__() (pid=14776, ip=127.0.0.1, repr=PPO)
(PPO pid=14776)     super().__init__(
(PPO pid=14776)     self.setup(copy.deepcopy(self.config))
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 592, in setup
(PPO pid=14776)     self.workers = WorkerSet(
(PPO pid=14776)     raise e.args[0].args[2]
2023-06-16 17:42:46,612	ERROR trial_runner.py:1450 -- Trial PPO_connect_four_c954d_00000: Error happened when processing _ExecutorEventType.TRAINING_RESULT.
ray.tune.error._TuneNoNextExecutorEventError: Traceback (most recent call last):
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\tune\execution\ray_trial_executor.py", line 1231, in get_next_executor_event
    future_result = ray.get(ready_future)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\worker.py", line 2523, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=14776, ip=127.0.0.1, repr=PPO)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 242, in _setup
    self.add_workers(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 635, in add_workers
    raise result.get()
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\utils\actor_manager.py", line 488, in __fetch_result
    result = ray.get(r)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\worker.py", line 2523, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=22904, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000023B2BBE6940>)
  File "python\ray\_raylet.pyx", line 877, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 881, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\function_manager.py", line 670, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 738, in __init__
    self._update_policy_map(policy_dict=self.policy_dict)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1985, in _update_policy_map
    self._build_policy_map(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 2097, in _build_policy_map
    new_policy = create_policy_for_framework(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\utils\policy.py", line 142, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\ppo\ppo_torch_policy.py", line 51, in __init__
    TorchPolicyV2.__init__(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 93, in __init__
    model, dist_class = self._init_model_and_dist_class()
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 484, in _init_model_and_dist_class
    model = ModelCatalog.get_model_v2(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\models\catalog.py", line 606, in get_model_v2
    instance = model_cls(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\examples\models\action_mask_model.py", line 82, in __init__
    assert (
AssertionError

During handling of the above exception, another exception occurred:

ray::PPO.__init__() (pid=14776, ip=127.0.0.1, repr=PPO)
  File "python\ray\_raylet.pyx", line 870, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 921, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 877, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 881, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\function_manager.py", line 670, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 466, in __init__
    super().__init__(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\tune\trainable\trainable.py", line 169, in __init__
    self.setup(copy.deepcopy(self.config))
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 592, in setup
    self.workers = WorkerSet(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 194, in __init__
    raise e.args[0].args[2]
AssertionError

Result for PPO_connect_four_c954d_00000:
  trial_id: c954d_00000
  
== Status ==
Current time: 2023-06-16 17:42:46 (running for 00:00:15.69)
Using FIFO scheduling algorithm.
Logical resource usage: 0/12 CPUs, 0/1 GPUs
Result logdir: C:\Users\s315635\ray_results\c4 baseline masked ppo trial 1
Number of trials: 1/1 (1 ERROR)
+------------------------------+----------+-------+
| Trial name                   | status   | loc   |
|------------------------------+----------+-------|
| PPO_connect_four_c954d_00000 | ERROR    |       |
+------------------------------+----------+-------+
Number of errored trials: 1
+------------------------------+--------------+--------------------------------------------------------------------------------------------------------------------------+
| Trial name                   |   # failures | error file                                                                                                               |
|------------------------------+--------------+--------------------------------------------------------------------------------------------------------------------------|
| PPO_connect_four_c954d_00000 |            1 | C:\Users\s315635\ray_results\c4 baseline masked ppo trial 1\PPO_connect_four_c954d_00000_0_2023-06-16_17-42-31\error.txt |
+------------------------------+--------------+--------------------------------------------------------------------------------------------------------------------------+

2023-06-16 17:42:46,634	ERROR ray_trial_executor.py:883 -- An exception occurred when trying to stop the Ray actor:Traceback (most recent call last):
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\tune\execution\ray_trial_executor.py", line 874, in _resolve_stop_event
    ray.get(future, timeout=timeout)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\worker.py", line 2523, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=14776, ip=127.0.0.1, repr=PPO)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 242, in _setup
    self.add_workers(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 635, in add_workers
    raise result.get()
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\utils\actor_manager.py", line 488, in __fetch_result
    result = ray.get(r)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\worker.py", line 2523, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=22904, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000023B2BBE6940>)
  File "python\ray\_raylet.pyx", line 877, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 881, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\function_manager.py", line 670, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 738, in __init__
    self._update_policy_map(policy_dict=self.policy_dict)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1985, in _update_policy_map
    self._build_policy_map(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 2097, in _build_policy_map
    new_policy = create_policy_for_framework(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\utils\policy.py", line 142, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\ppo\ppo_torch_policy.py", line 51, in __init__
    TorchPolicyV2.__init__(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 93, in __init__
    model, dist_class = self._init_model_and_dist_class()
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 484, in _init_model_and_dist_class
    model = ModelCatalog.get_model_v2(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\models\catalog.py", line 606, in get_model_v2
    instance = model_cls(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\examples\models\action_mask_model.py", line 82, in __init__
    assert (
AssertionError

During handling of the above exception, another exception occurred:

ray::PPO.__init__() (pid=14776, ip=127.0.0.1, repr=PPO)
  File "python\ray\_raylet.pyx", line 870, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 921, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 877, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 881, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\function_manager.py", line 670, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 466, in __init__
    super().__init__(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\tune\trainable\trainable.py", line 169, in __init__
    self.setup(copy.deepcopy(self.config))
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 592, in setup
    self.workers = WorkerSet(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 194, in __init__
    raise e.args[0].args[2]
AssertionError

2023-06-16 17:42:46,649	ERROR tune.py:941 -- Trials did not complete: [PPO_connect_four_c954d_00000]
2023-06-16 17:42:46,649	INFO tune.py:945 -- Total run time: 15.75 seconds (15.69 seconds for the tuning loop).
== Status ==
Current time: 2023-06-16 17:42:46 (running for 00:00:15.69)
Using FIFO scheduling algorithm.
Logical resource usage: 0/12 CPUs, 0/1 GPUs
Result logdir: C:\Users\s315635\ray_results\c4 baseline masked ppo trial 1
Number of trials: 1/1 (1 ERROR)
+------------------------------+----------+-------+
| Trial name                   | status   | loc   |
|------------------------------+----------+-------|
| PPO_connect_four_c954d_00000 | ERROR    |       |
+------------------------------+----------+-------+
Number of errored trials: 1
+------------------------------+--------------+--------------------------------------------------------------------------------------------------------------------------+
| Trial name                   |   # failures | error file                                                                                                               |
|------------------------------+--------------+--------------------------------------------------------------------------------------------------------------------------|
| PPO_connect_four_c954d_00000 |            1 | C:\Users\s315635\ray_results\c4 baseline masked ppo trial 1\PPO_connect_four_c954d_00000_0_2023-06-16_17-42-31\error.txt |
+------------------------------+--------------+--------------------------------------------------------------------------------------------------------------------------+

(RolloutWorker pid=30144)  [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(RolloutWorker pid=30144) 2023-06-16 17:42:46,562	ERROR worker.py:844 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=30144, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000021701A968B0>) [repeated 3x across cluster]
(PPO pid=14776)   File "python\ray\_raylet.pyx", line 881, in ray._raylet.execute_task [repeated 20x across cluster]
(PPO pid=14776)   File "python\ray\_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor [repeated 9x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\function_manager.py", line 670, in actor_method_executor [repeated 9x across cluster]
(PPO pid=14776)     return method(__ray_actor, *args, **kwargs) [repeated 9x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span [repeated 26x across cluster]
(PPO pid=14776)     return method(self, *_args, **_kwargs) [repeated 26x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 194, in __init__ [repeated 35x across cluster]
(PPO pid=14776)     self._update_policy_map(policy_dict=self.policy_dict) [repeated 8x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1985, in _update_policy_map [repeated 8x across cluster]
(PPO pid=14776)     self._build_policy_map( [repeated 8x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 2097, in _build_policy_map [repeated 8x across cluster]
(PPO pid=14776)     new_policy = create_policy_for_framework( [repeated 8x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\utils\policy.py", line 142, in create_policy_for_framework [repeated 8x across cluster]
(PPO pid=14776)     return policy_class(observation_space, action_space, merged_config) [repeated 8x across cluster]
(PPO pid=14776)  [repeated 10x across cluster]
(PPO pid=14776)     model, dist_class = self._init_model_and_dist_class() [repeated 8x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 484, in _init_model_and_dist_class [repeated 8x across cluster]
(PPO pid=14776)     model = ModelCatalog.get_model_v2( [repeated 8x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\models\catalog.py", line 606, in get_model_v2 [repeated 8x across cluster]
(PPO pid=14776)     instance = model_cls( [repeated 8x across cluster]
(PPO pid=14776)     assert ( [repeated 8x across cluster]
(PPO pid=14776) AssertionError [repeated 9x across cluster]

Process finished with exit code 0

It seems like the error is coming from these lines of code:
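Roughly, the failing check in ray/rllib/examples/models/action_mask_model.py (around line 82 in your traceback) looks like the snippet below. I am paraphrasing it from memory, so check your installed copy for the exact lines:

from gymnasium.spaces import Dict

# Paraphrased from TorchActionMaskModel.__init__: the model refuses to build
# unless the (original) observation space is a Dict with these two keys.
orig_space = getattr(obs_space, "original_space", obs_space)
assert (
    isinstance(orig_space, Dict)
    and "action_mask" in orig_space.spaces
    and "observations" in orig_space.spaces
)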

So the problem seems to be the observation space definition on the environment side.

Adding to @kourosh’s comment: recall that, given the way action masking is implemented, the observation (state) space of the environment needs to be a nested dictionary:

self.observation_space = gym.spaces.Dict(
    {
        "action_mask": ...,
        "observations": ...,
    }
)
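
For the flattened Connect Four environment above, that pattern would look roughly like the sketch below. The 7-entry mask matches the Discrete(7) action space and 84 = 6 * 7 * 2 is the flattened board, but the exact bounds and dtypes here are my assumptions, not taken from the PettingZoo source:

import gymnasium as gym
import numpy as np

# Hypothetical concrete version of the pattern above for Connect Four
# (7 columns -> 7 mask entries; 6 x 7 x 2 board planes flattened to 84).
observation_space = gym.spaces.Dict(
    {
        "action_mask": gym.spaces.Box(low=0, high=1, shape=(7,), dtype=np.int8),
        "observations": gym.spaces.Box(low=0, high=1, shape=(84,), dtype=np.int8),
    }
)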

Hi @PhilippWillms, @kourosh,

Yes, I know that, and the observation space is indeed a dict, as you can see here:
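
Here is a quick way to check it, reusing env_creator from my script. The expected output is paraphrased in the comments (written from memory, so the exact repr formatting may differ):

# Inspect the wrapped env's spaces directly:
env = env_creator({})
print(env.observation_space)
# roughly: Dict('action_mask': Box(0, 1, (7,), int8), 'observation': Box(0, 1, (84,), int8))
print(env.action_space)
# roughly: Discrete(7)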

So that’s why this error seems weird to me.
If you don’t have any ideas, I suppose I will open an issue on GitHub.
Thanks,
George

@george_sk,

Your key is singular, "observation". The model is looking for "observations".
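
If you want to fix it on the environment side instead of editing the model, one option is to rename the key inside your lambda wrapper. This is an untested sketch reusing convert_box and newshape from your script, and it assumes supersuit’s observation_lambda_v0 is fine with a re-keyed Dict being returned:

def change_obs_space_fn(obs_space):
    # Rename "observation" -> "observations" (and flatten it), so the space
    # carries the keys TorchActionMaskModel asserts on.
    return gymnasium.spaces.Dict(
        {
            "action_mask": obs_space["action_mask"],
            "observations": convert_box(
                lambda obs: obs.reshape(newshape), old_box=obs_space["observation"]
            ),
        }
    )

def change_observation_fn(observation, old_obs_space):
    # Apply the same renaming/flattening to every observation.
    return {
        "action_mask": observation["action_mask"],
        "observations": observation["observation"].reshape(newshape),
    }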

Thanks @mannyv, I didn’t notice the typo!

I removed the s and now it works. I will let the PettingZoo maintainers know in case they want to change it so that it is compatible.