[RLlib] Assertion error on connect four with action masking

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hello,
I am using the Connect Four environment from PettingZoo and I am trying to implement action masking.
I am getting the following error:

File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 194, in __init__
    raise e.args[0].args[2]
AssertionError

I am using ray 2.4.0, torch 1.13.1, pettingzoo 1.22.3 and supersuit 3.7.1.

My code is:


from ray import air, tune
from ray.tune.registry import register_env
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from ray.rllib.examples.models.action_mask_model import TorchActionMaskModel
from ray.rllib.models import ModelCatalog
from pettingzoo.classic import connect_four_v3
import gymnasium
import supersuit
import numpy as np

if __name__ == "__main__":

    def convert_box(convert_obs_fn, old_box):
        new_low = convert_obs_fn(old_box.low)
        new_high = convert_obs_fn(old_box.high)
        return gymnasium.spaces.Box(low=new_low, high=new_high, dtype=new_low.dtype)

    # Changes observation space from (6,7,2) to (84,)
    def flattened_pettingzoo_env():
        my_env = connect_four_v3.env()
        shape = my_env.observation_space(my_env.possible_agents[0])["observation"].shape
        newshape = np.product(shape).reshape(1)

        def change_obs_space_fn(obs_space):
            obs_space["observation"] = convert_box(
                lambda obs: obs.reshape(newshape), old_box=obs_space["observation"]
            )
            return obs_space

        def change_observation_fn(observation, old_obs_space):
            observation["observation"] = observation["observation"].reshape(newshape)
            return observation

        my_env = supersuit.lambda_wrappers.observation_lambda_v0(
            my_env,
            change_obs_space_fn=change_obs_space_fn,
            change_observation_fn=change_observation_fn,
        )
        return my_env


    def env_creator(args):
        return PettingZooEnv(flattened_pettingzoo_env())


    env = env_creator({})
    register_env("connect_four", env_creator)

    ModelCatalog.register_custom_model(
        "am_model", TorchActionMaskModel)

    obs_space = env.observation_space
    act_spc = env.action_space

    policies = {"shared_policy_1": (None, obs_space, act_spc, {}),
                "shared_policy_2": (None, obs_space, act_spc, {})
                }

    policy_ids = list(policies.keys())


    def policy_mapping_fn(agent_id, episode, worker, **kwargs):
        if agent_id == "player_0":
            return "shared_policy_1"
        else:
            return "shared_policy_2"


    config = (
        PPOConfig()
            .environment("connect_four")
            .resources(num_gpus=0, num_cpus_for_local_worker=2)
            .rollouts(num_rollout_workers=4)  # default = 2 (I should try it)
            .framework("torch")
            .training(model={
                "custom_model": "am_model"}, )
            .multi_agent(
            policies=policies,
            policy_mapping_fn=policy_mapping_fn,
        )
    )

    tune.Tuner(
        "PPO",
        run_config=air.RunConfig(
            name="c4 baseline masked ppo trial 1",
            stop={"training_iteration": 1500},
            checkpoint_config=air.CheckpointConfig(
                checkpoint_frequency=100,
            ),
        ),
        param_space=config.to_dict(),
    ).fit()

I am following the CartPole example. Please let me know if I am doing something wrong, because I cannot understand why this error is happening.
Thanks in advance.
The full error is:

C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\python.exe C:/Users/s315635/PycharmProjects/ray_2_4_0_tCuda_older/connect4_masked.py
2023-06-16 17:42:27,461	INFO worker.py:1616 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
2023-06-16 17:42:30,901	INFO tune.py:218 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
(pid=17856) 
(PPO pid=14776) 2023-06-16 17:42:37,424	WARNING algorithm_config.py:635 -- Cannot create PPOConfig from given `config_dict`! Property __stdout_file__ not supported.
(PPO pid=14776) 2023-06-16 17:42:37,530	INFO algorithm.py:527 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=22120) 
(pid=30408) 
(RolloutWorker pid=22904) 2023-06-16 17:42:45,814	WARNING env.py:285 -- Your MultiAgentEnv <PettingZooEnv instance> does not have some or all of the needed base-class attributes! Make sure you call `super().__init__()` from within your MutiAgentEnv's constructor. This will raise an error in the future.
(RolloutWorker pid=31224) 2023-06-16 17:42:46,559	ERROR worker.py:844 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=31224, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000001546E1C6940>)
(RolloutWorker pid=31224)   File "python\ray\_raylet.pyx", line 877, in ray._raylet.execute_task
(RolloutWorker pid=31224)   File "python\ray\_raylet.pyx", line 881, in ray._raylet.execute_task
(RolloutWorker pid=31224)   File "python\ray\_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\function_manager.py", line 670, in actor_method_executor
(RolloutWorker pid=31224)     return method(__ray_actor, *args, **kwargs)
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
(RolloutWorker pid=31224)     return method(self, *_args, **_kwargs)
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 738, in __init__
(RolloutWorker pid=31224)     self._update_policy_map(policy_dict=self.policy_dict)
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
(RolloutWorker pid=31224)     return method(self, *_args, **_kwargs)
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1985, in _update_policy_map
(RolloutWorker pid=31224)     self._build_policy_map(
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
(RolloutWorker pid=31224)     return method(self, *_args, **_kwargs)
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 2097, in _build_policy_map
(RolloutWorker pid=31224)     new_policy = create_policy_for_framework(
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\utils\policy.py", line 142, in create_policy_for_framework
(RolloutWorker pid=31224)     return policy_class(observation_space, action_space, merged_config)
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\ppo\ppo_torch_policy.py", line 51, in __init__
(RolloutWorker pid=31224)     TorchPolicyV2.__init__(
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 93, in __init__
(RolloutWorker pid=31224)     model, dist_class = self._init_model_and_dist_class()
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 484, in _init_model_and_dist_class
(RolloutWorker pid=31224)     model = ModelCatalog.get_model_v2(
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\models\catalog.py", line 606, in get_model_v2
(RolloutWorker pid=31224)     instance = model_cls(
(RolloutWorker pid=31224)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\examples\models\action_mask_model.py", line 82, in __init__
(RolloutWorker pid=31224)     assert (
(RolloutWorker pid=31224) AssertionError
(PPO pid=14776) 2023-06-16 17:42:46,579	ERROR actor_manager.py:507 -- Ray error, taking actor 1 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=22904, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000023B2BBE6940>)
(PPO pid=14776) 2023-06-16 17:42:46,580	ERROR actor_manager.py:507 -- Ray error, taking actor 2 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=29288, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000002A752C36940>)
(PPO pid=14776) 2023-06-16 17:42:46,581	ERROR actor_manager.py:507 -- Ray error, taking actor 3 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=31224, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000001546E1C6940>)
(PPO pid=14776) 2023-06-16 17:42:46,582	ERROR actor_manager.py:507 -- Ray error, taking actor 4 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=30144, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000021701A968B0>)
== Status ==
Current time: 2023-06-16 17:42:46 (running for 00:00:15.67)
Using FIFO scheduling algorithm.
Logical resource usage: 6.0/12 CPUs, 0/1 GPUs
Result logdir: C:\Users\s315635\ray_results\c4 baseline masked ppo trial 1
Number of trials: 1/1 (1 RUNNING)
+------------------------------+----------+-------+
| Trial name                   | status   | loc   |
|------------------------------+----------+-------|
| PPO_connect_four_c954d_00000 | RUNNING  |       |
+------------------------------+----------+-------+


(PPO pid=14776) 2023-06-16 17:42:46,591	ERROR worker.py:844 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=14776, ip=127.0.0.1, repr=PPO)
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 242, in _setup
(PPO pid=14776)     self.add_workers(
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 635, in add_workers
(PPO pid=14776)     raise result.get()
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\utils\actor_manager.py", line 488, in __fetch_result
(PPO pid=14776)     result = ray.get(r)
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
(PPO pid=14776)     return func(*args, **kwargs)
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\worker.py", line 2523, in get
(PPO pid=14776)     raise value
(PPO pid=14776) ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=22904, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000023B2BBE6940>)
(PPO pid=14776) During handling of the above exception, another exception occurred:
(PPO pid=14776) ray::PPO.__init__() (pid=14776, ip=127.0.0.1, repr=PPO)
(PPO pid=14776)     super().__init__(
(PPO pid=14776)     self.setup(copy.deepcopy(self.config))
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 592, in setup
(PPO pid=14776)     self.workers = WorkerSet(
(PPO pid=14776)     raise e.args[0].args[2]
2023-06-16 17:42:46,612	ERROR trial_runner.py:1450 -- Trial PPO_connect_four_c954d_00000: Error happened when processing _ExecutorEventType.TRAINING_RESULT.
ray.tune.error._TuneNoNextExecutorEventError: Traceback (most recent call last):
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\tune\execution\ray_trial_executor.py", line 1231, in get_next_executor_event
    future_result = ray.get(ready_future)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\worker.py", line 2523, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=14776, ip=127.0.0.1, repr=PPO)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 242, in _setup
    self.add_workers(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 635, in add_workers
    raise result.get()
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\utils\actor_manager.py", line 488, in __fetch_result
    result = ray.get(r)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\worker.py", line 2523, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=22904, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000023B2BBE6940>)
  File "python\ray\_raylet.pyx", line 877, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 881, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\function_manager.py", line 670, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 738, in __init__
    self._update_policy_map(policy_dict=self.policy_dict)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1985, in _update_policy_map
    self._build_policy_map(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 2097, in _build_policy_map
    new_policy = create_policy_for_framework(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\utils\policy.py", line 142, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\ppo\ppo_torch_policy.py", line 51, in __init__
    TorchPolicyV2.__init__(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 93, in __init__
    model, dist_class = self._init_model_and_dist_class()
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 484, in _init_model_and_dist_class
    model = ModelCatalog.get_model_v2(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\models\catalog.py", line 606, in get_model_v2
    instance = model_cls(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\examples\models\action_mask_model.py", line 82, in __init__
    assert (
AssertionError

During handling of the above exception, another exception occurred:

ray::PPO.__init__() (pid=14776, ip=127.0.0.1, repr=PPO)
  File "python\ray\_raylet.pyx", line 870, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 921, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 877, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 881, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\function_manager.py", line 670, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 466, in __init__
    super().__init__(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\tune\trainable\trainable.py", line 169, in __init__
    self.setup(copy.deepcopy(self.config))
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 592, in setup
    self.workers = WorkerSet(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 194, in __init__
    raise e.args[0].args[2]
AssertionError

Result for PPO_connect_four_c954d_00000:
  trial_id: c954d_00000
  
== Status ==
Current time: 2023-06-16 17:42:46 (running for 00:00:15.69)
Using FIFO scheduling algorithm.
Logical resource usage: 0/12 CPUs, 0/1 GPUs
Result logdir: C:\Users\s315635\ray_results\c4 baseline masked ppo trial 1
Number of trials: 1/1 (1 ERROR)
+------------------------------+----------+-------+
| Trial name                   | status   | loc   |
|------------------------------+----------+-------|
| PPO_connect_four_c954d_00000 | ERROR    |       |
+------------------------------+----------+-------+
Number of errored trials: 1
+------------------------------+--------------+--------------------------------------------------------------------------------------------------------------------------+
| Trial name                   |   # failures | error file                                                                                                               |
|------------------------------+--------------+--------------------------------------------------------------------------------------------------------------------------|
| PPO_connect_four_c954d_00000 |            1 | C:\Users\s315635\ray_results\c4 baseline masked ppo trial 1\PPO_connect_four_c954d_00000_0_2023-06-16_17-42-31\error.txt |
+------------------------------+--------------+--------------------------------------------------------------------------------------------------------------------------+

2023-06-16 17:42:46,634	ERROR ray_trial_executor.py:883 -- An exception occurred when trying to stop the Ray actor:Traceback (most recent call last):
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\tune\execution\ray_trial_executor.py", line 874, in _resolve_stop_event
    ray.get(future, timeout=timeout)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\worker.py", line 2523, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=14776, ip=127.0.0.1, repr=PPO)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 242, in _setup
    self.add_workers(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 635, in add_workers
    raise result.get()
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\utils\actor_manager.py", line 488, in __fetch_result
    result = ray.get(r)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\worker.py", line 2523, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=22904, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000023B2BBE6940>)
  File "python\ray\_raylet.pyx", line 877, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 881, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\function_manager.py", line 670, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 738, in __init__
    self._update_policy_map(policy_dict=self.policy_dict)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1985, in _update_policy_map
    self._build_policy_map(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 2097, in _build_policy_map
    new_policy = create_policy_for_framework(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\utils\policy.py", line 142, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\ppo\ppo_torch_policy.py", line 51, in __init__
    TorchPolicyV2.__init__(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 93, in __init__
    model, dist_class = self._init_model_and_dist_class()
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 484, in _init_model_and_dist_class
    model = ModelCatalog.get_model_v2(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\models\catalog.py", line 606, in get_model_v2
    instance = model_cls(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\examples\models\action_mask_model.py", line 82, in __init__
    assert (
AssertionError

During handling of the above exception, another exception occurred:

ray::PPO.__init__() (pid=14776, ip=127.0.0.1, repr=PPO)
  File "python\ray\_raylet.pyx", line 870, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 921, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 877, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 881, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\function_manager.py", line 670, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 466, in __init__
    super().__init__(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\tune\trainable\trainable.py", line 169, in __init__
    self.setup(copy.deepcopy(self.config))
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 592, in setup
    self.workers = WorkerSet(
  File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 194, in __init__
    raise e.args[0].args[2]
AssertionError

2023-06-16 17:42:46,649	ERROR tune.py:941 -- Trials did not complete: [PPO_connect_four_c954d_00000]
2023-06-16 17:42:46,649	INFO tune.py:945 -- Total run time: 15.75 seconds (15.69 seconds for the tuning loop).
== Status ==
Current time: 2023-06-16 17:42:46 (running for 00:00:15.69)
Using FIFO scheduling algorithm.
Logical resource usage: 0/12 CPUs, 0/1 GPUs
Result logdir: C:\Users\s315635\ray_results\c4 baseline masked ppo trial 1
Number of trials: 1/1 (1 ERROR)
+------------------------------+----------+-------+
| Trial name                   | status   | loc   |
|------------------------------+----------+-------|
| PPO_connect_four_c954d_00000 | ERROR    |       |
+------------------------------+----------+-------+
Number of errored trials: 1
+------------------------------+--------------+--------------------------------------------------------------------------------------------------------------------------+
| Trial name                   |   # failures | error file                                                                                                               |
|------------------------------+--------------+--------------------------------------------------------------------------------------------------------------------------|
| PPO_connect_four_c954d_00000 |            1 | C:\Users\s315635\ray_results\c4 baseline masked ppo trial 1\PPO_connect_four_c954d_00000_0_2023-06-16_17-42-31\error.txt |
+------------------------------+--------------+--------------------------------------------------------------------------------------------------------------------------+

(RolloutWorker pid=30144)  [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(RolloutWorker pid=30144) 2023-06-16 17:42:46,562	ERROR worker.py:844 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=30144, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000021701A968B0>) [repeated 3x across cluster]
(PPO pid=14776)   File "python\ray\_raylet.pyx", line 881, in ray._raylet.execute_task [repeated 20x across cluster]
(PPO pid=14776)   File "python\ray\_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor [repeated 9x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\_private\function_manager.py", line 670, in actor_method_executor [repeated 9x across cluster]
(PPO pid=14776)     return method(__ray_actor, *args, **kwargs) [repeated 9x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span [repeated 26x across cluster]
(PPO pid=14776)     return method(self, *_args, **_kwargs) [repeated 26x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 194, in __init__ [repeated 35x across cluster]
(PPO pid=14776)     self._update_policy_map(policy_dict=self.policy_dict) [repeated 8x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1985, in _update_policy_map [repeated 8x across cluster]
(PPO pid=14776)     self._build_policy_map( [repeated 8x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 2097, in _build_policy_map [repeated 8x across cluster]
(PPO pid=14776)     new_policy = create_policy_for_framework( [repeated 8x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\utils\policy.py", line 142, in create_policy_for_framework [repeated 8x across cluster]
(PPO pid=14776)     return policy_class(observation_space, action_space, merged_config) [repeated 8x across cluster]
(PPO pid=14776)  [repeated 10x across cluster]
(PPO pid=14776)     model, dist_class = self._init_model_and_dist_class() [repeated 8x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 484, in _init_model_and_dist_class [repeated 8x across cluster]
(PPO pid=14776)     model = ModelCatalog.get_model_v2( [repeated 8x across cluster]
(PPO pid=14776)   File "C:\Users\s315635\Anaconda3\envs\ray_2_4_0_tCuda_older\lib\site-packages\ray\rllib\models\catalog.py", line 606, in get_model_v2 [repeated 8x across cluster]
(PPO pid=14776)     instance = model_cls( [repeated 8x across cluster]
(PPO pid=14776)     assert ( [repeated 8x across cluster]
(PPO pid=14776) AssertionError [repeated 9x across cluster]

Process finished with exit code 0

It seems like the error is coming from these lines of code:
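Roughly, the failing check in ray/rllib/examples/models/action_mask_model.py (around line 82 in your traceback) looks like the snippet below. I am paraphrasing it from memory, so check your installed copy for the exact lines:

from gymnasium.spaces import Dict

# Paraphrased from TorchActionMaskModel.__init__: the model refuses to build
# unless the (original) observation space is a Dict with these two keys.
orig_space = getattr(obs_space, "original_space", obs_space)
assert (
    isinstance(orig_space, Dict)
    and "action_mask" in orig_space.spaces
    and "observations" in orig_space.spaces
)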

So the problem seems to be the observation space definition on the environment side.

Adding to @kourosh’s comment: recall that, given the way action masking is implemented, the observation (state) space of the environment needs to be a nested dictionary:

self.observation_space = gym.spaces.Dict(
    {
        "action_mask": ...,
        "observations": ...,
    }
)
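
For the flattened Connect Four environment above, that pattern would look roughly like the sketch below. The 7-entry mask matches the Discrete(7) action space and 84 = 6 * 7 * 2 is the flattened board, but the exact bounds and dtypes here are my assumptions, not taken from the PettingZoo source:

import gymnasium as gym
import numpy as np

# Hypothetical concrete version of the pattern above for Connect Four
# (7 columns -> 7 mask entries; 6 x 7 x 2 board planes flattened to 84).
observation_space = gym.spaces.Dict(
    {
        "action_mask": gym.spaces.Box(low=0, high=1, shape=(7,), dtype=np.int8),
        "observations": gym.spaces.Box(low=0, high=1, shape=(84,), dtype=np.int8),
    }
)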

Hi @PhilippWillms, @kourosh,

Yes, I know that, and the observation space is indeed a dict, as you can see here:
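
Here is a quick way to check it, reusing env_creator from my script. The expected output is paraphrased in the comments (written from memory, so the exact repr formatting may differ):

# Inspect the wrapped env's spaces directly:
env = env_creator({})
print(env.observation_space)
# roughly: Dict('action_mask': Box(0, 1, (7,), int8), 'observation': Box(0, 1, (84,), int8))
print(env.action_space)
# roughly: Discrete(7)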

So that’s why this error seems weird to me.
If you don’t have any ideas, I suppose I will open an issue on GitHub.
Thanks,
George

@george_sk,

Your key is singular, "observation". The model is looking for "observations".
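
If you want to fix it on the environment side instead of editing the model, one option is to rename the key inside your lambda wrapper. This is an untested sketch reusing convert_box and newshape from your script, and it assumes supersuit’s observation_lambda_v0 is fine with a re-keyed Dict being returned:

def change_obs_space_fn(obs_space):
    # Rename "observation" -> "observations" (and flatten it), so the space
    # carries the keys TorchActionMaskModel asserts on.
    return gymnasium.spaces.Dict(
        {
            "action_mask": obs_space["action_mask"],
            "observations": convert_box(
                lambda obs: obs.reshape(newshape), old_box=obs_space["observation"]
            ),
        }
    )

def change_observation_fn(observation, old_obs_space):
    # Apply the same renaming/flattening to every observation.
    return {
        "action_mask": observation["action_mask"],
        "observations": observation["observation"].reshape(newshape),
    }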

Thanks @mannyv, I didn’t notice the typo!

I removed the s and now it works. I will let the PettingZoo maintainers know in case they want to change it so that it is compatible.