Problem training my custom parallel PettingZoo environment

Every time I try to set up my custom environment for training, either through the Tuner or by calling train() directly, nothing happens: the environment simply doesn't train and the run stays stuck for hours without producing any results. Here is my latest attempt.

import argparse

from ray import air, tune
from ray.tune.registry import register_env
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv

from Ambiente_SOMN.make_env import make_env

# Based on code from github.com/parametersharingmadrl/parametersharingmadrl

parser = argparse.ArgumentParser()
parser.add_argument(
    "--num-gpus",
    type=int,
    default=1,
    help="Number of GPUs to use for training.",
)
parser.add_argument(
    "--as-test",
    action="store_true",
    help="Whether this script should be run as a test: only one training "
    "iteration will be run.",
)

if __name__ == "__main__":
    args = parser.parse_args()

    def env_creator(env_config):
        # Wrap the parallel PettingZoo env so RLlib can consume it.
        return ParallelPettingZooEnv(make_env(-1, 3, 0))

    # Instantiate once on the driver to read the agent IDs below.
    env = env_creator({})
    register_env("SOMN", env_creator)

    config = (
        PPOConfig()
        .environment("SOMN")
        .resources(num_gpus=args.num_gpus)
        .rollouts(num_rollout_workers=2)
        .training(train_batch_size=4000)
        .multi_agent(
            # One policy per agent; each agent is mapped to the policy
            # that carries its own ID.
            policies=env.get_agent_ids(),
            policy_mapping_fn=(lambda agent_id, *args, **kwargs: agent_id),
        )
    )

    algo = config.build()

    # One manual training iteration before handing the same config to the Tuner.
    print(algo.train())

    if args.as_test:
        # Smoke test: stop after a single training iteration.
        stop = {"training_iteration": 1}
    else:
        stop = {"episodes_total": 60000}

    tune.Tuner(
        "PPO",
        run_config=air.RunConfig(
            stop=stop,
            checkpoint_config=air.CheckpointConfig(
                checkpoint_frequency=10,
            ),
        ),
        param_space=config,
    ).fit()

However, the following message appears in the terminal:

2024-01-11 00:23:54,160 ERROR worker.py:405 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::RolloutWorker.apply() (pid=14940, ip=127.0.0.1, actor_id=537b92b82109077a1ab83bb801000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x00000245D1A99450>)
  File "python\ray\_raylet.pyx", line 1813, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1754, in ray._raylet.execute_task.function_executor
  File "C:\Users\Alysson\Documents\GitHub\OPTSIMFLEX\.venv\lib\site-packages\ray\_private\function_manager.py", line 726, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\Alysson\Documents\GitHub\OPTSIMFLEX\.venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\Alysson\Documents\GitHub\OPTSIMFLEX\.venv\lib\site-packages\ray\rllib\utils\actor_manager.py", line 189, in apply
    raise e
  File "C:\Users\Alysson\Documents\GitHub\OPTSIMFLEX\.venv\lib\site-packages\ray\rllib\utils\actor_manager.py", line 178, in apply
    return func(self, *args, **kwargs)
  File "C:\Users\Alysson\Documents\GitHub\OPTSIMFLEX\.venv\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 412, in set_weight
    w.set_weights(weights, global_vars)
  File "C:\Users\Alysson\Documents\GitHub\OPTSIMFLEX\.venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\Alysson\Documents\GitHub\OPTSIMFLEX\.venv\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1551, in set_weights
    self.policy_map[pid].set_weights(w)
  File "C:\Users\Alysson\Documents\GitHub\OPTSIMFLEX\.venv\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1024, in set_weights
    self.model.load_state_dict(weights)
  File "C:\Users\Alysson\Documents\GitHub\OPTSIMFLEX\.venv\lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ComplexInputNetwork:
        size mismatch for logits_layer._model.0.weight: copying a param with shape torch.Size([103, 2560]) from checkpoint, the shape in current model is torch.Size([105, 2560]).
        size mismatch for logits_layer._model.0.bias: copying a param with shape torch.Size([103]) from checkpoint, the shape in current model is torch.Size([105]).
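Reading the traceback, the learner and the rollout worker seem to disagree about the size of the logits layer (103 vs. 105 outputs), which makes me suspect that not all of my agents have the same action-space size, so the per-agent policies end up with mismatched output shapes. One variant I considered was declaring each policy's spaces explicitly instead of letting RLlib infer them; this is only a rough sketch of the idea, where the agent IDs and spaces come from my own make_env (I haven't confirmed it fixes the error):

from ray.rllib.policy.policy import PolicySpec

raw_env = make_env(-1, 3, 0)  # the unwrapped parallel PettingZoo env

# One PolicySpec per agent, each pinned to that agent's own spaces.
policies = {
    agent_id: PolicySpec(
        observation_space=raw_env.observation_space(agent_id),
        action_space=raw_env.action_space(agent_id),
    )
    for agent_id in raw_env.possible_agents
}

config = config.multi_agent(
    policies=policies,
    policy_mapping_fn=(lambda agent_id, *args, **kwargs: agent_id),
)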

I've searched a lot and never found anything related that could help. If anyone can point me to a solution, I will be very grateful.

I'm also unsure how to log my own TensorBoard metrics from the code so I can check various aspects of my environment more precisely; for example, it tracks a number of demands, rejections, and patios. I'll leave the link to my environment for any further questions.
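From what I understand of the RLlib docs, custom scalars go through a callbacks class, and anything written to episode.custom_metrics shows up in TensorBoard. Is something like the sketch below the right approach? (The metric names and the info-dict keys such as "demands" are placeholders for values my environment would expose; I haven't verified this.)

from ray.rllib.algorithms.callbacks import DefaultCallbacks

class SOMNCallbacks(DefaultCallbacks):
    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # Pull per-episode counters out of each agent's last info dict.
        # "demands", "rejections" and "patios" are hypothetical keys
        # that my environment would report in its info dicts.
        for agent_id in episode.get_agents():
            info = episode.last_info_for(agent_id) or {}
            for key in ("demands", "rejections", "patios"):
                if key in info:
                    episode.custom_metrics[f"{agent_id}/{key}"] = info[key]

# Attached to the config with:
# config = config.callbacks(SOMNCallbacks)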

This environment trained perfectly in a single-agent setup through SB3; now that my task is to train it multi-agent, I am facing the difficulties described above.