PyTorch error during evaluation

1. Severity of the issue:
High: Completely blocks me.

2. Environment:

  • Ray version: 2.44.1
  • Python version: 3.10.2
  • OS: Ubuntu 22.04
  • Cloud/Infrastructure: Local
  • Other libs/tools (if relevant): Python dependencies
    – pygame 2.6.1
    – torch 2.7.0
    – numpy 1.26.4

3. What happened vs. what you expected:

  • Expected: RLlib trains and evaluates a PPO policy for a number of iterations.
  • Actual: Evaluation fails with RuntimeError: The size of tensor a (5) must match the size of tensor b (3) at non-singleton dimension 1

Hello everyone, I am new to reinforcement learning and RLlib. I am trying to create a custom multi-agent warehouse environment in which agents perform delivery tasks from a Loader to an Unloader, and I want them to learn to avoid collisions with each other. The full project is here: GitHub - Kriss213/RL_environment: Mathematical environment for reinforcement learning
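
For context, here is a stripped-down sketch of how I understand the per-agent space layout that the shared policy should see. This is not the real WarehouseEnv: the agent names and the observation size (10) and action size (3) below are placeholders for illustration, and the real values come from config.ini.

import gymnasium as gym
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class ToyWarehouseEnv(MultiAgentEnv):
    """Illustrative stand-in: every agent shares one observation/action space."""

    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["agent_0", "agent_1"]
        # One shared space per agent, matching the single "shared_policy"
        # that every agent is mapped to.
        single_obs = gym.spaces.Box(-np.inf, np.inf, shape=(10,), dtype=np.float32)
        single_act = gym.spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)
        self.observation_spaces = {aid: single_obs for aid in self.agents}
        self.action_spaces = {aid: single_act for aid in self.agents}

    def reset(self, *, seed=None, options=None):
        return {aid: self.observation_spaces[aid].sample() for aid in self.agents}, {}

    def step(self, action_dict):
        obs = {aid: self.observation_spaces[aid].sample() for aid in action_dict}
        rewards = {aid: 0.0 for aid in action_dict}
        terminateds = {"__all__": False}
        truncateds = {"__all__": False}
        return obs, rewards, terminateds, truncateds, {}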

I am having trouble with the training script. I would greatly appreciate any help!

Training script:

import ray
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

import configparser

import os
import sys
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))

from src.Environment import WarehouseEnv
from ray.rllib.env import EnvContext

from copy import deepcopy

if __name__ == "__main__":
    ray.init()

    config = configparser.ConfigParser()
    config.read('config.ini')
    config_dict = WarehouseEnv.parse_config(config)
    env_config = EnvContext(config_dict, worker_index=0)


    env = WarehouseEnv(config=env_config)
    obs_space = deepcopy(env.single_observation_space)
    act_space = deepcopy(env.single_action_space)
    agent_ids = deepcopy(env.agents)
   
    config = (
        PPOConfig()
        .environment(WarehouseEnv, env_config=env_config)
        .framework("torch")
        .env_runners(num_env_runners=0)  # .rollouts(num_rollout_workers=0)
        .resources(num_cpus_for_main_process=0)
        .learners(num_gpus_per_learner=1)
        .rl_module(model_config={
            'train_batch_size': 4000,
            'minibatch_size': 128,
            'lr': 5e-4,
            'gamma': 0.99,
            'vf_clip_param': 10.0,
        })
        .multi_agent(
            policies={
                "shared_policy": (None, obs_space, act_space, {}),
            },
            policy_mapping_fn=lambda agent_id, *args, **kwargs: "shared_policy"
        )
        .evaluation(
            evaluation_interval=1,
            evaluation_duration=3,
            evaluation_config={"explore": False}
        )
    )

    tuner = tune.Tuner(
        "PPO",
        run_config=tune.RunConfig(
            name="warehouse_marl_train",
            stop={"training_iteration": 100},
            checkpoint_config=tune.CheckpointConfig(
                checkpoint_frequency=1,
                checkpoint_at_end=True,
            )
        ),
        param_space=config.to_dict()
    )

    tuner.fit()

This gives me an error:

Failure # 1 (occurred at 2025-05-03_11-41-56)
ray::PPO.train() (pid=324446, ip=10.236.63.56, actor_id=1062579a4dc07d50af153a2b01000000, repr=PPO(env=<class 'src.Environment.WarehouseEnv'>; env-runners=0; learners=0; multi-agent=True))
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 330, in train
    raise skipped from exception_cause(skipped)
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 327, in train
    result = self.step()
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 973, in step
    eval_results = self._run_one_evaluation(parallel_train_future=None)
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 3088, in _run_one_evaluation
    eval_results = self.evaluate(
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 1119, in evaluate
    ) = self._evaluate_on_local_env_runner(self.eval_env_runner)
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 1244, in _evaluate_on_local_env_runner
    episodes = env_runner.sample(
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/env/multi_agent_env_runner.py", line 217, in sample
    samples = self._sample(
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/env/multi_agent_env_runner.py", line 322, in _sample
    to_env = self._module_to_env(
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/connectors/connector_pipeline_v2.py", line 123, in __call__
    batch = connector(
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/connectors/module_to_env/get_actions.py", line 60, in __call__
    self._get_actions(module_data, rl_module[module_id], explore)
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/connectors/module_to_env/get_actions.py", line 91, in _get_actions
    batch[Columns.ACTION_LOGP] = action_dist.logp(actions)
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/models/torch/torch_distributions.py", line 597, in logp
    return sum(flat_logps)
RuntimeError: The size of tensor a (5) must match the size of tensor b (3) at non-singleton dimension 1
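
In case it helps, this is the kind of standalone sanity check I can run on the environment outside of RLlib. It builds the environment the same way the training script does and assumes gymnasium-style reset/step signatures (2-tuple reset, 5-tuple step); everything else uses the same attributes the training script already reads.

import configparser
from ray.rllib.env import EnvContext
from src.Environment import WarehouseEnv

# Build the environment exactly as in the training script.
config = configparser.ConfigParser()
config.read("config.ini")
env_config = EnvContext(WarehouseEnv.parse_config(config), worker_index=0)
env = WarehouseEnv(config=env_config)

# The spaces handed to the shared policy.
print("single_observation_space:", env.single_observation_space)
print("single_action_space:", env.single_action_space)

# Sample one random action per agent from the declared action space and step once,
# to confirm the environment itself accepts actions of that shape.
obs, info = env.reset()
actions = {aid: env.single_action_space.sample() for aid in env.agents}
obs, rewards, terminateds, truncateds, infos = env.step(actions)
print("sampled action shapes:", {aid: getattr(a, "shape", a) for aid, a in actions.items()})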