1. Severity of the issue: High (completely blocks me).
2. Environment:
- Ray version: 2.44.1
- Python version: 3.10.2
- OS: Ubuntu 22.04
- Cloud/Infrastructure: Local
- Other libs/tools (if relevant):
  - pygame 2.6.1
  - torch 2.7.0
  - numpy 1.26.4
3. What happened vs. what you expected:
- Expected: RLlib trains and evaluates PPO for a number of iterations.
- Actual: training crashes during evaluation with
  `RuntimeError: The size of tensor a (5) must match the size of tensor b (3) at non-singleton dimension 1`
Hello everyone, I am new to reinforcement learning and RLlib. I am trying to create a custom multi-agent warehouse environment in which agents perform delivery tasks, moving goods from a Loader to an Unloader, and I want them to learn to avoid collisions with each other. The full project is on GitHub: [Kriss213/RL_environment](https://github.com/Kriss213/RL_environment) (a mathematical environment for reinforcement learning). A simplified sketch of the environment's interface follows, and after it the training script I am having trouble with. I would greatly appreciate any help!
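The environment subclasses RLlib's `MultiAgentEnv`, and all agents share one observation and action space. Roughly, it exposes the following interface (a simplified sketch with placeholder spaces and shapes; the real class, including `parse_config`, is in the repo):

```python
# Simplified sketch of the interface WarehouseEnv exposes.
# Spaces and shapes here are placeholders, not the real ones.
import numpy as np
from gymnasium import spaces
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class WarehouseEnvSketch(MultiAgentEnv):
    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["agent_0", "agent_1"]
        # All agents share one observation/action space.
        self.single_observation_space = spaces.Box(-np.inf, np.inf, (10,), np.float32)
        self.single_action_space = spaces.Box(-1.0, 1.0, (2,), np.float32)
        self.observation_spaces = {a: self.single_observation_space for a in self.agents}
        self.action_spaces = {a: self.single_action_space for a in self.agents}

    def reset(self, *, seed=None, options=None):
        return {a: self.single_observation_space.sample() for a in self.agents}, {}

    def step(self, action_dict):
        obs = {a: self.single_observation_space.sample() for a in self.agents}
        rewards = {a: 0.0 for a in self.agents}
        return obs, rewards, {"__all__": False}, {"__all__": False}, {}
```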
Training script:
```python
import ray
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
import configparser
import os
import sys

sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
from src.Environment import WarehouseEnv
from ray.rllib.env import EnvContext
from copy import deepcopy

if __name__ == "__main__":
    ray.init()

    config = configparser.ConfigParser()
    config.read("config.ini")
    config_dict = WarehouseEnv.parse_config(config)
    env_config = EnvContext(config_dict, worker_index=0)

    # Build one env instance up front to read out the (shared) spaces.
    env = WarehouseEnv(config=env_config)
    obs_space = deepcopy(env.single_observation_space)
    act_space = deepcopy(env.single_action_space)
    agent_ids = deepcopy(env.agents)

    config = (
        PPOConfig()
        .environment(WarehouseEnv, env_config=env_config)
        .framework("torch")
        .env_runners(num_env_runners=0)
        .resources(num_cpus_for_main_process=0)
        .learners(num_gpus_per_learner=1)
        # These are training hyperparameters, so they belong in .training(),
        # not in .rl_module(model_config=...).
        .training(
            train_batch_size=4000,
            minibatch_size=128,
            lr=5e-4,
            gamma=0.99,
            vf_clip_param=10.0,
        )
        .multi_agent(
            policies={
                "shared_policy": (None, obs_space, act_space, {}),
            },
            policy_mapping_fn=lambda agent_id, *args, **kwargs: "shared_policy",
        )
        .evaluation(
            evaluation_interval=1,
            evaluation_duration=3,
            evaluation_config={"explore": False},
        )
    )

    tuner = tune.Tuner(
        "PPO",
        run_config=tune.RunConfig(
            name="warehouse_marl_train",
            stop={"training_iteration": 100},
            checkpoint_config=tune.CheckpointConfig(
                checkpoint_frequency=1,
                checkpoint_at_end=True,
            ),
        ),
        param_space=config.to_dict(),
    )
    tuner.fit()
```
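As a first check, the environment can be stepped manually with random actions sampled from `single_action_space`, to confirm the spaces sample and step consistently (a minimal sketch, reusing `WarehouseEnv` and `config.ini` from the script above and assuming flat array observations):

```python
# Sanity check: reset the env and step it once with random actions.
import configparser
from ray.rllib.env import EnvContext
from src.Environment import WarehouseEnv

config = configparser.ConfigParser()
config.read("config.ini")
env = WarehouseEnv(config=EnvContext(WarehouseEnv.parse_config(config), worker_index=0))

obs, _ = env.reset()
print("single_action_space:", env.single_action_space)
for agent_id in env.agents:
    print(agent_id, "obs shape:", obs[agent_id].shape)

# One step with random actions shaped like single_action_space.
actions = {agent_id: env.single_action_space.sample() for agent_id in env.agents}
obs, rewards, terminateds, truncateds, infos = env.step(actions)
print("step OK, rewards:", rewards)
```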
Running the training script gives me this error:
```
Failure # 1 (occurred at 2025-05-03_11-41-56)
ray::PPO.train() (pid=324446, ip=10.236.63.56, actor_id=1062579a4dc07d50af153a2b01000000, repr=PPO(env=<class 'src.Environment.WarehouseEnv'>; env-runners=0; learners=0; multi-agent=True))
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 330, in train
    raise skipped from exception_cause(skipped)
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 327, in train
    result = self.step()
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 973, in step
    eval_results = self._run_one_evaluation(parallel_train_future=None)
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 3088, in _run_one_evaluation
    eval_results = self.evaluate(
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 1119, in evaluate
    ) = self._evaluate_on_local_env_runner(self.eval_env_runner)
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 1244, in _evaluate_on_local_env_runner
    episodes = env_runner.sample(
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/env/multi_agent_env_runner.py", line 217, in sample
    samples = self._sample(
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/env/multi_agent_env_runner.py", line 322, in _sample
    to_env = self._module_to_env(
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/connectors/connector_pipeline_v2.py", line 123, in __call__
    batch = connector(
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/connectors/module_to_env/get_actions.py", line 60, in __call__
    self._get_actions(module_data, rl_module[module_id], explore)
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/connectors/module_to_env/get_actions.py", line 91, in _get_actions
    batch[Columns.ACTION_LOGP] = action_dist.logp(actions)
  File "/home/kriss/.local/lib/python3.10/site-packages/ray/rllib/models/torch/torch_distributions.py", line 597, in logp
    return sum(flat_logps)
RuntimeError: The size of tensor a (5) must match the size of tensor b (3) at non-singleton dimension 1
```
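From the traceback, the crash happens when `logp` is computed over the sampled actions, so the distribution built from the module's outputs and the actions themselves seem to disagree in size (5 vs. 3). This is the kind of check I tried in order to see what the module actually produces (a sketch only, assuming a flat Box observation space; `build_algo` and `get_module` are the new-API-stack calls as I understand them, and `config`, `obs_space`, `act_space` come from the training script above):

```python
# Sketch: compare the module's distribution inputs with the action space.
# Assumes `config`, `obs_space`, `act_space` from the training script and
# a flat Box observation space.
import numpy as np
import torch

algo = config.build_algo()
module = algo.get_module("shared_policy")

batch = {"obs": torch.as_tensor(obs_space.sample()[None], dtype=torch.float32)}
out = module.forward_inference(batch)

print("action space:", act_space)
print("action_dist_inputs shape:", out["action_dist_inputs"].shape)
```

The last traceback frame is in `TorchMultiDistribution.logp`, which sums per-component log-probabilities, so if I read it right my composite action space produces components whose logp tensors come out with different shapes, and that is where I am stuck.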