MultiAgentEnv works with PPO.train() but not tune.Tuner.fit()

How severely does this issue affect your experience of using Ray?

  • Low/Medium: It causes significant difficulty in completing my task, but I can work around it.

I have a basic MultiAgentEnv that runs fine with PPO.train() (or so it appears, I’m still learning Ray) but fails with tune.Tuner.fit(). The error is:

ValueError: The two structures don't have the same nested structure.

First structure: type=ndarray str=[0.1 0. ]

Second structure: type=OrderedDict str=OrderedDict([('agent_0', array([ 8.450692e+17, -5.816952e+16], dtype=float32)), ('agent_1', array([-7.0000346e+17, -9.6193201e+17], dtype=float32)), ('agent_2', array([-5.5688862e+17, 7.0265260e+17], dtype=float32)), ('agent_3', array([6.6857392e+17, 8.4390335e+17], dtype=float32)), ('agent_4', array([2.0008603e+17, 1.2134728e+17], dtype=float32)), ('agent_5', array([3.9633845e+17, 7.5621022e+17], dtype=float32)), ('agent_6', array([ 8.7897234e+17, -3.4141877e+17], dtype=float32)), ('agent_7', array([-3.0604514e+17, 3.4147052e+17], dtype=float32)), ('agent_8', array([-6.072403e+17, -9.624188e+17], dtype=float32)), ('agent_9', array([-4.1622738e+17, -5.9749262e+17], dtype=float32))])

More specifically: Substructure "type=OrderedDict str=OrderedDict([('agent_0', array([ 8.450692e+17, -5.816952e+16], dtype=float32)), ('agent_1', array([-7.0000346e+17, -9.6193201e+17], dtype=float32)), ('agent_2', array([-5.5688862e+17, 7.0265260e+17], dtype=float32)), ('agent_3', array([6.6857392e+17, 8.4390335e+17], dtype=float32)), ('agent_4', array([2.0008603e+17, 1.2134728e+17], dtype=float32)), ('agent_5', array([3.9633845e+17, 7.5621022e+17], dtype=float32)), ('agent_6', array([ 8.7897234e+17, -3.4141877e+17], dtype=float32)), ('agent_7', array([-3.0604514e+17, 3.4147052e+17], dtype=float32)), ('agent_8', array([-6.072403e+17, -9.624188e+17], dtype=float32)), ('agent_9', array([-4.1622738e+17, -5.9749262e+17], dtype=float32))])" is a sequence, while substructure "type=ndarray str=[0.1 0. ]" is not

Entire first structure:

.

Entire second structure:

OrderedDict([('agent_0', .), ('agent_1', .), ('agent_2', .), ('agent_3', .), ('agent_4', .), ('agent_5', .), ('agent_6', .), ('agent_7', .), ('agent_8', .), ('agent_9', .)])

(PPO pid=16357) 2023-12-01 20:39:59,702 ERROR actor_manager.py:500 -- Ray error, taking actor 1 out of service. ray::RolloutWorker.apply() (pid=16362, ip=127.0.0.1, actor_id=9d3d30bad174b049a0aebe7f01000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x1483ff0a0>)

The code is here in a simplified form. Everything up to line 275 is just setup and class definitions; PPO.train() is run on line 309 and tune.Tuner.fit() on line 341. The error seems to indicate that tune.Tuner.fit() is comparing the observation of a single agent against the observations of all agents, but I'm not sure why. Even more confusing, the exact same class works with PPO.train(). I'm not sure what the difference between the two is, but I'm guessing it has something to do with how the MultiAgentEnv is used under the hood. Any help getting it working with tune.Tuner.fit() would be greatly appreciated!
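For context, here is a minimal sketch of how the two runs are set up (MyMultiAgentEnv, RandomAction, and the policy_mapping_fn are placeholders standing in for the actual definitions in the linked script):

    from ray import tune
    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.rllib.policy.policy import PolicySpec

    config = (
        PPOConfig()
        .environment(MyMultiAgentEnv)  # placeholder for the custom MultiAgentEnv
        .multi_agent(
            policies={
                "default_policy": PolicySpec(policy_class=RandomAction),
                "learned": PolicySpec(),
            },
            policy_mapping_fn=lambda agent_id, *args, **kwargs: "learned",
        )
    )

    # Around line 309: build the algorithm and call train() directly; this runs fine.
    algo = config.build()
    algo.train()

    # Around line 341: hand the same config to Tune; this raises the error above.
    tune.Tuner("PPO", param_space=config.to_dict()).fit()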

Solved, sort of.

Policies passed to tune.Tuner.fit() do not infer the observation or action spaces, even if the agent classes define a self.observation_space attribute. In other words, each PolicySpec passed to tune.Tuner.fit() via the param_space dictionary needs explicit observation_space and action_space arguments. In my example:

"policies": {
      "default_policy": PolicySpec(
          policy_class=RandomAction,
          observation_space=gym.spaces.Box(-1e18, 1e18, (2,)), # <-----
          action_space=gym.spaces.Discrete(3), # <-----
      ),
      "learned": PolicySpec(
          config=AlgorithmConfig.overrides(
              model={"use_lstm": True},
              framework_str="torch",
          ),
          observation_space=gym.spaces.Box(-1e18, 1e18, (2,)), # <-----
          action_space=gym.spaces.Discrete(3), # <-----
      ),
  },
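With the spaces spelled out, the same policies dict can be handed to Tune through param_space. A minimal sketch, assuming the config is built with PPOConfig and converted to a dict, with MyMultiAgentEnv standing in for the actual env class:

    from ray import tune
    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment(MyMultiAgentEnv)  # placeholder for the custom MultiAgentEnv
        .multi_agent(
            policies=policies,  # the dict shown above, with explicit spaces
            policy_mapping_fn=lambda agent_id, *args, **kwargs: "learned",
        )
    )

    # With observation_space/action_space set on each PolicySpec, Tune no longer
    # hits the nested-structure mismatch.
    tune.Tuner("PPO", param_space=config.to_dict()).fit()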

While this solves the problem, it does not explain why omitting observation_space and action_space works with PPO.train() but breaks tune.Tuner.fit().