How severely does this issue affect your experience of using Ray?
- Low/Medium: It makes my task significantly harder to complete, but I can work around it.
I have a basic MultiAgentEnv that appears to run fine with PPO.train() (I'm still learning Ray, so I may be missing something), but it fails with tune.Tuner.fit().
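For context, here is a minimal sketch of the shape of my env. This is not my exact code; the ten agent IDs and the 2-element float observations are inferred from the error output below, and the class name is a placeholder:

```python
import numpy as np
from gymnasium.spaces import Box
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class MySimpleMultiAgentEnv(MultiAgentEnv):
    """Sketch: 10 agents, each observing/acting with a 2-element float vector."""

    def __init__(self, config=None):
        super().__init__()
        self._agent_ids = {f"agent_{i}" for i in range(10)}
        # Shared per-agent spaces (every agent sees a 2-element vector).
        self.observation_space = Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)
        self.action_space = Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        # Multi-agent reset returns {agent_id: obs} plus an info dict.
        obs = {aid: np.zeros(2, dtype=np.float32) for aid in self._agent_ids}
        return obs, {}

    def step(self, action_dict):
        obs = {aid: np.random.randn(2).astype(np.float32) for aid in action_dict}
        rewards = {aid: 0.0 for aid in action_dict}
        terminateds = {"__all__": False}
        truncateds = {"__all__": False}
        return obs, rewards, terminateds, truncateds, {}
```

With an env shaped like this, tune.Tuner.fit() fails with: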
```
ValueError: The two structures don't have the same nested structure.
First structure: type=ndarray str=[0.1 0. ]
Second structure: type=OrderedDict str=OrderedDict([('agent_0', array([ 8.450692e+17, -5.816952e+16], dtype=float32)), ('agent_1', array([-7.0000346e+17, -9.6193201e+17], dtype=float32)), ('agent_2', array([-5.5688862e+17, 7.0265260e+17], dtype=float32)), ('agent_3', array([6.6857392e+17, 8.4390335e+17], dtype=float32)), ('agent_4', array([2.0008603e+17, 1.2134728e+17], dtype=float32)), ('agent_5', array([3.9633845e+17, 7.5621022e+17], dtype=float32)), ('agent_6', array([ 8.7897234e+17, -3.4141877e+17], dtype=float32)), ('agent_7', array([-3.0604514e+17, 3.4147052e+17], dtype=float32)), ('agent_8', array([-6.072403e+17, -9.624188e+17], dtype=float32)), ('agent_9', array([-4.1622738e+17, -5.9749262e+17], dtype=float32))])
More specifically: Substructure "type=OrderedDict str=OrderedDict([('agent_0', array([ 8.450692e+17, -5.816952e+16], dtype=float32)), ('agent_1', array([-7.0000346e+17, -9.6193201e+17], dtype=float32)), ('agent_2', array([-5.5688862e+17, 7.0265260e+17], dtype=float32)), ('agent_3', array([6.6857392e+17, 8.4390335e+17], dtype=float32)), ('agent_4', array([2.0008603e+17, 1.2134728e+17], dtype=float32)), ('agent_5', array([3.9633845e+17, 7.5621022e+17], dtype=float32)), ('agent_6', array([ 8.7897234e+17, -3.4141877e+17], dtype=float32)), ('agent_7', array([-3.0604514e+17, 3.4147052e+17], dtype=float32)), ('agent_8', array([-6.072403e+17, -9.624188e+17], dtype=float32)), ('agent_9', array([-4.1622738e+17, -5.9749262e+17], dtype=float32))])" is a sequence, while substructure "type=ndarray str=[0.1 0. ]" is not
Entire first structure:
.
Entire second structure:
OrderedDict([('agent_0', .), ('agent_1', .), ('agent_2', .), ('agent_3', .), ('agent_4', .), ('agent_5', .), ('agent_6', .), ('agent_7', .), ('agent_8', .), ('agent_9', .)])

(PPO pid=16357) 2023-12-01 20:39:59,702 ERROR actor_manager.py:500 -- Ray error, taking actor 1 out of service. ray::RolloutWorker.apply() (pid=16362, ip=127.0.0.1, actor_id=9d3d30bad174b049a0aebe7f01000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x1483ff0a0>)
```
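If it helps, the ValueError itself looks like it comes from a nested-structure check (RLlib uses dm-tree for these); this standalone snippet reproduces the same message by comparing a bare ndarray against a per-agent dict (the names here are purely illustrative):

```python
import numpy as np
import tree  # dm-tree; RLlib uses it for nested-structure checks

single_agent_obs = np.array([0.1, 0.0], dtype=np.float32)
all_agents_obs = {
    "agent_0": np.zeros(2, dtype=np.float32),
    "agent_1": np.zeros(2, dtype=np.float32),
}

# Raises: ValueError: The two structures don't have the same nested structure.
tree.assert_same_structure(single_agent_obs, all_agents_obs)
```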
The code is here in a simplified form. Everything up to line 275 is setup and class definitions; PPO.train() is called on line 309 and tune.Tuner.fit() on line 341 (a simplified sketch of both call sites is below). The error seems to indicate that tune.Tuner.fit() is comparing the observation of a single agent against the observation dict of all agents, but I'm not sure why, and, even more confusingly, the exact same class works with PPO.train(). I don't know what differs between the two, but I'm guessing it has something to do with how the MultiAgentEnv is constructed under the hood. Any help getting it working with tune.Tuner.fit() would be greatly appreciated!
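For reference, here is roughly how the two call sites look. This is simplified and reconstructed, not my exact code; the registered env name and the single shared policy are placeholders standing in for my actual setup:

```python
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

# Make the env constructible by name inside Tune trials.
tune.register_env("my_multi_agent_env", lambda cfg: MySimpleMultiAgentEnv(cfg))

config = (
    PPOConfig()
    .environment("my_multi_agent_env")
    .multi_agent(
        policies={"shared_policy"},
        policy_mapping_fn=lambda agent_id, *args, **kwargs: "shared_policy",
    )
)

# Call site around line 309: this trains without raising.
algo = config.build()
algo.train()

# Call site around line 341: this raises the nested-structure ValueError above.
tuner = tune.Tuner("PPO", param_space=config.to_dict())
tuner.fit()
```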