Hierarchical (Multiagent) environment's reset() function raises error with seemingly correct output

Hey everyone. Very specific bug here.

My reset() function returns exactly what reset() returns in the working hierarchical_windy_maze example: a dictionary mapping “higher_level_agent” to the output of _get_observation().

        ret = self._get_observation()

        return {"higher_level_agent": ret}

On the face of it, _get_observation is calculating something that is well specified: a valid np.array of dimension (192,), which is what the observation space is set up as.
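For anyone debugging something similar, here is a minimal sketch of how the properties of an observation can be checked one by one before handing it to RLlib. The helper name describe_observation and the stand-in array are hypothetical (they are not from feudal_env.py), and the list of checks is only an assumption about what a Box-style membership test typically depends on.

```python
import numpy as np


def describe_observation(obs, expected_shape, expected_dtype,
                         low=-np.inf, high=np.inf):
    """Report the properties a Box-style space check typically depends on.

    Hypothetical helper for debugging; not part of RLlib or gym.
    """
    obs = np.asarray(obs)
    return {
        "shape": obs.shape,
        "shape_matches": obs.shape == tuple(expected_shape),
        "dtype": str(obs.dtype),
        "dtype_matches": bool(obs.dtype == np.dtype(expected_dtype)),
        "within_bounds": bool(np.all(obs >= low) and np.all(obs <= high)),
        "all_finite": bool(np.all(np.isfinite(obs))),
    }


# Stand-in for the real observation: NumPy creates float64 arrays by default.
obs = np.zeros(192)
report = describe_observation(obs, (192,), np.float32)
for key, value in report.items():
    print(f"{key}: {value}")
```

Printing the dtype next to the shape is the useful part: an array can have the right shape and values and still differ from the space in dtype, which the error message above does not make obvious.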

I’ve followed the trace, and it errors out at _process_observation(), an internal RLlib function that handles the reset output. I can’t see anything wrong with what I’ve done. Can some savior please swoop in and save me? For reference, my code is erroring at line 377 of the following file: transactive-control-social-game/feudal_env.py at feudal_rl · tsgoten/transactive-control-social-game · GitHub

ValueError: ('Observation ({}) outside given space ({})!', array([0.08283 , 0.08283 , 0.08283 , 0.08283 , 0.08283 , 0.08283 ,
       0.08283 , 0.08283 , 0.0969  , 0.0969  , 0.0969  , 0.0969  ,
       0.0969  , 0.0969  , 0.0969  , 0.0969  , 0.0969  , 0.0969  ,
       0.0969  , 0.0969  , 0.0969  , 0.08283 , 0.08283 , 0.08283 ,
       0.049698, 0.049698, 0.049698, 0.049698, 0.049698, 0.049698,
       0.049698, 0.049698, 0.05814 , 0.05814 , 0.05814 , 0.05814 ,
       0.05814 , 0.05814 , 0.05814 , 0.05814 , 0.05814 , 0.05814 ,
       0.05814 , 0.05814 , 0.05814 , 0.049698, 0.049698, 0.049698,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ]), Box([-inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf], [inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf
 inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf
 inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf
 inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf
 inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf
 inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf
 inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf
 inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf
 inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf
 inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf
 inf inf inf inf inf inf inf inf inf inf inf inf], (192,), float32))

Solved. The observation space of the multi-agent policies had to be declared with dtype np.float64. We were very puzzled by this fix, because we had tried forcing the observations themselves to np.float32 and that still raised the error. So perhaps there is some stage internally at which float64 is required. In any case, if anyone else hits this issue, please try our fix and let us know.
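One possible explanation, which is an assumption and not something confirmed against the RLlib source: some versions of gym's Box membership test include a NumPy safe-cast check from the observation's dtype to the space's dtype. The sketch below uses only NumPy to show how such a check behaves; dtype_fits is a hypothetical name, and this does not explain why forcing float32 observations still failed in our case.

```python
import numpy as np


def dtype_fits(obs, space_dtype):
    """Hypothetical stand-in for a safe-cast dtype check on an observation."""
    # np.can_cast uses "safe" casting by default: widening is allowed,
    # narrowing (float64 -> float32) is not.
    return bool(np.can_cast(obs.dtype, space_dtype))


obs64 = np.zeros(192, dtype=np.float64)
obs32 = obs64.astype(np.float32)

print(dtype_fits(obs64, np.float32))  # False: float64 does not safely fit float32
print(dtype_fits(obs32, np.float32))  # True
print(dtype_fits(obs64, np.float64))  # True
print(dtype_fits(obs32, np.float64))  # True: float32 widens safely to float64
```

Under that assumption, a space declared as float64 accepts both float32 and float64 observations, which would be consistent with the fix working.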