How severe does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
Recently, I’ve been upgrading my rllib dependency, and I ran into a number of issues starting with version 1.10. I see that this version introduced a bit of an overhaul of the MultiAgentEnv class. I’ve been making code changes on my end to match those changes, and I’m running into some issues. I’m trying to upgrade to the newest version of ray, which is 1.12.1 as of this post.
I have a simple game called MultiCorridor that I use for testing. This is a multiagent game, and each agent has the following observation space:
observation_space = {
    'position': Box(0, self.end - 1, (1,), int),
    'left': MultiBinary(1),
    'right': MultiBinary(1)
}
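(For context, this is roughly how that space looks when assembled as a gym.spaces.Dict. The dtypes in the comments are what the space itself reports, and as far as I can tell they are what the checker compares against. I'm using a literal end value here just so the snippet runs on its own.)
from gym.spaces import Box, Dict, MultiBinary

end = 10  # stand-in for self.end, only to make the snippet self-contained
observation_space = Dict({
    'position': Box(0, end - 1, (1,), int),  # integer Box; samples are integer arrays of shape (1,)
    'left': MultiBinary(1),                  # samples are int8 arrays of shape (1,)
    'right': MultiBinary(1),
})
print(observation_space['position'].dtype)   # int64 on my machine
print(observation_space['left'].dtype)       # int8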
Here is a snippet of the function that returns the observation, which is called from my step and reset functions:
agent_position = self.agents[agent_id].position
if agent_position == 0 or self.corridor[agent_position - 1] is None:
    left = False
else:
    left = True
if agent_position == self.end - 1 or self.corridor[agent_position + 1] is None:
    right = False
else:
    right = True
return {
    'position': [agent_position],
    'left': [left],
    'right': [right],
}
I’ve broken down my issue into two parts:
The checker makes design harder
In previous versions of rllib, the trainer was smart enough to see that the observations are in the observation space, even though the types don’t match up exactly.
When I attempt to run this with rllib 1.12.1, I get:
ValueError: The observation collected from env.reset was not contained within your env's observation space. Its possible that there was a type mismatch (for example observations of np.float32 and a space of np.float64 observations), or that one of the sub-observations was out of bounds
reset_obs: {'agent0': {'position': [0], 'left': [False], 'right': [False]}}
env.observation_space_sample(): {'agent1': OrderedDict([('left', array([1], dtype=int8)), ('position', array([4])), ('right', array([0], dtype=int8))]), 'agent2': OrderedDict([('left', array([0], dtype=int8)), ('position', array([0])), ('right', array([1], dtype=int8))]), 'agent3': OrderedDict([('left', array([0], dtype=int8)), ('position', array([2])), ('right', array([0], dtype=int8))]), 'agent4': OrderedDict([('left', array([0], dtype=int8)), ('position', array([1])), ('right', array([0], dtype=int8))]), 'agent0': OrderedDict([('left', array([1], dtype=int8)), ('position', array([1])), ('right', array([0], dtype=int8))])}
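To see exactly which key the checker is unhappy about, I found it useful to redo the containment test by hand (a rough sketch; I’m assuming the per-agent Dict spaces are reachable as env.observation_space[agent_id], which may differ in your setup):
import numpy as np

obs = env.reset()
for agent_id, agent_obs in obs.items():
    space = env.observation_space[agent_id]  # assumption: per-agent gym.spaces.Dict keyed by agent id
    for key, sub_space in space.spaces.items():
        value = agent_obs[key]
        print(agent_id, key,
              'contained:', sub_space.contains(value),
              'obs dtype:', np.asarray(value).dtype,
              'space dtype:', sub_space.dtype)
With the list-of-bools output above, the observation dtype would come through as bool while MultiBinary(1) reports int8, which I suspect is what trips the check.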
So I tried making this match up by changing my code. I got to this point:
from collections import OrderedDict  # module-level imports, shown here for completeness
import numpy as np

agent_position = self.agents[agent_id].position
if agent_position == 0 or self.corridor[agent_position - 1] is None:
    left = False
else:
    left = True
if agent_position == self.end - 1 or self.corridor[agent_position + 1] is None:
    right = False
else:
    right = True
out = OrderedDict()
out['left'] = np.array([int(left)], dtype=np.int8)
out['position'] = np.array([agent_position])
out['right'] = np.array([int(right)], dtype=np.int8)
return out
and I still get this error:
ValueError: The observation collected from env.reset was not contained within your env's observation space. Its possible that there was a type mismatch (for example observations of np.float32 and a space of np.float64 observations), or that one of the sub-observations was out of bounds
reset_obs: {'agent0': OrderedDict([('left', array([1], dtype=int8)), ('position', array([7])), ('right', array([0], dtype=int8))])}
env.observation_space_sample(): {'agent4': OrderedDict([('left', array([0], dtype=int8)), ('position', array([5])), ('right', array([1], dtype=int8))]), 'agent2': OrderedDict([('left', array([1], dtype=int8)), ('position', array([2])), ('right', array([1], dtype=int8))]), 'agent0': OrderedDict([('left', array([0], dtype=int8)), ('position', array([9])), ('right', array([0], dtype=int8))]), 'agent3': OrderedDict([('left', array([1], dtype=int8)), ('position', array([5])), ('right', array([1], dtype=int8))]), 'agent1': OrderedDict([('left', array([0], dtype=int8)), ('position', array([8])), ('right', array([1], dtype=int8))])}
At this point, I’m not really sure how to modify the observation to match the type any more closely. And besides that, it’s a bit ridiculous to have to be this precise about the observation output instead of relying on a smart trainer that can match the types the way it did before.
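For what it’s worth, one option that would at least take the dtype guesswork out is to pull the dtypes straight from the space itself. This is only a sketch: build_obs is a hypothetical helper, and space stands for the per-agent Dict space, however your env stores it.
from collections import OrderedDict
import numpy as np

def build_obs(space, agent_position, left, right):
    # Hypothetical helper: `space` is the per-agent gym.spaces.Dict from above.
    # Taking dtypes from the space means the observation can never disagree with it.
    out = OrderedDict()
    out['left'] = np.array([int(left)], dtype=space['left'].dtype)
    out['position'] = np.array([agent_position], dtype=space['position'].dtype)
    out['right'] = np.array([int(right)], dtype=space['right'].dtype)
    return out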
The checker is inconsistent with the trainer
I turned off the environment checker and ran my env with the super-detailed observation output above. I was able to train and achieve the same results that I reached with previous versions of rllib. It seems strange to me that the environment checker would fail but the trainer would still run.
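(For reference, this is roughly how I turned the checker off. If I remember correctly the config key is disable_env_checking, though the exact name may differ between RLlib versions, and "PPO" below is just a placeholder for whichever trainer is used.)
from ray import tune

config = {
    "env": MultiCorridor,            # my env class, defined elsewhere
    "disable_env_checking": True,    # assumption: this key skips the new env pre-check
    # ... rest of the multi-agent config unchanged
}
tune.run("PPO", config=config)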
Furthermore, I was able to simplify the observations to this:
return {
    'position': np.array([agent_position]),
    'left': np.array([int(left)]),
    'right': np.array([int(right)]),
}
and the training runs. I could not simplify it further (i.e. back to what I had before).
Suggestions
- The environment checker should not be stricter than the trainer.
- The trainers shouldn’t expect to receive exactly the same type as specified in the space. They should be smarter, at least as smart as they used to be.