Env precheck inconsistent with Trainer

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Recently I’ve been upgrading my RLlib dependency, and I ran into a number of issues starting with version 1.10. In that version there was a bit of an overhaul of the MultiAgentEnv class. I’ve been making code changes on my end to match those changes, and I’m running into some issues. I’m trying to connect with the newest version of Ray, which is 1.12.1 as of this post.

I have a simple game called MultiCorridor that I use for testing. This is a multiagent game, and each agent has the following observation space:

observation_space={
    'position': Box(0, self.end-1, (1,), int),
    'left': MultiBinary(1),
    'right': MultiBinary(1)
}
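For reference, here is the same per-agent space written out as a standalone gym.spaces.Dict (a sketch, with self.end assumed to be 10 just for illustration). Sampling it shows the containers and dtypes the checker compares against: gym’s Dict returns an OrderedDict, and MultiBinary samples are int8 arrays, which matches the env.observation_space_sample() output further down.

import gym
from gym.spaces import Box, MultiBinary

end = 10  # stand-in for self.end, purely for illustration

per_agent_space = gym.spaces.Dict({
    'position': Box(0, end - 1, (1,), int),
    'left': MultiBinary(1),
    'right': MultiBinary(1),
})

print(per_agent_space.sample())
# e.g. OrderedDict([('left', array([1], dtype=int8)),
#                   ('position', array([4])),
#                   ('right', array([0], dtype=int8))])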

Here is a snippet of the function that returns the observation, which is called from my step and reset functions:

agent_position = self.agents[agent_id].position
# 'left'/'right' flag whether the neighboring corridor cell holds another agent.
if agent_position == 0 or self.corridor[agent_position-1] is None:
    left = False
else:
    left = True
if agent_position == self.end-1 or self.corridor[agent_position+1] is None:
    right = False
else:
    right = True
return {
    'position': [agent_position],
    'left': [left],
    'right': [right],
}

I’ve broken down my issue into two parts:

The checker makes design harder

In previous versions of RLlib, the trainer was smart enough to see that the observations were in the observation space, even if the types didn’t match up exactly.

When I attempt to run this with rllib 1.12.1, I get:

ValueError: The observation collected from env.reset was not contained within your env's observation space. Its possible that there was a type mismatch (for example observations of np.float32 and a space of np.float64 observations), or that one of the sub-observations was out of bounds

 reset_obs: {'agent0': {'position': [0], 'left': [False], 'right': [False]}}

 env.observation_space_sample(): {'agent1': OrderedDict([('left', array([1], dtype=int8)), ('position', array([4])), ('right', array([0], dtype=int8))]), 'agent2': OrderedDict([('left', array([0], dtype=int8)), ('position', array([0])), ('right', array([1], dtype=int8))]), 'agent3': OrderedDict([('left', array([0], dtype=int8)), ('position', array([2])), ('right', array([0], dtype=int8))]), 'agent4': OrderedDict([('left', array([0], dtype=int8)), ('position', array([1])), ('right', array([0], dtype=int8))]), 'agent0': OrderedDict([('left', array([1], dtype=int8)), ('position', array([1])), ('right', array([0], dtype=int8))])}
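One way to see exactly which containers and dtypes the checker is comparing against is to inspect the space directly. This is a standalone sketch with self.end assumed to be 10; whether contains() accepts plain lists and bools varies across gym versions, so the two checks at the end are illustrative rather than definitive:

import numpy as np
from gym.spaces import Box, Dict, MultiBinary

per_agent_space = Dict({
    'position': Box(0, 9, (1,), int),   # self.end assumed to be 10 here
    'left': MultiBinary(1),
    'right': MultiBinary(1),
})

# Print each sub-space with its dtype: MultiBinary(1) is int8,
# Box(0, end-1, (1,), int) is typically int64.
for name, sub in per_agent_space.spaces.items():
    print(name, sub, sub.dtype)

raw_obs = {'position': [0], 'left': [False], 'right': [False]}   # what reset returned
typed_obs = {
    'position': np.array([0]),             # int64, matching Box(..., int)
    'left': np.array([0], dtype=np.int8),  # MultiBinary samples are int8
    'right': np.array([0], dtype=np.int8),
}
print(per_agent_space.contains(raw_obs))    # version dependent: may be rejected
print(per_agent_space.contains(typed_obs))  # True: mirrors what the space samples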

So I tried making this match up by changing my code. I got to this point:

# requires: import numpy as np; from collections import OrderedDict
agent_position = self.agents[agent_id].position
if agent_position == 0 or self.corridor[agent_position-1] is None:
    left = False
else:
    left = True
if agent_position == self.end-1 or self.corridor[agent_position+1] is None:
    right = False
else:
    right = True
out = OrderedDict()
out['left'] = np.array([int(left)], dtype=np.int8)
out['position'] = np.array([agent_position])
out['right'] = np.array([int(right)], dtype=np.int8)
return out

and I still get this error:

ValueError: The observation collected from env.reset was not contained within your env's observation space. Its possible that there was a type mismatch (for example observations of np.float32 and a space of np.float64 observations), or that one of the sub-observations was out of bounds

 reset_obs: {'agent0': OrderedDict([('left', array([1], dtype=int8)), ('position', array([7])), ('right', array([0], dtype=int8))])}

 env.observation_space_sample(): {'agent4': OrderedDict([('left', array([0], dtype=int8)), ('position', array([5])), ('right', array([1], dtype=int8))]), 'agent2': OrderedDict([('left', array([1], dtype=int8)), ('position', array([2])), ('right', array([1], dtype=int8))]), 'agent0': OrderedDict([('left', array([0], dtype=int8)), ('position', array([9])), ('right', array([0], dtype=int8))]), 'agent3': OrderedDict([('left', array([1], dtype=int8)), ('position', array([5])), ('right', array([1], dtype=int8))]), 'agent1': OrderedDict([('left', array([0], dtype=int8)), ('position', array([8])), ('right', array([1], dtype=int8))])}

At this point, I’m not really sure how to modify the observation to match the expected types any more closely. And besides that, it’s a bit ridiculous to have to be this explicit in the observation output instead of relying on a smart trainer that can match the types the way it did before.
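For what it’s worth, one way to avoid hand-tuning dtypes would be to read them off the space itself. This is only a sketch; how the per-agent space is obtained (the space argument below) is an assumption about my framework, not something shown above:

from collections import OrderedDict
import numpy as np

def build_obs(space, agent_position, left, right):
    # `space` is this agent's gym.spaces.Dict; casting to each sub-space's
    # dtype keeps the observation aligned with whatever the space samples.
    out = OrderedDict()
    out['left'] = np.array([int(left)], dtype=space.spaces['left'].dtype)
    out['position'] = np.array([agent_position], dtype=space.spaces['position'].dtype)
    out['right'] = np.array([int(right)], dtype=space.spaces['right'].dtype)
    return out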

The checker is inconsistent with the trainer

I turned off the environment checker and ran my env with the super-detailed observation output above. I was able to train and achieve the same results that I reached with previous versions of RLlib. It seems strange to me that the environment checker would fail but the trainer would still run.

Furthermore, I was able to simplify the observations to this:

return {
    'position': np.array([agent_position]),
    'left': np.array([int(left)]),
    'right': np.array([int(right)]),
}

and the training runs. I could not simplify it further (i.e., back to what I had before).

Suggestions

  1. The environment checker should not be stricter than the trainer.
  2. The trainers shouldn’t expect to receive exactly the same type as specified in the space. They should be smarter, at least as smart as they used to be.

Great, I’m asking just out of curiosity: your env is not a gym env and you haven’t subclassed Base.Env, and your reset and step methods don’t have a return. I thought we should at least subclass Base.Env if we don’t use gym. Is that the env you are using?

Hi @hossein836, thanks for your quick reply. I use my own simulation framework, which does not inherit anything from gym or RLlib. This gives me flexibility in my design. In order to connect to RLlib, I wrap my sim in a custom Turn Based Manager, and then I wrap that object in my Multi Agent Wrapper. It’s a bit of a layered onion, but this design lets me connect to a few different learning libraries as needed.

My environment is a MultiAgentEnv when it gets plugged into tune.

Now I see, good experiment overall. Good luck :+1:

Upon more experimentation, I discovered that the issue is with my TurnBasedManager. My game is designed to return only a single agent’s (obs, reward, done, info) in each step since I want my agents to take turns and since RLlib only produces actions for agents that report an observation.

Apparently, the env checker doesn’t like this and complains that it only saw the observations from a single agent and not the observations from all available agents. Is this the way the env checker is supposed to work?

So what should you pass when you don’t want to pass anything to an agent? None? I don’t remember exactly, but I think it was okay not to pass anything for an agent.

@hossein836 RLlib will generate actions only for those agents that reported an observation from step, so it’s okay to report output from only a subset of agents in each step (e.g. turn-based games). This is why it is strange to me that the env checker seems to require output from all agents: that requirement contradicts this feature.
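To make the pattern concrete, here is a minimal sketch of a turn-based env in the sense I mean (not my actual TurnBasedManager; the class, spaces, and agent names are made up): each step returns obs/reward/done/info dicts containing only the acting agent, plus the "__all__" done key.

import gym
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class TurnBasedSketch(MultiAgentEnv):
    def __init__(self, config=None):
        super().__init__()
        self._agent_ids = {'agent0', 'agent1'}
        self.observation_space = gym.spaces.Box(0.0, 1.0, (1,), np.float32)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self):
        self.turn = 0
        # Only the agent whose turn it is reports an observation, so RLlib
        # computes an action only for that agent.
        return {'agent0': np.zeros(1, dtype=np.float32)}

    def step(self, action_dict):
        self.turn += 1
        acting = f"agent{self.turn % 2}"
        done = self.turn >= 10
        obs = {acting: np.zeros(1, dtype=np.float32)}
        rewards = {acting: 0.0}
        dones = {acting: done, '__all__': done}  # '__all__' ends the episode
        infos = {}
        return obs, rewards, dones, infos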


Hey @hossein836 ,

For now, you can turn off env checking with disable_env_checking=True.
I’ll find out if this is a design issue or a bug.
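For example, a sketch of where that flag would go, assuming it sits at the top level of the trainer config you pass to tune.run (as in Ray ~1.12/1.13); the env name here is just a placeholder:

from ray import tune

tune.run(
    "PPO",
    config={
        "env": "MultiCorridor",        # placeholder: whatever env you registered
        "disable_env_checking": True,  # skip the env pre-checker for now
    },
)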

Cheers

Author of the env pre-checker here:

Yeah, this is an unnecessary detail that we check for. We should be able to check that spaces are correct even if only specific agents return observations in your environment. Do you mind opening a ticket for this on GitHub?

I’m not sure if we can make the pre-checker as lax as the trainer in general. Right now we use gym directly to do element and space checking, although you can override this checking in your environment by writing your own checking functions:

def observation_space_contains(self, x: MultiAgentDict) -> bool:
    ...

def action_space_contains(self, x: MultiAgentDict) -> bool:
    ...

def action_space_sample(self, agent_ids: list = None) -> MultiAgentDict:
    ...

def observation_space_sample(self, agent_ids: list = None) -> MultiEnvDict:
    ...
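As an illustration of what such an override could do (a sketch, not what RLlib does internally): a more lenient observation_space_contains could cast each sub-observation to its sub-space’s dtype before delegating to gym’s own contains().

import numpy as np
from gym.spaces import Box, Dict, MultiBinary

def lenient_contains(space, obs):
    # Cast each entry to the dtype recorded on the corresponding sub-space,
    # then let gym do the actual shape/bounds check.
    coerced = {
        key: np.asarray(val, dtype=space.spaces[key].dtype)
        for key, val in obs.items()
    }
    return space.contains(coerced)

# Example with the per-agent MultiCorridor space from this thread (end assumed 10):
per_agent_space = Dict({
    'position': Box(0, 9, (1,), int),
    'left': MultiBinary(1),
    'right': MultiBinary(1),
})
print(lenient_contains(per_agent_space, {'position': [0], 'left': [False], 'right': [False]}))  # True

In a real observation_space_contains() override, the same coercion would be applied per agent id in the incoming MultiAgentDict.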

Thanks @avnishn, I will do as you suggest.

I’ve set up a PR that deals with this issue by warning instead of raising an error in your case, @rusu24edward. I’ve used your library to test this.
You might want to add a super().__init__() call to your MultiAgentWrapper, since that’s also something our env checker looks out for!
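(A purely hypothetical constructor, just to show where that call would go:)

from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MultiAgentWrapper(MultiAgentEnv):
    def __init__(self, sim):
        super().__init__()  # lets MultiAgentEnv set up its own bookkeeping
        self.sim = sim      # the wrapped simulation/manager (attribute name is made up)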

@avnishn is the expert here and I’ll ask him for review.
