Unpack_obs doesn't know to expect 1-hot

Hi, I’m not sure whether this is a bug or I’m missing something. I have an environment with the following obs space:

     spaces.Box(low=0, high=1, shape=(1+n_distractors, 3, 32, 32)),

where n_distractors=1, n_vocab=10, max_len=20. When I try to run my env with the PPO trainer, I get this error:

ValueError: Expected flattened obs shape of [..., 6347], got torch.Size([32, 6165])

I did some digging and found that the error occurs because the unpack_obs function doesn’t know to expect one-hot vectors when unpacking the observation. It expects a size of 6165, since 2*3*32*32 + 20 + 1 = 6165. However, the generated dummy batch has size 6347 = 2*3*32*32 + 200 + 3, which is a one-hot version of the observation.

How do I choose the expected input format in the unpack function? I tried both preprocessor_pref options, but neither solved this problem.
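For reference, the size mismatch in the error message can be reproduced with quick arithmetic. This assumes the 20 tokens are each Discrete(10) (so they one-hot to 200) and the remaining entry is a Discrete(3) (one-hots to 3); those space details are my inference from the numbers, not confirmed:

```python
# Flattened (no one-hot): Box(2, 3, 32, 32) + 20 discrete tokens + 1 discrete entry.
flat_size = 2 * 3 * 32 * 32 + 20 + 1          # what unpack_obs expects

# One-hot: 20 tokens over vocab 10 -> 200, plus an (assumed) Discrete(3) -> 3.
one_hot_size = 2 * 3 * 32 * 32 + 20 * 10 + 3  # what the dummy batch contains

print(flat_size, one_hot_size)  # 6165 6347
```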


Hey @Aceticia , thanks for this question. This is probably caused by a bug in our Preprocessors, which don’t handle the one-hot case properly in a Tuple space.
We are currently in the process of getting rid of preprocessors entirely (they just seem to cause confusion, as they change the observations without the user being in control). On the latest master you can set _disable_preprocessor_api=True. I just tried it on your example (confirmed the error first); this setting made the error go away and I was able to run a training iteration with PPO.
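A minimal config sketch showing where the flag goes (the env name is hypothetical; the other keys are just illustrative):

```python
# Sketch of a PPO trainer config dict with preprocessors disabled,
# so the model receives the raw (Tuple) observations.
config = {
    "env": "my_tuple_obs_env",          # hypothetical env name
    "framework": "torch",
    "_disable_preprocessor_api": True,  # bypass RLlib's Preprocessors
}

print(config["_disable_preprocessor_api"])
```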

If you use the above flag and don’t specify a custom model, RLlib will use the “ComplexInputNetwork” by default, since you have a Tuple observation space. This model will properly one-hot your Discrete and MultiDiscrete inputs before concatenating everything and passing it through Dense layers, e.g.:

    fcnet_hiddens: [128, 128]  # <- hidden layer number and sizes

Use your own custom model via:

    custom_model: [SomeModelV2 sub-class]
    custom_model_config: {
        # some c'tor args for your custom model class
    }

I’ll still try to fix the Preprocessor bug now. Thanks for raising this! :slight_smile:

Here is the fix PR. I’m sorry, we seem to have commented out an important test case some time ago because it kept timing out; that is why we didn’t catch this regression sooner.
I confirmed that you can run your above example already with the current master, though, using _disable_preprocessor_api=True.

I’m glad my question helped you find the bug! One last question: I’m running a multi-agent game with this env, and I’d like both agents to use the same policy class and custom model, but not share weights. Is the following setup correct? I’m also wondering whether I can specify models in the policies item under multiagent, since in the future I might want slightly different models for the two agents.

ModelCatalog.register_custom_model("comm", xxx)

    config = {
        "multiagent": {
            "policies": {"agent1": PolicySpec(), "agent2": PolicySpec()},
            "policy_mapping_fn": (lambda agent_id, **kwargs: agent_id),
        },
        "model": {"custom_model": "comm", "custom_model_config": {xxx}},
    }
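For what it’s worth, a mapping fn like this can be sanity-checked outside of RLlib as a plain function (standalone sketch, assuming agent IDs and policy IDs share the same names):

```python
# Each agent ID maps directly to the policy of the same name,
# mirroring the lambda used in the multiagent config above.
def policy_mapping_fn(agent_id, **kwargs):
    return agent_id

print(policy_mapping_fn("agent1"))  # agent1
print(policy_mapping_fn("agent2"))  # agent2
```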

Hey @Aceticia , yeah, this looks like it would do the job. Two different policies (same class and config). Mapping fn looks good.
I’m assuming your agents have the same name as your policies (“agent1” and “agent2”), which is totally fine! :slight_smile:

The model config will be applied to both policies. You can give each policy its own config overrides to make them slightly different, if you want that. E.g.:

    "agent1": PolicySpec(config={"lr": 0.005}),
    "agent2": PolicySpec(config={"lr": 0.000001}),
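Conceptually, per-policy overrides are merged on top of the shared trainer config, with the override winning on key conflicts. A plain-dict illustration (this mimics the merge semantics, it is not RLlib internals):

```python
# Shared trainer config plus per-policy overrides; keys in the
# override dict replace the shared values, everything else is kept.
base_config = {"lr": 0.0003, "gamma": 0.99}
agent1_override = {"lr": 0.005}
agent2_override = {"lr": 0.000001}

agent1_config = {**base_config, **agent1_override}
agent2_config = {**base_config, **agent2_override}

print(agent1_config)  # {'lr': 0.005, 'gamma': 0.99}
print(agent2_config)  # {'lr': 1e-06, 'gamma': 0.99}
```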