Multi-agent Training with two Policies throwing model interfacing error

Hi,

I am working on a multi-agent environment [PettingZoo Atari] where I want to train one agent with one policy [DQN] and the other agent with another policy [X-DQN]. I am using Ray 1.3.0. I came across the multi_agent_cartpole.py example, which trains multiple agents on separate policies but with the same policy class. The multi_agent_two_trainers.py example trains one agent with DQN and the other with PPO, which is closer to what I want. However, I have run into some issues. My two policies use custom models, and the two models differ in their outputs [one is multi-headed, the other is not, so if DQN outputs 6 values, the other outputs some multiple n of that]. RLlib seems to validate each model class against the wrong model interface, i.e. it validates the X-DQN policy's model against DQN's model interface (and throws an error from distributional_q_tf_model.py, which X-DQN is not even using).

This line seems to be where the problem originates and where the policies are presumably being validated (I am referencing the example, but my code is similar):

config={
    "multiagent": {
        "policies": policies,

Here is my code snippet:

    # register the custom env with rllib
    register_env(env_name, lambda config: ParallelPettingZooEnv(env_creator(config)))
    print("Registered Environment {}".format(env_name))

    # register the custom model with rllib
    ModelCatalog.register_custom_model("Atari_XDQN_Model", Atari_XDQN_Model)
    ModelCatalog.register_custom_model("AtariModel", AtariModel)
        
    # get the obs/action spaces needed for the policy specs
    dummy_env = ParallelPettingZooEnv(env_creator({}))
    obs_space = dummy_env.observation_space
    act_space = dummy_env.action_space
    num_agents = len(dummy_env.agents)

    policies = {
        "x_dqn_policy": (X_DQNPolicy, obs_space, act_space, {}),
        "dqn_policy": (DQNTFPolicy, obs_space, act_space, {}),
    }
    
    def policy_mapping_fn(agent_id, **kwargs):
        agent_dict = {'first_0': 0, 'second_0': 1, 'third_0': 2, 'fourth_0': 3}

        if agent_dict[agent_id] % 2 == 0:
            return "x_dqn_policy"
        else:
            return "dqn_policy"

    x_dqn_trainer = X_DQNTrainer(
        env=env_name,
        config={
            "log_level": "DEBUG",
            "multiagent": {
                "policies": policies,
                "policy_mapping_fn": policy_mapping_fn,
                "policies_to_train": ["x_dqn_policy"],
            },
            "model": {
                "custom_model": "Atari_XDQN_Model"
            },
        })

    dqn_trainer = DQNTrainer(
        env=env_name,
        config={
            "log_level": "DEBUG",
            "multiagent": {
                "policies": policies,
                "policy_mapping_fn": policy_mapping_fn,
                "policies_to_train": ["dqn_policy"],
            },
            "model": {
                "custom_model": "AtariModel"
            },
        })

The relevant error from the console is where RLlib tries to wrap the X-DQN model with the DistributionalQTFModel model interface (relevant catalog.py code here), which is the model interface for the DQN policy agent, even though my custom policy uses my own custom model interface.

2021-10-05 03:44:12,594 INFO catalog.py:387 -- Wrapping <class 'models.xdqn_model_manual.Atari_X_DQN_Model'> as <class 'ray.rllib.agents.dqn.distributional_q_tf_model.DistributionalQTFModel'>

and then throws an error about missing positional arguments, which are part of X-DQN and are not supposed to be part of DQN.

File "/home/user/miniconda3/envs/env_xdqn/lib/python3.8/site-packages/ray/rllib/agents/dqn/distributional_q_tf_model.py", line 64, in __init__
    super(DistributionalQTFModel, self).__init__(
TypeError: __init__() missing 2 required positional arguments: 'number_of_gammas' and 'number_of_agents'

What am I missing here?

P.S. I have read this issue about sequential training (and I am fine with a sequential flow) and don't need a single trainer as in two_trainer_workflow.py, which seems more complicated to me.

Thank you and would appreciate any help.

@sven1977 @ericl Am I correct in assuming that two policies with different custom models and model outputs are not supported by this two_trainer method?

@rfali

Based on the stack trace you posted, I don't think the issue is with the two-policy approach.

It seems as though you have added two new arguments to the model, and they are either not being passed in or not being passed in the correct order. You should probably not add them as positional arguments, because then you would have to change code inside RLlib, possibly in several places. I would recommend adding them as kwargs instead.

The kwargs for the model get filled in from the values in the dictionary at config["model"]["custom_model_config"].
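For example, the usual pattern looks something like this (a minimal sketch, not your actual code; the class name, the number_of_gammas / number_of_agents arguments, and the values in custom_model_config are just stand-ins for whatever your X-DQN model needs):

from ray.rllib.models.tf.tf_modelv2 import TFModelV2


class Atari_XDQN_Model(TFModelV2):
    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name, number_of_gammas=1, number_of_agents=1, **kwargs):
        # The extra arguments arrive as kwargs, filled in from
        # config["model"]["custom_model_config"], so RLlib's internal
        # model-building code never has to know about them.
        super().__init__(obs_space, action_space, num_outputs, model_config, name)
        self.number_of_gammas = number_of_gammas
        self.number_of_agents = number_of_agents
        # ... build the multi-headed network here ...

and then in the trainer config:

"model": {
    "custom_model": "Atari_XDQN_Model",
    "custom_model_config": {
        "number_of_gammas": 4,   # placeholder value
        "number_of_agents": 2,   # placeholder value
    },
},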

Do you have a full reproduction script you could share?

As an aside, if you are using Tune to run these, I do not think you need to explicitly create the trainers. You should be able to do tune.run("DQN", config=config, ...).

It will use the policy classes in the following lines of the config to determine which policy class to use for each policy:

    policies = {
        "x_dqn_policy": (X_DQNPolicy, obs_space, act_space, {}),
        "dqn_policy": (DQNTFPolicy, obs_space, act_space, {}),
    }
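For instance, something along these lines should work (a rough sketch only; the stop criterion is a placeholder and I have not run this against your environment):

from ray import tune

tune.run(
    "DQN",  # which trainer to run; the per-policy classes come from `policies` above
    stop={"training_iteration": 100},  # placeholder stopping criterion
    config={
        "env": env_name,
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": policy_mapping_fn,
            # with a single trainer you would train both policies here
            "policies_to_train": ["x_dqn_policy", "dqn_policy"],
        },
    },
)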

The reason you would use the version in the example is if you wanted to alternate training, running one type of algorithm for a little while and then the other. Based on what you said, I do not think that is the type of setup you are going for here.
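For completeness, that alternating flow looks roughly like this (a sketch in the spirit of multi_agent_two_trainers.py, using the two trainer objects from your snippet; the number of rounds and the weight syncing are illustrative):

for i in range(10):  # placeholder number of alternating rounds
    print("== Iteration", i, "==")

    # Improve the X-DQN policy for one training iteration.
    print(x_dqn_trainer.train())

    # Improve the DQN policy for one training iteration.
    print(dqn_trainer.train())

    # Share updated weights so each trainer sees the latest version of
    # the policy it is not training itself.
    x_dqn_trainer.set_weights(dqn_trainer.get_weights(["dqn_policy"]))
    dqn_trainer.set_weights(x_dqn_trainer.get_weights(["x_dqn_policy"]))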