Hi,
I am working on a multi-agent environment [PettingZoo Atari] where I want to train one agent with one policy [DQN] and the other agent with a different policy [X-DQN]. I am using Ray 1.3.0. I came across the multi_agent_cartpole.py example, which trains multiple agents on separate policies but with the same policy class. The multi_agent_two_trainers.py example trains one agent with DQN and the other with PPO, which is closer to what I want. However, I have run into some issues. Both of my policies use custom models, and the two models differ in their outputs [one is multi-headed, the other is not, so if DQN outputs 6 values, the other outputs some multiple n of that]. RLlib seems to validate each model class against the wrong model interface, i.e. it validates the X-DQN policy's model against DQN's model interface (and throws an error from distributional_q_tf_model.py, which X-DQN is not even using).
This line in the example seems to be where the policies get validated, and it appears to be the source of the problem (I am referencing the example, but my code is similar):

config={
    "multiagent": {
        "policies": policies,
        ...
Here is my code snippet:

from ray.tune.registry import register_env
from ray.rllib.models import ModelCatalog
from ray.rllib.agents.dqn import DQNTrainer, DQNTFPolicy
# env_creator, X_DQNTrainer, X_DQNPolicy, Atari_XDQN_Model and AtariModel are
# defined in my own modules; ParallelPettingZooEnv is the PettingZoo parallel-env wrapper.

# register the custom env with rllib
register_env(env_name, lambda config: ParallelPettingZooEnv(env_creator(config)))
print("Registered Environment {}".format(env_name))

# register the custom models with rllib
ModelCatalog.register_custom_model("Atari_XDQN_Model", Atari_XDQN_Model)
ModelCatalog.register_custom_model("AtariModel", AtariModel)

# build a dummy env to read out the obs/action spaces for the policy specs
dummy_env = ParallelPettingZooEnv(env_creator({}))
obs_space = dummy_env.observation_space
act_space = dummy_env.action_space
num_agents = len(dummy_env.agents)

policies = {
    "x_dqn_policy": (X_DQNPolicy, obs_space, act_space, {}),
    "dqn_policy": (DQNTFPolicy, obs_space, act_space, {}),
}

def policy_mapping_fn(agent_id, **kwargs):
    # even-indexed agents use X-DQN, odd-indexed agents use DQN
    agent_dict = {'first_0': 0, 'second_0': 1, 'third_0': 2, 'fourth_0': 3}
    if agent_dict[agent_id] % 2 == 0:
        return "x_dqn_policy"
    else:
        return "dqn_policy"

x_dqn_trainer = X_DQNTrainer(
    env=env_name,
    config={
        "log_level": "DEBUG",
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": policy_mapping_fn,
            "policies_to_train": ["x_dqn_policy"],
        },
        "model": {
            "custom_model": "Atari_XDQN_Model"
        },
    })

dqn_trainer = DQNTrainer(
    env=env_name,
    config={
        "log_level": "DEBUG",
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": policy_mapping_fn,
            "policies_to_train": ["dqn_policy"],
        },
        "model": {
            "custom_model": "AtariModel"
        },
    })
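After constructing the two trainers, I intend to alternate them the same way multi_agent_two_trainers.py does. Here is a rough sketch of my planned loop (the iteration count is just an example; I never actually get this far, because the error below is raised while the trainers are being built):

# planned training loop, following the multi_agent_two_trainers.py pattern
for i in range(100):  # iteration count is just an example
    print("== Iteration", i, "==")

    # improve the X-DQN policy
    print("-- X-DQN --")
    print(x_dqn_trainer.train())

    # improve the DQN policy
    print("-- DQN --")
    print(dqn_trainer.train())

    # share the latest weights of each trained policy with the other trainer
    dqn_trainer.set_weights(x_dqn_trainer.get_weights(["x_dqn_policy"]))
    x_dqn_trainer.set_weights(dqn_trainer.get_weights(["dqn_policy"]))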
The relevant console output shows RLlib wrapping my X-DQN model with the DistributionalQTFModel model interface (relevant catalog.py code here), which is the model interface for the DQN policy, even though my custom policy uses my own custom model interface.
2021-10-05 03:44:12,594 INFO catalog.py:387 -- Wrapping <class 'models.xdqn_model_manual.Atari_X_DQN_Model'> as <class 'ray.rllib.agents.dqn.distributional_q_tf_model.DistributionalQTFModel'>
and then throws an error about missing positional arguments that belong to X-DQN and are not supposed to be part of DQN:
File "/home/user/miniconda3/envs/env_xdqn/lib/python3.8/site-packages/ray/rllib/agents/dqn/distributional_q_tf_model.py", line 64, in __init__
super(DistributionalQTFModel, self).__init__(
TypeError: __init__() missing 2 required positional arguments: 'number_of_gammas' and 'number_of_agents'
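For context, my X-DQN model's constructor requires two extra positional arguments, roughly like this (a simplified sketch, not my real class; only the two argument names come from the traceback above, everything else is trimmed down, and I am showing TFModelV2 as the base class instead of my actual custom model interface):

from ray.rllib.models.tf.tf_modelv2 import TFModelV2

class Atari_XDQN_Model(TFModelV2):
    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name, number_of_gammas, number_of_agents):
        super().__init__(obs_space, action_space, num_outputs, model_config, name)
        # these extra, required arguments drive the multi-headed output;
        # DistributionalQTFModel never passes them, so the wrapping fails
        self.number_of_gammas = number_of_gammas
        self.number_of_agents = number_of_agents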
What am I missing here?
P.S. I have read this issue about sequential training (and I am fine with a sequential flow), and I don't need a single combined trainer as in two_trainer_workflow.py, which seems more complicated than what I need.
Thank you, and I would appreciate any help.