Hi,
I am having trouble using the `policies_to_train` setting in the `multiagent` config with MADDPG. When I try to train only one of the policies with this setting, I get an exception saying that 'obs_1' cannot be found.
Here is a minimal example to reproduce this, based on the two_step_game example code for MADDPG:
```python
from gym.spaces import Discrete

import ray
from ray import tune
from ray.rllib.examples.env.two_step_game import TwoStepGame

if __name__ == "__main__":
    config = {
        "env_config": {
            "actions_are_logits": True,
        },
        "multiagent": {
            "policies": {
                "pol1": (None, Discrete(6), TwoStepGame.action_space, {
                    "agent_id": 0,
                    # This fixes the problem:
                    # "use_local_critic": True,
                }),
                "pol2": (None, Discrete(6), TwoStepGame.action_space, {
                    "agent_id": 1,
                }),
            },
            "policy_mapping_fn": lambda x: "pol1" if x == 0 else "pol2",
            "policies_to_train": ["pol1"],  # This causes the exception.
        },
        "framework": "tf",
        # Run on CPU only.
        "num_gpus": 0,
    }

    ray.init(num_cpus=2)

    stop = {
        "episode_reward_mean": 7,
        "timesteps_total": 50000,
        "training_iteration": 200,
    }

    config = dict(config, **{
        "env": TwoStepGame,
    })

    results = tune.run(
        "contrib/MADDPG",  # or MADDPGTrainer
        stop=stop,
        config=config,
        verbose=1,
    )

    ray.shutdown()
```
I ran this with Ray 2.0.0.dev0.
I think the problem might be that I am using `"use_local_critic": False` for both agents. When only one policy is being trained, the shared critic still expects the observations of the policy that is not being trained, but because of the `policies_to_train` setting those observations are not available.
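To illustrate what I suspect is happening, here is a conceptual sketch only. It is not RLlib's actual internals, and the key names `obs_0`/`obs_1` are assumptions made for the illustration: a shared (centralized) critic needs every agent's observation, so if the frozen policy's samples are never collected, assembling the critic input fails.

```python
# Conceptual sketch only -- not RLlib internals; "obs_0"/"obs_1" are
# hypothetical key names used for this illustration.
import numpy as np


def centralized_critic_input(batch):
    # An MADDPG-style shared critic conditions on all agents' observations.
    return np.concatenate([batch["obs_0"], batch["obs_1"]], axis=-1)


# If "pol2" is excluded via policies_to_train, its observations are never
# collected, so building the shared critic input fails.
batch = {"obs_0": np.zeros((32, 6))}
try:
    centralized_critic_input(batch)
except KeyError as err:
    print("missing key:", err)  # analogous to the 'obs_1' error I am seeing
```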
When I set `"use_local_critic": True` for the agent that is being trained, the exception does not occur. However, this does not solve my problem, because my actual use case is the following:
- Train two MADDPG agents normally with shared critics and self-play.
- Restore the checkpoint, but now train only one of the agents while freezing the other.
Is there any way to do this?
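For reference, this is roughly what I have in mind for the second stage. It is only a rough sketch: the checkpoint path is a placeholder, and I am not sure overriding `policies_to_train` like this is the intended way to freeze a policy.

```python
# Rough sketch of stage 2: restore the self-play checkpoint and keep training
# only "pol1". The checkpoint path is a placeholder; `config` and `stop` are
# the same as in the repro script above.
results = tune.run(
    "contrib/MADDPG",
    restore="/path/to/stage1/checkpoint",
    stop=stop,
    config=dict(config, **{
        "multiagent": dict(config["multiagent"], **{
            # Freeze "pol2" by excluding it from training.
            "policies_to_train": ["pol1"],
        }),
    }),
    verbose=1,
)
```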
In any case, the exception described above seems like a bug.
This issue seems similar, but it does not solve my problem.