Multi-Agent Training with Different Algorithms

mgerstgrasser · October 11, 2022, 1:40pm

Maybe I can partly answer this! Short answer: No, you can’t pass two algorithms to Tune directly. If you have two agents and need to train them with different algorithms, then how best to do it depends on whether you need the two agents to learn from the same batch of experiences (my original question), or whether you’re happy for them to generate experiences separately.

Note that there are two different “two-trainer” examples:

https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_two_trainers.py

"""Example of using two different training methods at once in multi-agent.

Here we create a number of CartPole agents, some of which are trained with
DQN, and some of which are trained with PPO. We periodically sync weights
between the two algorithms (note that no such syncing is needed when using just
a single training method).

For a simpler example, see also: multiagent_cartpole.py
"""

import argparse
import gym
import os

import ray
from ray.rllib.algorithms.dqn import DQN, DQNTFPolicy, DQNTorchPolicy
from ray.rllib.algorithms.ppo import (
    PPO,
    PPOTF1Policy,
    PPOTF2Policy,

This file has been truncated.

and

https://github.com/ray-project/ray/blob/master/rllib/examples/two_trainer_workflow.py

"""Example of using a custom training workflow.

Here we create a number of CartPole agents, some of which are trained with
DQN, and some of which are trained with PPO. Both are executed concurrently
via a custom training workflow.
"""

import argparse
import os

import ray
from ray import air, tune
from ray.rllib.agents import with_common_config
from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.algorithms.dqn.dqn import DEFAULT_CONFIG as DQN_CONFIG
from ray.rllib.algorithms.dqn.dqn_tf_policy import DQNTFPolicy
from ray.rllib.algorithms.dqn.dqn_torch_policy import DQNTorchPolicy
from ray.rllib.algorithms.ppo.ppo import DEFAULT_CONFIG as PPO_CONFIG
from ray.rllib.algorithms.ppo.ppo_tf_policy import PPOTF1Policy
from ray.rllib.algorithms.ppo.ppo_torch_policy import PPOTorchPolicy

This file has been truncated.

The first one uses separate experiences; the second one uses one shared environment and shared experiences.

Separate experiences: This is easier. You train both algorithms completely separately. Each Algorithm holds two Policies, but only trains one of them. E.g. the DQN algorithm has a DQNPolicy and a PPOPolicy, but it uses the PPOPolicy “read-only” to generate experiences: it uses both policies to generate a sample batch, then trains only the DQNPolicy. Then you sync the DQNPolicy weights to the PPO algorithm’s DQNPolicy; PPO uses both of its policies to generate a sample batch, trains its PPOPolicy, and then you sync those weights back to the DQN algorithm’s PPOPolicy. Rinse and repeat. If you want to pass this into Tune, you could wrap the workflow from that example in a function trainable.
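To make the alternate-and-sync loop concrete, here is a minimal sketch that does not use RLlib at all. `ToyAlgorithm`, the policy IDs, and the scalar "weights" are all hypothetical stand-ins; only the method names `train`, `get_weights`, and `set_weights` are chosen to resemble what I believe the linked example calls, so treat this as an illustration of the control flow rather than the real API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAlgorithm:
    """Hypothetical stand-in for an Algorithm that holds two policies."""

    name: str
    trainable_policy: str  # the only policy this algorithm updates
    # policy_id -> a single scalar standing in for network weights
    weights: dict = field(
        default_factory=lambda: {"dqn_policy": 0.0, "ppo_policy": 0.0}
    )

    def train(self):
        # In the real workflow both policies act to generate a sample batch;
        # here we only model the outcome: one policy's weights change.
        self.weights[self.trainable_policy] += 1.0

    def get_weights(self, policy_ids):
        return {pid: self.weights[pid] for pid in policy_ids}

    def set_weights(self, new_weights):
        self.weights.update(new_weights)


def run(num_iters):
    dqn = ToyAlgorithm("dqn", trainable_policy="dqn_policy")
    ppo = ToyAlgorithm("ppo", trainable_policy="ppo_policy")
    for _ in range(num_iters):
        dqn.train()  # updates dqn_policy only
        # Push the fresh DQN weights into PPO's read-only copy.
        ppo.set_weights(dqn.get_weights(["dqn_policy"]))
        ppo.train()  # updates ppo_policy only
        # Push the fresh PPO weights into DQN's read-only copy.
        dqn.set_weights(ppo.get_weights(["ppo_policy"]))
    return dqn, ppo
```

After every iteration both algorithms hold identical copies of both policies, which is the invariant the syncing is meant to maintain. A loop of this shape is also what you would put inside a function trainable if you wanted Tune to drive it.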

Shared env and experiences: That’s the example you linked to. This is a lot more difficult, and essentially you have to write your own Algorithm/Trainer for the specific combination of algorithms you want to use. I would avoid this unless you absolutely have to have both agents act in the same environment. If you want to do any combination other than PPO and DQN, you’d have to start from scratch, basically, and even for PPO+DQN I think the example might be missing a few details.
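The core idea of such a custom training step can be sketched without RLlib: one shared rollout is split by policy, and each learner only sees the transitions of its own agents. Everything here (`split_by_policy`, `training_step`, the dict-based transitions, the learner callables) is a hypothetical illustration, not the code from the example; a real version would additionally need things like a replay buffer and target-network updates on the DQN side.

```python
from collections import defaultdict


def split_by_policy(steps, policy_mapping):
    """Group transitions from one shared rollout by the policy that owns them.

    steps: list of dicts, each with at least an "agent_id" key.
    policy_mapping: dict mapping agent_id -> policy_id.
    """
    batches = defaultdict(list)
    for step in steps:
        batches[policy_mapping[step["agent_id"]]].append(step)
    return dict(batches)


def training_step(steps, policy_mapping, learners):
    """Feed each learner exactly the transitions of its own agents.

    learners: dict mapping policy_id -> callable(batch) -> training result.
    """
    per_policy = split_by_policy(steps, policy_mapping)
    return {pid: learners[pid](batch) for pid, batch in per_policy.items()}
```

For example, with two agents mapped to `"dqn_policy"` and `"ppo_policy"`, passing `len` as both learners simply reports how many transitions each algorithm would have received from the shared rollout.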

What’s simply not possible in RLlib at the moment is plugging different Algorithms together in a plug-and-play fashion.

MSchlech · October 11, 2022, 2:01pm

Thanks a lot!
That explains it; however, it’s what I was afraid of. I’ll have to apply the second solution, as I’m testing heterogeneous agents learning in a competitive game.

mgerstgrasser · October 11, 2022, 2:21pm

Does your setting hinge on one agent observing how the behavior of the other agent changes as the other agent learns? In many cases, you might get away with the separate-experiences workflow, and it is much easier to do currently. I’d think very carefully about whether you really need the same-env approach.

If you do, another approach that I discussed with Sven at one point would be to have one of the two Algorithms use an offline input reader. So, somewhat like the separate-experiences workflow: you have the DQN algorithm generate experiences, then grab them somehow and feed those same experiences into the PPO algorithm. That may or may not be easier than a custom Algorithm.
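As a rough illustration of the "grab them and feed them in" step, here is a sketch that hands a batch from one side to the other through a JSON-lines file, using only the standard library. The file name, the transition fields, and the helper functions are all assumptions made up for this example; this is not RLlib’s offline data format or its input reader API, just the shape of the handoff.

```python
import json
import os
import tempfile


def write_experiences(path, transitions):
    """Producer side: dump one transition per line as JSON."""
    with open(path, "w") as f:
        for transition in transitions:
            f.write(json.dumps(transition) + "\n")


def read_experiences(path):
    """Consumer side: load the same transitions back for training."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def handoff_demo():
    # Hypothetical transitions, as if produced by the first algorithm's rollout.
    transitions = [
        {"agent_id": "a0", "obs": [0.1, 0.2], "action": 1, "reward": 1.0},
        {"agent_id": "a1", "obs": [0.3, 0.4], "action": 0, "reward": -1.0},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared_batch.jsonl")
        write_experiences(path, transitions)
        replayed = read_experiences(path)
    return transitions, replayed
```

The point is only that the second algorithm trains on exactly the experiences the first one generated, instead of sampling its own.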

MSchlech · October 11, 2022, 2:31pm

If I understood it correctly, then yes; I do need the same-env approach.

I’m trying to examine the interaction between different algorithms during the learning process in a competitive multi-agent env, so I’ll need the action of every agent in every step.

mgerstgrasser · October 11, 2022, 2:46pm

Ah, are you looking at something like “How did the specific action sampled by one agent influence the learning of the other agent?” (as opposed to the policy of the agent / expectation of actions) - then yes, that would be another scenario where separate experiences might not work.
