Asymmetric play multiagent environment

rodrigodelazcano · January 2, 2022, 5:59pm

I am trying to implement a multiagent environment and train it with RLlib's scalable PPO algorithm. However, the way the environment steps does not fit RLlib's standard multi-agent environment pipeline. I would appreciate any help with setting up training for this environment. The pipeline of the environment is the following:

There are two agents, A and B.
Both agents train the same model architecture with PPO, but with independent weights.
Agent A has to finish its episode (step x times and generate a rollout buffer) before agent B starts its episode. When agent B's episode ends, agent A goes again.
Agent B's episode depends on agent A's final state.

At first glance, A and B look like they could be defined as single-agent environments. The issue is that B's episode depends on A's episode. Ideally, each worker would be assigned one pair of A and B environments and would collect rollout buffers for A and B asynchronously. Is this possible with RLlib?

stefanbschneider · January 4, 2022, 11:03am

I would model this as a multi-agent environment with two agents, A and B, where each agent has its own policy (i.e., weights).
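One way to give A and B their own weights is to map each agent ID to its own policy in the trainer's multi-agent config. The following is a minimal sketch assuming the Ray 1.x dict-style config; the policy IDs `policy_A` and `policy_B` are hypothetical names. Passing `policies` as a set of IDs (spaces inferred from the env) is an assumption about recent Ray 1.x releases; older versions expect `(policy_cls, obs_space, act_space, config)` tuples, so check the docs for your version.

```python
# Multi-agent section of a Ray 1.x-style RLlib config (sketch, not verified
# against a specific Ray release). Policy IDs are hypothetical.


def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    """Map each agent ID to its own policy so A and B keep separate weights."""
    return {"A": "policy_A", "B": "policy_B"}[agent_id]


config = {
    "multiagent": {
        # Assumption: a set of IDs lets RLlib infer obs/action spaces from the env.
        "policies": {"policy_A", "policy_B"},
        "policy_mapping_fn": policy_mapping_fn,
        # Both policies are trained, each only on its own agent's samples.
        "policies_to_train": ["policy_A", "policy_B"],
    },
}
```

Because both policies share the same PPO setup but are separate entries, they use the same model architecture while learning independent weights.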

Inside the environment, you'd keep track of which agent is currently active and make sure that the step function only returns the next observation for the agent that acts next.
Observations are a dict mapping agent ID → observation and, AFAIK, RLlib only queries actions from agents that have an observation. So if your environment only ever includes the observation of the agent that is up next (A or B), only that agent is asked for its next action.

This way, only one agent acts at a time, yet both agents keep separate policies, and the environment's state can depend on the actions of both A and B.
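A minimal sketch of this turn-based idea, assuming toy dynamics (a shared integer state that each action increments), a hypothetical class name `TurnBasedABEnv`, and hypothetical parameters `phase_len` and `num_cycles`. It follows the Ray 1.x `MultiAgentEnv` dict conventions and four-tuple `step` return (obs, rewards, dones, infos); newer Ray versions split `dones` into terminateds/truncateds. It does not import Ray so it runs standalone; in practice you would subclass `MultiAgentEnv` and define the spaces. B's phase starts from the state A left behind, and an agent's reward is held back until its next observation so that every reward lines up with an observation of the same agent (a design choice, not an RLlib requirement).

```python
class TurnBasedABEnv:
    """Hypothetical turn-based two-agent env using RLlib-style dicts.

    A acts for `phase_len` steps, then B acts for `phase_len` steps starting
    from A's final state, and so on for `num_cycles` A-then-B rounds.
    """

    def __init__(self, phase_len=3, num_cycles=2):
        self.phase_len = phase_len
        self.num_cycles = num_cycles
        self.reset()

    def reset(self):
        self.state = 0      # shared toy state carried from A's phase to B's
        self.active = "A"   # agent whose turn it is
        self.t = 0          # steps taken in the current phase
        self.cycle = 0      # completed A-then-B rounds
        self.pending = {"A": 0.0, "B": 0.0}  # rewards not yet delivered
        return {"A": self.state}

    def step(self, action_dict):
        # Only the agent we returned an observation for should be acting.
        assert set(action_dict) == {self.active}
        action = action_dict[self.active]
        self.state += action                        # toy dynamics
        self.pending[self.active] += float(action)  # toy reward
        self.t += 1

        if self.t == self.phase_len:
            # The active agent's phase is over; hand control to the other one.
            self.t = 0
            if self.active == "B":
                self.cycle += 1
            self.active = "B" if self.active == "A" else "A"

        if self.cycle == self.num_cycles:
            # Episode over: give both agents a final obs, reward and done flag.
            obs = {aid: self.state for aid in ("A", "B")}
            rewards = dict(self.pending)
            self.pending = {"A": 0.0, "B": 0.0}
            dones = {"A": True, "B": True, "__all__": True}
            return obs, rewards, dones, {}

        # Only the agent that acts next gets an observation, so only it is
        # asked for an action; its accumulated reward is delivered with it.
        nxt = self.active
        obs = {nxt: self.state}
        rewards = {nxt: self.pending[nxt]}
        self.pending[nxt] = 0.0
        return obs, rewards, {"__all__": False}, {}


# Drive the env by hand, acting only for agents that received an observation.
env = TurnBasedABEnv(phase_len=3, num_cycles=2)
obs = env.reset()
turns, totals, done = [], {"A": 0.0, "B": 0.0}, False
while not done:
    actions = {aid: 1 for aid in obs}  # exactly one agent until the last step
    turns.append(next(iter(actions)))
    obs, rewards, dones, infos = env.step(actions)
    for aid, r in rewards.items():
        totals[aid] += r
    done = dones["__all__"]

print("".join(turns))  # AAABBBAAABBB
print(totals)          # {'A': 6.0, 'B': 6.0}
```

The printed turn order shows that only one agent is queried at a time, and each agent ends up credited with exactly the rewards for its own actions.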

I believe you can then run multiple copies of this environment in parallel, with each copy hosting its own pair of A and B agents. I'm not sure whether duplicating the environment always makes sense for scaling, though: RLlib Training APIs — Ray v1.9.1
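If you do want parallel copies, a sketch of the relevant Ray 1.x config keys might look like this; the specific numbers are arbitrary examples. Each rollout worker steps its own environment copies, so every copy keeps its own A-then-B turn order.

```python
# Hypothetical scaling settings, merged into the same Ray 1.x config dict.
scaling = {
    "num_workers": 4,          # parallel rollout worker processes
    "num_envs_per_worker": 2,  # env copies (A/B pairs) stepped by each worker
}

# Each env copy is one independent A/B pair.
total_env_copies = scaling["num_workers"] * scaling["num_envs_per_worker"]
print(total_env_copies)  # 8
```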

rodrigodelazcano · January 6, 2022, 12:36pm

Thank you for your prompt and detailed response, Stefan! This worked perfectly, and there was no need to duplicate the environment.
