Vectorized multi-agent setup

richard · February 2, 2021, 8:04pm

It seems like the multi-agent architecture in RLlib expects MultiAgentDicts for the observations, dones, infos, and rewards.

Are there plans to support a vectorized version of this, such that instead of Dict[agent id, np.ndarray] we can simply have np.ndarrays where the first dimension is assumed to be the agent dimension? Of course, one constraint we can impose is that all the agents share the same underlying policy.

sven1977 · February 8, 2021, 8:52am

Hey @richard , thanks for sharing this idea. No, we have not thought about a setup like this, where the agent dimension is “just another dim”, like batch or time. I’m guessing this would be very useful for large numbers of agents sharing the same obs/action spaces or - better - policies.

richard · February 8, 2021, 4:00pm

Yes exactly. I’m trying to spec out if there are any gotchas in implementing this: do you happen to have an idea of how involved this would be to implement?

sven1977 · February 12, 2021, 10:46am

I think this could be quite easy, actually.
You would have to let RLlib know via some config flag that the env returns agent-batches instead of dicts keyed by agent ID. Each item in the returned np.array would then correspond to one agent, with the item's index serving as that agent's ID:

The env would do (obs space=Discrete(2)):
obs = np.array([0, 1, 1, 0])
return obs, rew, ...

And RLlib would interpret this as:

{0: 0, 1: 1, 2: 1, 3: 0}, where the keys are the agents' implicit IDs.
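A minimal sketch of that interpretation, in plain NumPy. The helper names agent_batch_to_dict and dict_to_agent_batch are made up for illustration and are not part of RLlib; the only assumption is that axis 0 of the array is the agent dimension and that agent IDs are the indices 0..n-1.

```python
import numpy as np


def agent_batch_to_dict(batch):
    """Treat axis 0 as the agent dimension; implicit agent IDs are 0..n-1.

    Hypothetical helper, not an RLlib function.
    """
    batch = np.asarray(batch)
    return {agent_id: batch[agent_id] for agent_id in range(batch.shape[0])}


def dict_to_agent_batch(per_agent):
    """Inverse of the above: stack per-agent items in agent-ID order."""
    return np.stack([per_agent[agent_id] for agent_id in range(len(per_agent))])


# The example from this post: obs space = Discrete(2), four agents.
obs = np.array([0, 1, 1, 0])
obs_dict = agent_batch_to_dict(obs)
print(obs_dict)
```

The same helpers would work unchanged for vector observations of shape (num_agents, obs_dim), since only the first axis is indexed.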

We would - I think - only have to change the _env_runner generator in rllib/evaluation/sampler.py to interpret raw observations from the env as implicitly agent-wise batches, that’s all. Everything else would still be the same (a batched forward pass to calculate actions). Also, we would NOT(!) have to convert the produced actions back into a dict before sending them to the env; we could leave the batched action computation as is, since the env would probably want the actions as np.arrays as well.
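To make the data flow concrete, here is a rough sketch of what such an env and one sampling step could look like. Everything in it is an assumption for illustration: ToyAgentBatchEnv and shared_policy are made-up names, the reward rule is arbitrary, and none of this is RLlib code or an existing RLlib interface. It only shows that observations, actions, rewards, and dones can all stay agent-indexed arrays from end to end.

```python
import numpy as np


class ToyAgentBatchEnv:
    """Hypothetical env whose arrays are all indexed by agent on axis 0."""

    def __init__(self, num_agents=4, seed=0):
        self.num_agents = num_agents
        self.rng = np.random.default_rng(seed)
        self.obs = None

    def reset(self):
        # One Discrete(2) observation per agent.
        self.obs = self.rng.integers(0, 2, size=self.num_agents)
        return self.obs

    def step(self, actions):
        actions = np.asarray(actions)
        assert actions.shape == (self.num_agents,)
        # Arbitrary toy reward: 1.0 if an agent's action equals its last obs.
        rewards = (actions == self.obs).astype(np.float32)
        self.obs = self.rng.integers(0, 2, size=self.num_agents)
        dones = np.zeros(self.num_agents, dtype=bool)
        return self.obs, rewards, dones, {}


def shared_policy(obs_batch):
    """Stand-in for the batched forward pass of one shared policy."""
    return obs_batch.copy()


env = ToyAgentBatchEnv()
obs = env.reset()
actions = shared_policy(obs)  # shape (num_agents,), no dict needed
next_obs, rewards, dones, infos = env.step(actions)
print(rewards, dones)
```

Note that the actions go straight from the policy into env.step() as an array, which is the "no dict re-writing" point above.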
