Get agent ID in multi-agent setting

Sorry, this is probably a very basic question. I’m not sure where to retrieve the agent_ids for the multiple agents created in a multiagent setting, so I can map the policy functions on. Can anyone please point me there?

Or set the agent_id, if that is easier

Hi @lucas_spangher

The agent_ids are provided by the environment. They are often strings but they need not be. They can be any hashable type. These ids will be the keys in the dictionaries the environment returns from calls to reset and step.

You need to figure out what those ids are, then write a function that, given an agent id, returns the key of the appropriate policy that you specified in your multiagent config for RLlib.

Ah yes, this is what my question was trying to ask. How are the agent IDs set in environments?

I didn’t catch this in the examples in the rllib library. I can look through the examples again, as I was looking more in agent configs and such, but, if you have a ready example you can point me to, I’d love to see. Thank you very much!

Are you making your own multiagent env or are you using a pre-made one?

I’m converting a single agent env to multiagent. So I guess I would need to set up the naming conventions?

The environment, CounterfactualMicrogridRLlib, is at the bottom of this file, if you’re curious to see:

Yes, it is totally up to you. RLlib gives no intrinsic meaning to agent ids. They are arbitrary, but they do need to be unique for every agent in the environment. You could name them 0…n or red, green, blue,…, or agent_0, agent_1,…,agent_n.

If you are going to assign different policies to different types of agents, then strings are a good way to go because they make the policy mapping function easy to write. This would be like car_0, car_1, truck_0, bike_0, bike_1, bike_2,…
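
Here is a minimal sketch of that naming idea, assuming ids of the form `<type>_<index>`. The policy names `car_policy`, `truck_policy`, and `bike_policy` are hypothetical; in a real setup they would have to match the policy keys in your own multiagent config.

```python
# Hypothetical policy ids; they must match the keys of the policies
# defined in your own multiagent config.
POLICY_FOR_TYPE = {
    "car": "car_policy",
    "truck": "truck_policy",
    "bike": "bike_policy",
}


def policy_mapping_fn(agent_id, *args, **kwargs):
    # "car_0" -> "car": strip the trailing "_<index>" to get the agent type.
    agent_type, _, _index = agent_id.rpartition("_")
    return POLICY_FOR_TYPE[agent_type]


for agent_id in ["car_0", "car_1", "truck_0", "bike_2"]:
    print(agent_id, "->", policy_mapping_fn(agent_id))
```

Using `rpartition` rather than `split` means a type name that itself contains an underscore (say `traffic_light_1`) would still resolve to `traffic_light`.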

But, sorry, this is probably sounding completely dumb, but how and where in a multiagent env are individual agents processed and assigned names? I don’t see any function in any multiagent init that is aware of individual agents, just policies.

You make them up in the environment you are writing.

This example holds a list of agents. In reset and step it gives them an int agent id from 0…n. i is the agent id and a.reset() is providing the observation.

Agent ids are not specified anywhere in the RLlib config. They come from the environment.
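
To make that concrete, here is a rough plain-Python sketch of the same pattern. It does not subclass RLlib's `MultiAgentEnv`, and `CoinFlipEnv` and `ListOfAgentsEnv` are made-up names for illustration only. The point is that the ids exist only as the keys of the dictionaries the environment returns.

```python
import random


class CoinFlipEnv:
    """Hypothetical single-agent env stub: the observation is 0 or 1."""

    def reset(self):
        return random.randint(0, 1)

    def step(self, action):
        obs = random.randint(0, 1)
        reward = 1.0 if action == obs else 0.0
        done = True  # one-step episodes keep the sketch short
        return obs, reward, done, {}


class ListOfAgentsEnv:
    """Holds an unnamed list of sub-envs; the list index becomes the agent id."""

    def __init__(self, num_agents):
        self.agents = [CoinFlipEnv() for _ in range(num_agents)]

    def reset(self):
        # The ids 0..n-1 are created here, as keys of the returned dictionary.
        return {i: a.reset() for i, a in enumerate(self.agents)}

    def step(self, action_dict):
        obs, rew, done, info = {}, {}, {}, {}
        for i, action in action_dict.items():
            obs[i], rew[i], done[i], info[i] = self.agents[i].step(action)
        # The Ray 1.x multi-agent API also expects an "__all__" key in the done dict.
        done["__all__"] = all(done.values())
        return obs, rew, done, info


env = ListOfAgentsEnv(num_agents=3)
first_obs = env.reset()
print(sorted(first_obs))  # [0, 1, 2] -> these are the agent ids
```

If you wanted string ids instead, you would only change the keys, for example `{f"agent_{i}": a.reset() ...}`, and index `self.agents` accordingly in `step`.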

Thanks for bearing with me. I appreciate it.

Hey Manny,

I should have been more clear. The example you posted is part of the source of my confusion. I don’t see anywhere that anything like an agent_id is set when agents are created. In the example, self.agents is an unnamed list of identical environments.

Am I correct in understanding then that this example deals with unnamed agents, that agent_id isn’t strictly necessary for multi agent env to work, and you should only set it yourself if you need to assign policies based on it?

If this is the case, perhaps it would be helpful to have an environment similar to the MultiAgentTraffic environment that is on the tutorial of multiagent envs, because the code snippets from that made it seem like the agents required naming generally.


I agree that the example I showed is a bit out of the ordinary but it was the only one I could find easily in the examples.

The “agent_id” is the key used to access an agent’s value in the dictionaries that the environment returns from reset or step.

Let’s say we have an environment with 3 agents, each with a Discrete(1) observation space.

We call env.reset() and get the following result.

First we should fire the developer who wrote that environment. :joy:

This environment currently has 3 agents. The agent_ids are 0, “a”, and (1,2).

Internally there is no “name” for the agent. The ID is just the dictionary key used to access that agent’s information in what the environment returns. Yes, agent_ids are required, because multiagent envs must return dictionaries, and non-empty dictionaries must have keys.
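
As an illustration only, an observation dictionary consistent with that description could look like the sketch below. The values are a hypothetical reconstruction (a Discrete(1) space contains only the value 0), and `shared_policy` is a made-up policy id.

```python
# Hypothetical observation dict: with a Discrete(1) observation space the
# only valid observation is 0, so every agent observes 0.
obs = {0: 0, "a": 0, (1, 2): 0}

# The agent ids are nothing more than the keys of that dictionary.
agent_ids = list(obs.keys())
print(agent_ids)  # [0, 'a', (1, 2)]


# Any hashable value works as an id; the mapping function only needs to
# handle whatever keys the environment emits.
def policy_mapping_fn(agent_id, *args, **kwargs):
    return "shared_policy"  # hypothetical policy id
```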


Here is another environment that may make more sense?

I’ve created a multiagent framework that organizes how agents are stored in an environment. You may find it helpful. Design — Abmarl 0.1.3 documentation

Hey Manny!!

Thanks for following up! I’ve been putting some time into this about 2-3 days a week, so apologies for the delayed responses.

So I’ve looked at it with my team… I realized that one area of oversight was the initial tutorial for multiagent env: RLlib Environments — Ray v1.6.0, which in the first block has us calling the reset() function to get a dictionary with “car_1”, “car_2”, “traffic_light_1”. These are agents, of which only three are running that particular turn (even though the init in the tutorial initializes 20 cars and 5 traffic lights).

My confusion was that in the next code block, we are assigning policies for “car1”, “car2”, and “traffic_light”. These are different than the agents that we are retrieving above… they are policies! There are two car policies. So I thought that somehow the two were and had to be related… that we needed agent_ids when creating the policies.

NOW my understanding is that the action_dict in step() is filled by the values that are returned by reset(), and will correspond to each agent. The policies are created at the beginning and will train over time.

Is that correct?

One further thing I noticed through accidentally leaving a print statement in my policy_mapping_fn() is that it is basically called each step. First I thought this was a bug, but now I understand that you may want to dynamically map the policies to each set of active agents each turn. Is that correct?

def policy_mapping_fn(agent_id, **kwargs):
    pol_id = agent_id
    return pol_id

BTW – my environment will have an equal number of agents and policies, and maintain the mapping throughout the training. So I think I have it right now, but that was a bit difficult.

Also, BTW, @rusu24edward, thanks! I think I’m going to avoid overhauling my codebase as it is based on RLlib currently, unless your framework can fit into it?

You are right, the agents with observations returned in one step will be the ones that provide actions in the next. I too have learned this from experience, and it would be nice if the tutorials made this explicitly clear for those who design environments from scratch.
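
A small sketch of that relationship, in plain Python. This is not RLlib's actual rollout code, and the `policies` dict is a hypothetical stand-in for trained policies. It only shows that the keys of the observation dict decide which agents get an entry in the next action_dict.

```python
def policy_mapping_fn(agent_id, *args, **kwargs):
    # One policy per agent, as in the mapping function posted above.
    return agent_id


# Hypothetical stand-ins for trained policies: each maps an observation to an action.
policies = {
    "car_1": lambda obs: 0,
    "car_2": lambda obs: 1,
    "traffic_light_1": lambda obs: 2,
}


def build_action_dict(obs_dict):
    # Only agents that received an observation get an action this step.
    return {
        agent_id: policies[policy_mapping_fn(agent_id)](obs)
        for agent_id, obs in obs_dict.items()
    }


# Suppose the previous reset()/step() returned observations for two agents only.
obs_dict = {"car_1": [0.1], "traffic_light_1": [0.7]}
action_dict = build_action_dict(obs_dict)
print(action_dict)  # {'car_1': 0, 'traffic_light_1': 2}
```

Here `car_2` has a policy but received no observation, so it gets no action this step.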

The framework that I linked to integrates with RLlib. It provides a couple of neat features: workflow scripts and config files. The environment interface I define also makes the agent organization explicit and better separates the part of the simulation that updates the state from the part that returns information to agents. No overhaul needed :slight_smile: I designed it by going through the same pains you are, so you may be able to accelerate your development by using it. I’d be happy to provide more guidance if you want to use it.

Have you had a chance to look at this? I think the agent_ID is set here.

I was using PettingZoo and I also needed to access agent_ids, but the agent_IDs in that environment are strings (see here), so I had to do a dict mapping like the following:

def policy_mapping_fn(agent_id, **kwargs):
    # PettingZoo agent ids are strings, so map them to integer indices first.
    agent_dict = {'first_0': 0, 'second_0': 1, 'third_0': 2, 'fourth_0': 3}
    if agent_dict[agent_id] % 2 == 0:
        return "dqn_policy"  # Even-numbered agents 0, 2, 4...
    else:
        return "ppo_policy"  # Odd-numbered agents 1, 3, 5...