Initialization of multiagent envs

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi, I have been reading several tutorials (see below) and I am not sure what is required in the init and what can be left for the `step()` method.

I am creating my own environment. I have 50 (or more) agents in total. They form 2 groups:

  • 1 agent is only the “proposer”, with a somewhat convoluted action space whose actions look like this: ([1, 0, 0], [0.25, 0.5, 0.25]) (see the sketch right after this list)
  • The other 49 are the “responders”, with an action space of just 1 or 0.
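For reference, the proposer space described above could be written like this (illustration only; I am using `n_agents = 3` here just so the sample is readable):

```python
import gym  # gymnasium's spaces work the same way

n_agents = 3  # illustration only; in my env this is 50 or more
proposer_space = gym.spaces.Tuple((
    gym.spaces.MultiBinary(n_agents),                      # e.g. [1, 0, 0]
    gym.spaces.Box(low=0.0, high=1.0, shape=(n_agents,)),  # e.g. [0.25, 0.5, 0.25]
))
print(proposer_space.sample())  # a random (binary vector, float vector) pair
```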

Question: I am trying to understand what needs to be defined in the init and what can be left for the step method.

In particular, since I have 49 (or more) responders, I wonder where I need to map them to their action and observation spaces.

option a) Should I create 49 responder-agent entries in the dict defined in the init method?

  • a dictionary of 50 agent IDs
  • a dictionary with 50 entries, something like this (very inelegant):

```python
self.action_space = gym.spaces.Dict({
    "proposer": gym.spaces.Tuple((
        gym.spaces.MultiBinary(self.n_agents),
        gym.spaces.Box(low=0.0, high=1.0, shape=(self.n_agents,)),
    )),
    "responder1": gym.spaces.Discrete(2),
    "responder2": gym.spaces.Discrete(2),
    # ...etc
})
```
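(If I did go with option (a), I suppose the 49 responder entries would not have to be written out by hand; a rough sketch, assuming the IDs are responder1 … responder49:)

```python
# Sketch only: build the 50-entry spaces dict programmatically inside __init__.
spaces = {
    "proposer": gym.spaces.Tuple((
        gym.spaces.MultiBinary(self.n_agents),
        gym.spaces.Box(low=0.0, high=1.0, shape=(self.n_agents,)),
    )),
}
# One Discrete(2) entry per responder: "responder1" ... "responder49".
spaces.update({f"responder{i}": gym.spaces.Discrete(2) for i in range(1, self.n_agents)})
self.action_space = gym.spaces.Dict(spaces)
```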

option b) Define the init in a more abstract way:

  • a dictionary of 50 agent IDs
  • a dictionary with only 2 entries (instead of 50):

```python
self.action_space = gym.spaces.Dict({
    "proposer": gym.spaces.Tuple((
        gym.spaces.MultiBinary(self.n_agents),
        gym.spaces.Box(low=0.0, high=1.0, shape=(self.n_agents,)),
    )),
    "responder": gym.spaces.Discrete(2),
})
```

and then… under this option, in the step method, I would say: “here are the IDs of the 49 agents; map their actions to the “responder” entry in the dictionary (with only 2 entries) I gave you in the init.”
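The env-side lookup I have in mind would be something like this (just a sketch; `space_for_agent` is a made-up helper name):

```python
# Hypothetical helper: every responder ID shares the single "responder" entry.
def space_for_agent(self, agent_id):
    role = "proposer" if agent_id == "proposer" else "responder"
    return self.action_space.spaces[role]
```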

NB:
I have already read these tutorials and examples and I am still not clear:

  • self play with open spiel
    This one only shows option (a), but seems to suggest that option (b) is possible.
  • multi-agent-and-hierarchical
    This one only shows option (a), but it’s for only 2 agents:
  • multi_agent_different_spaces_for_agents.py

Thanks!

Hi @Username1,

If you are only going to have two policies, one for the proposer and one for the responder, then you should go with option b.
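Roughly something like this (sketch only; this uses the older dict-style config and placeholder space variables; newer Ray versions configure the same thing via `AlgorithmConfig.multi_agent(...)`):

```python
# Two policies, one per role, each with its own fixed obs/action spaces.
policies = {
    "proposer": (None, proposer_obs_space, proposer_action_space, {}),
    "responder": (None, responder_obs_space, responder_action_space, {}),
}

config = {
    "multiagent": {
        "policies": policies,
        # All 49 responder agent IDs map onto the single "responder" policy.
        "policy_mapping_fn": lambda agent_id, *args, **kwargs: (
            "proposer" if agent_id == "proposer" else "responder"
        ),
    },
}
```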

Thank you very much @mannyv
In fact, I am expecting to have one policy for both agent types: the proposer and the responder.

Do you have any example in Ray where the action and observation spaces are defined through functions instead of being defined in the init method?

Thanks!

@Username1 if they have different observation or action spaces then they cannot share the same policy.

oh! thank you very much for this clarification! @mannyv
Very important, thanks so much!

One question: do you have any example in Ray where the action and observation spaces are defined through functions instead of being defined in the init method?

I am not sure what you mean by defined through a function instead of the init. Can you elaborate or give an example?

Hello @mannyv , thank you very much for your time.

This is a “turn-based” game, where at each turn a (randomly selected) agent proposes, then another responds, and after that a reward is calculated. Afterwards, another agent is randomly selected and the process continues.

At each turn, the action and observation spaces differ depending on the role of the agent (proposer/responder), so the spaces would have to be assigned by a function; they are not fixed like in the RLlib examples.

My story is as follows:

  • The randomly selected agent “A” proposes.
  • The other agents {B, C, D, …} respond: Yes/No.
  • Then a reward is calculated if there is an agreement.
  • Then another agent is selected as a proposer, say “B”… and so on.
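To make the turn structure concrete, this is roughly the flow I have in mind (just a skeleton, not working RLlib code; `self.phase`, `self.current_proposer` and the `_...` helpers are placeholder names of mine):

```python
# Rough skeleton of one round in a turn-based, MultiAgentEnv-style step().
def step(self, action_dict):
    obs, rewards = {}, {}
    if self.phase == "propose":
        # Only the selected proposer acted this turn; responders see its proposal.
        proposal = action_dict[self.current_proposer]
        for agent_id in self.responders:
            obs[agent_id] = proposal
            rewards[agent_id] = 0.0  # nothing to reward until the round resolves
        self.phase = "respond"
    else:
        # All responders acted; reward only if everyone accepted (1 = accept).
        agreement = all(a == 1 for a in action_dict.values())
        rewards = self._compute_rewards(agreement)            # placeholder helper
        self.current_proposer = self._pick_next_proposer()    # placeholder helper
        obs[self.current_proposer] = self._proposer_obs()     # placeholder helper
        self.phase = "propose"
    dones = {"__all__": self._episode_over()}  # placeholder helper
    # (Newer RLlib/gymnasium versions return terminateds/truncateds separately.)
    return obs, rewards, dones, {}
```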

The problem is:

When an agent is selected as a proposer, their action space is:

gym.spaces.Tuple((gym.spaces.MultiBinary(self.n_agents), gym.spaces.Box(low=0.0, high=1.0, shape=(self.n_agents,))))

When an agent is a responder, their observation space is the previous agent’s proposal, while their action space is:

Discrete(2)  # 1 = accept, 0 = reject

So at each turn, a given agent changes its observation and action space.

However, looking at the RLlib examples, they are not exactly what I need, since the agents’ observation and action spaces are defined at init().

I need a function that changes the obs and action spaces every time an agent is randomly selected (i.e., one move of the board), and not when the environment is **instantiated** (i.e., at init()).

Possible solution (would this work?)
Avoid defining self.observation_space and self.action_space at init time and just define them in the step method.

I don’t know if this will work with RLlib
(i.e., defining the observation and action spaces ONLY in the step() method, before the policy is called).

I was wondering if there is any RLlib implementation that you know of that would achieve this. Here is an example of how they do this in PettingZoo (partially what I need).

Thank you very much!

Hi @Username1,

With RLlib, the observation and action spaces of an agent_id cannot change during the experiment. One thing you could do is have two agent_ids for each role, something like "agent1_p" and "agent1_r". Then you have two policies {"proposer", "responder"} and a policy mapping function with logic like this: "proposer" if agent_id.endswith("_p") else "responder".
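As a rough sketch (untested; the exact policy_mapping_fn signature depends on your Ray version):

```python
# Each physical agent gets one ID per role, e.g. "agent1_p" and "agent1_r",
# so every agent_id keeps a fixed observation/action space.
def policy_mapping_fn(agent_id, *args, **kwargs):
    return "proposer" if agent_id.endswith("_p") else "responder"
```

The environment would then return observations under "agentN_p" only on that agent’s proposing turn and under "agentN_r" on its responding turns.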

Thank you very much!