Initialization of multiagent envs

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi, I have been reading several tutorials (see below) and I am not sure what is required in the init and what can be left for the `step()` method.

I am creating my own environment. I have 50 (or more) agents in total. They form 2 groups:

  • 1 agent is only the “proposer”, with a somewhat convoluted action space whose actions look like this: ([1, 0, 0], [0.25, 0.5, 0.25]) (see the sketch right after this list)
  • The other 49 are the “responders”, with an action space of just 1 or 0.
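For reference, the proposer space described above could be written like this (illustration only; I am using `n_agents = 3` here just so the sample is readable):

```python
import gym  # gymnasium's spaces work the same way

n_agents = 3  # illustration only; in my env this is 50 or more
proposer_space = gym.spaces.Tuple((
    gym.spaces.MultiBinary(n_agents),                      # e.g. [1, 0, 0]
    gym.spaces.Box(low=0.0, high=1.0, shape=(n_agents,)),  # e.g. [0.25, 0.5, 0.25]
))
print(proposer_space.sample())  # a random (binary vector, float vector) pair
```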

Question: I am trying to understand what needs to be defined in the init and what can be left for the step method.

In particular, since I have 49 (or more) responders, I wonder where I need to map them to their action and observation spaces.

option a) Should I create 49 responder-agent entries in the dict defined in the init method?

  • a dictionary of 50 agent IDs
  • a dictionary with 50 entries, something like this (very inelegant):

```python
self.action_space = gym.spaces.Dict({
    "proposer": gym.spaces.Tuple((
        gym.spaces.MultiBinary(self.n_agents),
        gym.spaces.Box(low=0.0, high=1.0, shape=(self.n_agents,)),
    )),
    "responder1": gym.spaces.Discrete(2),
    "responder2": gym.spaces.Discrete(2),
    # ...etc
})
```
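(If I did go with option (a), I suppose the 49 responder entries would not have to be written out by hand; a rough sketch, assuming the IDs are responder1 … responder49:)

```python
# Sketch only: build the 50-entry spaces dict programmatically inside __init__.
spaces = {
    "proposer": gym.spaces.Tuple((
        gym.spaces.MultiBinary(self.n_agents),
        gym.spaces.Box(low=0.0, high=1.0, shape=(self.n_agents,)),
    )),
}
# One Discrete(2) entry per responder: "responder1" ... "responder49".
spaces.update({f"responder{i}": gym.spaces.Discrete(2) for i in range(1, self.n_agents)})
self.action_space = gym.spaces.Dict(spaces)
```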

option b) Define the init in a more abstract way:

  • a dictionary of 50 agent IDs
  • a dictionary with only 2 entries (instead of 50):

```python
self.action_space = gym.spaces.Dict({
    "proposer": gym.spaces.Tuple((
        gym.spaces.MultiBinary(self.n_agents),
        gym.spaces.Box(low=0.0, high=1.0, shape=(self.n_agents,)),
    )),
    "responder": gym.spaces.Discrete(2),
})
```

and then… under this option, in the step method, I would say: “here are the IDs of the 49 agents; map their actions to the “responder” entry in the dictionary (with only 2 entries) I gave you in the init.”
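The env-side lookup I have in mind would be something like this (just a sketch; `space_for_agent` is a made-up helper name):

```python
# Hypothetical helper: every responder ID shares the single "responder" entry.
def space_for_agent(self, agent_id):
    role = "proposer" if agent_id == "proposer" else "responder"
    return self.action_space.spaces[role]
```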

NB:
I have already read these tutorials and examples and I am still not clear:

  • self play with open spiel
    This one only shows option (a), but seems to suggest that option (b) is possible.
  • multi-agent-and-hierarchical
    This one only shows option (a), but it’s for only 2 agents:
  • multi_agent_different_spaces_for_agents.py

Thanks!

Hi @Username1,

If you are only going to have two policies, one for the proposer and one for the responder, then you should go with option b.
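Roughly something like this (sketch only; this uses the older dict-style config and placeholder space variables; newer Ray versions configure the same thing via `AlgorithmConfig.multi_agent(...)`):

```python
# Two policies, one per role, each with its own fixed obs/action spaces.
policies = {
    "proposer": (None, proposer_obs_space, proposer_action_space, {}),
    "responder": (None, responder_obs_space, responder_action_space, {}),
}

config = {
    "multiagent": {
        "policies": policies,
        # All 49 responder agent IDs map onto the single "responder" policy.
        "policy_mapping_fn": lambda agent_id, *args, **kwargs: (
            "proposer" if agent_id == "proposer" else "responder"
        ),
    },
}
```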

Thank you very much @mannyv
In fact, I am expecting to have one policy for both agent types: the proposer and the responder.

Do you have any example in Ray where the action and observation spaces are defined through functions instead of being defined in the init method?

Thanks!

@Username1 if they have different observation or action spaces then they cannot share the same policy.

oh! thank you very much for this clarification! @mannyv
Very important, thanks so much!

One question: do you have any example in Ray where the action and observation spaces are defined through functions instead of being defined in the init method?

I am not sure what you mean by defined through a function instead of the init. Can you elaborate or give an example?

Hello @mannyv , thank you very much for your time.

This is a “turn-based” game, where at each turn a (randomly selected) agent proposes, then another responds, and after that a reward is calculated. Afterwards, another agent is randomly selected and the process continues.

At each turn, the action and observation spaces differ depending on the role of the agent (proposer/responder), so the spaces would have to be assigned by a function; they are not fixed like in the RLlib examples.

My story is as follows:

  • The randomly selected agent “A” proposes.
  • The other agents {B, C, D, …} respond: Yes/No.
  • Then a reward is calculated if there is an agreement.
  • Then another agent is selected as a proposer, say “B”… and so on.
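To make the turn structure concrete, this is roughly the flow I have in mind (just a skeleton, not working RLlib code; `self.phase`, `self.current_proposer` and the `_...` helpers are placeholder names of mine):

```python
# Rough skeleton of one round in a turn-based, MultiAgentEnv-style step().
def step(self, action_dict):
    obs, rewards = {}, {}
    if self.phase == "propose":
        # Only the selected proposer acted this turn; responders see its proposal.
        proposal = action_dict[self.current_proposer]
        for agent_id in self.responders:
            obs[agent_id] = proposal
            rewards[agent_id] = 0.0  # nothing to reward until the round resolves
        self.phase = "respond"
    else:
        # All responders acted; reward only if everyone accepted (1 = accept).
        agreement = all(a == 1 for a in action_dict.values())
        rewards = self._compute_rewards(agreement)            # placeholder helper
        self.current_proposer = self._pick_next_proposer()    # placeholder helper
        obs[self.current_proposer] = self._proposer_obs()     # placeholder helper
        self.phase = "propose"
    dones = {"__all__": self._episode_over()}  # placeholder helper
    # (Newer RLlib/gymnasium versions return terminateds/truncateds separately.)
    return obs, rewards, dones, {}
```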

The problem is:

When an agent is selected as a proposer, their action space is:

gym.spaces.Tuple((gym.spaces.MultiBinary(self.n_agents), gym.spaces.Box(low=0.0, high=1.0, shape=(self.n_agents,))))

When an agent is a responder, their observation space is the previous agent’s proposal, while their action space is:

Discrete(2)  # 1 = accept, 0 = reject

So at each turn, a given agent changes its observation and action space.

However, looking at the RLlib examples, they are not exactly what I need, since the agents’ observation and action spaces are defined at init().

I need a function that changes the obs and action spaces every time an agent is randomly selected (i.e., one move of the board), and not when the environment is **instantiated** (i.e., at init()).

Possible solution (would this work?)
Avoid defining self.observation_space and self.action_space at init time and just define them in the step method.

I don’t know if this will work with RLlib
(i.e., defining the observation and action spaces ONLY in the step() method, before the policy is called).

I was wondering if there is any RLlib implementation that you know of that would achieve this. Here is an example of how they do this in PettingZoo (partially what I need).

Thank you very much!

Hi @Username1,

With RLlib, the observation and action spaces of an agent_id cannot change during the experiment. One thing you could do is have two agent_ids for each role, something like "agent1_p" and "agent1_r". Then you have two policies {"proposer", "responder"} and a policy mapping function with logic like this: "proposer" if agent_id.endswith("_p") else "responder".
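As a rough sketch (untested; the exact policy_mapping_fn signature depends on your Ray version):

```python
# Each physical agent gets one ID per role, e.g. "agent1_p" and "agent1_r",
# so every agent_id keeps a fixed observation/action space.
def policy_mapping_fn(agent_id, *args, **kwargs):
    return "proposer" if agent_id.endswith("_p") else "responder"
```

The environment would then return observations under "agentN_p" only on that agent’s proposing turn and under "agentN_r" on its responding turns.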

Thank you very much!