Multiagent - Different action spaces for different agents

Hello, I have a multiagent environment in which 2 agents bargain. It is like a "turn-based" environment.

Similar to this:
e.g. user1 action → env obs for user2 → user2 action → env obs for user1 → user1 action, etc…

In particular:

In one round, one agent proposes a price for an asset, say $50.87,
and the other agent replies with Sell/No Sell.

So the action space for agent 1 is continuous (i.e. a "box"), as it is a price to offer,
while the action space for the other agent is discrete {0, 1}, as it accepts/rejects the price.

Q: I am not sure how to define the action and observation spaces in my MultiAgent environment so RLlib can understand them, since one agent proposes continuous numbers (the action is the price to propose) and the other agent sees these continuous prices and responds with discrete numbers.

In particular, I don’t know how to define this:

import numpy as np
from gym.spaces import Box

def __init__(self, config=None):
    super().__init__()
    # One shared space for everything -- but my two agents need different ones:
    self.action_space = Box(low=0.0, high=2.0, shape=(1,), dtype=np.float16)
    self.observation_space = Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float16)
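
For concreteness, here is a sketch of what I imagine I need, with one space per agent keyed by agent ID (untested; the class name BargainEnv, the agent IDs "proposer"/"responder", and the dict layout are just my guesses):

import numpy as np
import gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class BargainEnv(MultiAgentEnv):
    def __init__(self, config=None):
        super().__init__()
        # The proposer offers a continuous price; the responder accepts/rejects.
        self.action_space = gym.spaces.Dict({
            "proposer": gym.spaces.Box(low=0.0, high=2.0, shape=(1,), dtype=np.float32),
            "responder": gym.spaces.Discrete(2),
        })
        # Both agents observe the same 3 features (e.g. the current price).
        self.observation_space = gym.spaces.Dict({
            "proposer": gym.spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32),
            "responder": gym.spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32),
        })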

Note that both agents will be trained with the same policy. The model learns which price to propose and which prices to reject at the same time. So, 1 policy for both agents (for now).

Thanks!


Hi @Username1,

Welcome to the forum.

This post may be a helpful place to start.

Hello @mannyv, thank you very much for this pointer.

I have some questions to clarify.
My setting is as follows: similar to a "turn-based game", I have 50 agents. On each round, one agent is picked at random and makes a move. The other 49 agents observe the move and each responds with a binary action (0 or 1). On the next step, a new random agent is selected.

The questions:

1) In the __init__ method:
a) Should I be specifying the number and IDs of all agents?
b) Should I set the action space in a generic way in __init__? Where should I map the action space available to each agent? (Remember, there is one agent that changes every time.)

c) The actions of the random agent are a combination of 2 vectors, one discrete and one continuous, like this: [(1, 0, 0, …), (0.2, 0.55, …)]. The first vector is composed of binary numbers; the second vector is composed of continuous random numbers.

How do I define such combined action space?

Thank you very much!

Hi @Username1,

Besides @mannyv's pointer, I would recommend looking into the RLlib examples on GitHub:

I guess combining the two above should give you the answer. I would recommend first making those RLlib examples work and then adapting them to your needs :slight_smile:

Hello @vlainic, thank you very much for these examples. They are indeed very useful.

My understanding is that:

a) Yes, every agent should have an ID defined in __init__.
Then at every turn, in the "step" method, one should map the available actions/observations to each ID (see my sketch below).
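
Something like this is what I have in mind (an untested sketch; the class name, the zero observations, and the reward placeholders are mine):

import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class SketchEnv(MultiAgentEnv):
    def __init__(self, config=None):
        super().__init__()
        # a) All 50 agent IDs are declared up front.
        self.agent_ids = [f"agent_{i}" for i in range(50)]
        self._proposer = None

    def reset(self):
        # One agent is picked at random; only it gets an obs (and moves first).
        self._proposer = np.random.choice(self.agent_ids)
        return {self._proposer: np.zeros(1, dtype=np.float32)}

    def step(self, action_dict):
        # The other 49 agents observe the move and respond on the next step.
        responders = [a for a in self.agent_ids if a != self._proposer]
        obs = {a: np.zeros(1, dtype=np.float32) for a in responders}
        rewards = {a: 0.0 for a in obs}
        dones = {"__all__": False}
        return obs, rewards, dones, {}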

What does this attribute do? self._spaces_in_preferred_format = True
(It seems to be inherited from the parent class.)

b) I don’t know how to define an action/observation space with a mix of discrete and continuous vectors.

My action space for the random agent is as follows: ([vector of 0's and 1's], [vector of random numbers])

Would a tuple like this fly: gym.spaces.Tuple((gym.spaces.Discrete(50), gym.spaces.Box(0.0, 1.0, shape=(50,))))?

Thank you very much for your time.

Hey @Username1,

  1. Exactly.
  I am not 100% sure what self._spaces_in_preferred_format = True does, but I found the following comment in the docs:
# Do the action and observation spaces map from agent ids to spaces
# for the individual agents?

So my guess would be that when this is True, RLlib expects the spaces in exactly the strict format you defined in the env. Just keep it True :smiley:

  2. Yeah, something like:
self.action_space = gym.spaces.Dict(
    {
        "agent0": gym.spaces.Discrete(2),
        "agent1": gym.spaces.Box(low=-1.0, high=1.0, shape=(3,)),
    }
)
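
And (I assume) the observation space can be declared the same way, keyed by the same agent IDs, e.g. (assuming numpy is imported as np):

self.observation_space = gym.spaces.Dict(
    {
        "agent0": gym.spaces.Box(low=-np.inf, high=np.inf, shape=(3,)),
        "agent1": gym.spaces.Box(low=-np.inf, high=np.inf, shape=(3,)),
    }
)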

I hope I helped :slight_smile:


Yes! Thank you very much.

I am confused between using Discrete(number of agents) and MultiBinary(number of agents).

So something like this:

gym.spaces.Tuple((
    gym.spaces.MultiBinary(number_of_agents),
    gym.spaces.Box(low=0.0, high=1.0, shape=(number_of_agents,)),
))

To give an example, the "proposer" will propose something like:

([1,0,0],[0.5,0.25,0.25])

This is the action space of the proposer and the observation space of the responder.
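
For reference, here is how I would write that down and sample from it (an untested sketch, with 3 agents for readability):

import gym
import numpy as np

number_of_agents = 3
proposer_space = gym.spaces.Tuple((
    gym.spaces.MultiBinary(number_of_agents),
    gym.spaces.Box(low=0.0, high=1.0, shape=(number_of_agents,), dtype=np.float32),
))

print(proposer_space.sample())
# e.g. (array([1, 0, 0], dtype=int8), array([0.5, 0.25, 0.25], dtype=float32))

As far as I understand, Discrete(n) would give a single integer in 0..n-1 (e.g. the index of one agent), while MultiBinary(n) gives n independent 0/1 flags, which matches the [1, 0, 0] vector above.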

Thanks!

Well, what you ask now is kind of off-topic, and I do not know your application/problem. I advise looking into the Gym GitHub repo for what each space type does. Each .py file has a clear example in the class docstring :wink:

Thank you very much for this example and your answers.

Thanks so much!
