Multi-Agent System for maximizing the overall reward of all agents?

aronium · April 6, 2021, 9:33am

Hello,

Maybe you can help me with the following problem.

System Model:

We assume two agents, A and B, with corresponding actions a_a and a_b.
The action space is {1, 2, 3}. The numbers correspond to the rows and columns of the reward matrices.
The reward matrices are defined as follows:

Reward Matrix for Agent A:
[11 0 0,
0 0 0,
0 0 -10]

Reward Matrix for Agent B:
[11 0 0,
0 0 0,
0 0 100]

Agent A chooses the column and Agent B chooses the row. Each then observes the reward from its own matrix.
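To make the setup concrete, here is a minimal sketch of the reward lookup in plain Python. The function name `rewards` and the nested-list layout are assumptions for illustration only; the matrix values and the row/column convention are taken from the post.

```python
from typing import Tuple

# Reward matrices from the post, indexed as matrix[row][column].
REWARD_A = [
    [11, 0, 0],
    [0, 0, 0],
    [0, 0, -10],
]
REWARD_B = [
    [11, 0, 0],
    [0, 0, 0],
    [0, 0, 100],
]


def rewards(a_a: int, a_b: int) -> Tuple[int, int]:
    """Return (reward_A, reward_B) for 1-based actions in {1, 2, 3}.

    Agent A's action a_a selects the column, Agent B's action a_b the row.
    """
    row, col = a_b - 1, a_a - 1
    return REWARD_A[row][col], REWARD_B[row][col]


print(rewards(1, 1))  # (11, 11)
print(rewards(3, 3))  # (-10, 100)
```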

Optimization Objective

We want to maximize the overall reward, i.e. the sum of the reward of Agent A and B.

What's the question now?

We approach the above objective with a centralized critic multi-agent system, following this example (ray/centralized_critic.py at master · ray-project/ray · GitHub).

Unfortunately, this system only optimizes each agent's individual reward, as seen here:

[Image: Unbenannt]

Is there any example of a multi-agent system which is suited to our optimization objective, i.e. maximizing the overall reward?

In our model, this would lead to an overall reward of 90 instead of 22.
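The 90 versus 22 figures can be checked by enumerating all nine joint actions. This is a small sketch under the matrices given above; `total_reward` is a hypothetical helper name, not part of any library.

```python
from itertools import product

# Reward matrices from the post, indexed as matrix[row][column].
REWARD_A = [[11, 0, 0], [0, 0, 0], [0, 0, -10]]
REWARD_B = [[11, 0, 0], [0, 0, 0], [0, 0, 100]]


def total_reward(a_a: int, a_b: int) -> int:
    """Sum of both agents' rewards for 1-based actions (A: column, B: row)."""
    row, col = a_b - 1, a_a - 1
    return REWARD_A[row][col] + REWARD_B[row][col]


actions = (1, 2, 3)
# Joint action (a_a, a_b) with the highest summed reward.
best = max(product(actions, actions), key=lambda pair: total_reward(*pair))

print(best, total_reward(*best))  # (3, 3) 90
print(total_reward(1, 1))         # 22
```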

Thank you very much.

Best,
Aaron

mannyv · April 10, 2021, 2:43pm

@aronium the reward comes from the environment. What I gleaned from your description is that you have two agents, each training a separate policy, which means each will store its own samples independently. If you want them to share a global reward, then you need to provide it, and the natural place to do that is in the environment. I don’t know your ultimate goals, but one way to do it is to include a “global_mixing” parameter when constructing the environment and define the reward as some variation of (own_reward + self.global_mixing * other_reward). Then you have a continuum: with global_mixing at 0 each agent maximizes its own individual reward, and at 1 both receive a fully shared global reward.
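A rough sketch of the global_mixing idea follows. It is a plain-Python stand-in, not an RLlib class: the class name `MixedRewardMatrixGame`, the agent ids "A" and "B", the dummy observation, and the dict-shaped return values are all assumptions chosen to resemble a multi-agent step interface. Actions here are zero-based.

```python
from typing import Dict, Tuple

# Reward matrices from the original post, indexed as matrix[row][column].
REWARD_A = [[11, 0, 0], [0, 0, 0], [0, 0, -10]]
REWARD_B = [[11, 0, 0], [0, 0, 0], [0, 0, 100]]


class MixedRewardMatrixGame:
    """One-step matrix game whose rewards blend own and other-agent reward.

    Hypothetical sketch: a real RLlib environment would also need
    observation/action spaces and to subclass the appropriate base class.
    """

    def __init__(self, global_mixing: float = 0.0):
        # 0.0 -> purely individual rewards, 1.0 -> fully shared reward.
        self.global_mixing = global_mixing

    def reset(self) -> Dict[str, int]:
        # The game is stateless, so every agent sees a dummy observation.
        return {"A": 0, "B": 0}

    def step(self, action_dict: Dict[str, int]) -> Tuple[dict, dict, dict, dict]:
        col = action_dict["A"]  # Agent A picks the column (zero-based)
        row = action_dict["B"]  # Agent B picks the row (zero-based)
        own_a = REWARD_A[row][col]
        own_b = REWARD_B[row][col]
        g = self.global_mixing
        rewards = {
            "A": own_a + g * own_b,  # own_reward + global_mixing * other_reward
            "B": own_b + g * own_a,
        }
        obs = {"A": 0, "B": 0}
        dones = {"__all__": True}  # single-step episode
        return obs, rewards, dones, {}


env = MixedRewardMatrixGame(global_mixing=1.0)
env.reset()
_, shared, _, _ = env.step({"A": 2, "B": 2})
print(shared)  # {'A': 90.0, 'B': 90.0}
```

With global_mixing=0.0 the same joint action gives A a reward of -10 and B a reward of 100, which is the individual-reward setting described in the question.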

With your current setup, I would expect to see the actions switch from 0 0 to 2 2 (zero-indexed) somewhere around a global_mixing of 0.21.
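As a back-of-the-envelope check on that estimate, one can compare Agent A's mixed reward at the two joint actions, assuming the exact form own_reward + global_mixing * other_reward. Under that assumption the break-even works out to 21/89, roughly 0.24, in the same neighborhood as the figure above. This only compares payoffs; since 0 0 stays a mutual best response for every mixing value, where a learned policy actually switches may differ. The constant and function names are hypothetical.

```python
from fractions import Fraction

# Individual rewards at the two joint actions of interest (zero-indexed).
A_AT_00, B_AT_00 = 11, 11     # joint action 0 0
A_AT_22, B_AT_22 = -10, 100   # joint action 2 2


def mixed_reward(own, other, g):
    """Assumed mixing rule: own_reward + global_mixing * other_reward."""
    return own + g * other


# Agent A is indifferent when 11 + 11*g == -10 + 100*g, i.e. g = 21/89.
break_even = Fraction(A_AT_00 - A_AT_22, B_AT_22 - B_AT_00)
print(break_even, round(float(break_even), 3))  # 21/89 0.236

for g in (Fraction(1, 5), Fraction(1, 4)):
    prefers_22 = mixed_reward(A_AT_22, B_AT_22, g) > mixed_reward(A_AT_00, B_AT_00, g)
    print(float(g), prefers_22)  # 0.2 False, then 0.25 True
```

Agent B prefers 2 2 over 0 0 for every mixing value in [0, 1], since 100 - 10*g exceeds 11 + 11*g throughout that range.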
