Multi-Agent System for maximizing the overall reward of all agents?

Hello,

maybe you can help me with the following problem.

System Model:

We assume two Agents A and B, and their corresponding actions a_a and a_b .
The actions space is defined as {1,2,3}. The numbers correspond to the row and column of the reward matrix.
The reward matrices are defined as follows:

Reward Matrix for Agent A:
[11 0 0,
0 0 0,
0 0 -10]

Reward Matrix for Agent B:
[11 0 0,
0 0 0,
0 0 100]

Now, Agent A chooses the column and agent B the row. Both observe the reward from their corresponding matrix.

Optimization Objective

We want to maximize the overall reward, i.e. the sum of the reward of Agent A and B.

Whats the question now?

We solve the above objective by using a centralized critic multi-agent system, following this tutorial (ray/centralized_critic.py at master · ray-project/ray · GitHub).

Unfortunately, this system only optimizes the agents individual reward, as seen here:

Unbenannt

Is there any example of a multi-agent system which is suited to our optimization objective, i.e. maximizing the overall reward?

In our model, this would lead to a overall reward of 90 instead of 22.

Thank you very much.

Best,
Aaron

1 Like

@aronium the reward comes from the environment. What I gleaned from your description is that you have two agents, each is training a seperate policy which means each will store their own samples independently. If you want them to share a global reward then you need to provide that. The natural place to do that is in the environment. I don’t k ow your ultimate goals but one way I would think you do that is to include a “global_mixing” parameter when constructing the environment and define the reward as some variation of (own_reward + self.global_mixing * other_reward) Then you have a continuum where if global_mixing is 0 each would be maximizing their own individual reward and 1 would be a fully shared global reward.

With your current setup I would expect that you should see a switch from 0 0 to 2 2 in the actions somewhere around .21.