Hello,

Maybe you can help me with the following problem.

**System Model:**

We assume two agents, A and B, with corresponding actions a_a and a_b.

The action space is {1, 2, 3}. Each number corresponds to a row or column index of the reward matrix.

The reward matrices are defined as follows:

Reward Matrix for Agent A:

    [ 11   0    0
       0   0    0
       0   0  -10 ]

Reward Matrix for Agent B:

    [ 11   0    0
       0   0    0
       0   0  100 ]

Now, Agent A chooses the column and Agent B chooses the row. Each agent then observes the reward at that cell of its own matrix.
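To make the lookup concrete, here is a minimal sketch in plain Python. The names `REWARD_A`, `REWARD_B`, and `step` are only for illustration and are not part of any library; the sketch assumes actions are 1-based, as in the action space above.

```python
# Reward matrices from the post; rows are indexed by Agent B's action,
# columns by Agent A's action.
REWARD_A = [
    [11, 0, 0],
    [0, 0, 0],
    [0, 0, -10],
]
REWARD_B = [
    [11, 0, 0],
    [0, 0, 0],
    [0, 0, 100],
]


def step(a_a, a_b):
    """Return (reward_A, reward_B) for 1-based actions a_a and a_b.

    Agent A's action selects the column, Agent B's action selects the row.
    """
    row, col = a_b - 1, a_a - 1
    return REWARD_A[row][col], REWARD_B[row][col]


if __name__ == "__main__":
    print(step(1, 1))  # both agents choose action 1
    print(step(3, 3))  # both agents choose action 3
```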

**Optimization Objective**

We want to maximize the overall reward, i.e. the sum of the rewards of Agents A and B.

**What's the question now?**

We try to solve the above objective with a centralized critic multi-agent system, following this example (ray/centralized_critic.py at master · ray-project/ray · GitHub).

Unfortunately, this system only optimizes each agent's individual reward, as seen here:

**Is there any example of a multi-agent system which is suited to our optimization objective, i.e. maximizing the overall reward?**

In our model, this would lead to an overall reward of 90 (both agents choose action 3: -10 + 100) instead of 22 (both agents choose action 1: 11 + 11).
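The two numbers can be checked by enumerating all nine joint actions. This is only a brute-force sketch of the objective, not a learning algorithm; `total_reward` and `best_joint_action` are hypothetical helper names, and actions are assumed to be 1-based.

```python
from itertools import product

# Rows are indexed by Agent B's action, columns by Agent A's action.
REWARD_A = [[11, 0, 0], [0, 0, 0], [0, 0, -10]]
REWARD_B = [[11, 0, 0], [0, 0, 0], [0, 0, 100]]
ACTIONS = (1, 2, 3)


def total_reward(a_a, a_b):
    """Sum of both agents' rewards for the joint action (a_a, a_b)."""
    row, col = a_b - 1, a_a - 1
    return REWARD_A[row][col] + REWARD_B[row][col]


def best_joint_action():
    """Return the (a_a, a_b) pair that maximizes the summed reward."""
    return max(product(ACTIONS, ACTIONS), key=lambda pair: total_reward(*pair))


if __name__ == "__main__":
    a_a, a_b = best_joint_action()
    print("best joint action:", (a_a, a_b), "total:", total_reward(a_a, a_b))
    print("total when both choose action 1:", total_reward(1, 1))
```

Agent A on its own has no incentive to pick action 3, since its best individual reward is 11 at action 1, which is why the individually optimized outcome ends up at 22.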

Thank you very much.

Best,

Aaron