Hello,
maybe you can help me with the following problem.
System Model:
We assume two Agents A and B, and their corresponding actions a_a and a_b .
The actions space is defined as {1,2,3}. The numbers correspond to the row and column of the reward matrix.
The reward matrices are defined as follows:
Reward Matrix for Agent A:
[11 0 0,
0 0 0,
0 0 -10]
Reward Matrix for Agent B:
[11 0 0,
0 0 0,
0 0 100]
Now, Agent A chooses the column and agent B the row. Both observe the reward from their corresponding matrix.
Optimization Objective
We want to maximize the overall reward, i.e. the sum of the reward of Agent A and B.
Whats the question now?
We solve the above objective by using a centralized critic multi-agent system, following this tutorial (ray/centralized_critic.py at master · ray-project/ray · GitHub).
Unfortunately, this system only optimizes the agents individual reward, as seen here:
Is there any example of a multi-agent system which is suited to our optimization objective, i.e. maximizing the overall reward?
In our model, this would lead to a overall reward of 90 instead of 22.
Thank you very much.
Best,
Aaron