Incompatibility of Differentiable Comms in RLlib

Hi, I came across this passage by Eric Liang: “It is occasionally useful to allow for differentiable communication between agents. This can allow for efficient modeling of shared computations or communication channels between agents in the environment. Supporting this feature conflicts with existing RLlib abstractions for defining policies;”
Could you explain why “supporting this feature conflicts with existing RLlib abstractions for defining policies”? Thanks!

Hey @kia, thanks for the question. We have been thinking about this problem (sharing models between policies for multi-agent purposes) for some time now. The key issue is that even though we can access the other agents’ batches (environment rollouts, including observations/actions/rewards) inside any agent’s postprocessing or loss function, these data are always static: they are already-computed values, not live outputs of the other agent’s model that gradients could flow back through. See, for example, our two example scripts:

ray/rllib/examples/
In this example, the value-function network is NOT shared between the policies (“pol1” and “pol2”); instead, each policy uses its own value network. The “central” aspect comes from the fact that both value networks (from “pol1” and “pol2”) see all agents’ observations.

ray/rllib/examples/
A very similar setup to the one above: no model is actually shared between the policies (each policy has its own separate value function and trains it independently).
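Both example setups share one structure, which the toy sketch below tries to capture in plain Python. It is not RLlib code: `CentralCritic`, its linear value function, and `sgd_step` are hypothetical stand-ins, assumed here in place of the neural-network value functions the actual scripts use. Each policy owns its own critic weights, but both critics take the concatenated observations of all agents as input.

```python
from dataclasses import dataclass


@dataclass
class CentralCritic:
    # Hypothetical linear value function over the JOINT observation
    # (a stand-in for a neural network; not an RLlib class).
    weights: list[float]

    def value(self, joint_obs: list[float]) -> float:
        return sum(w * o for w, o in zip(self.weights, joint_obs))

    def sgd_step(self, joint_obs: list[float], target: float, lr: float = 0.1) -> None:
        # Gradient of 0.5 * (V - target)^2 w.r.t. the weights is (V - target) * obs.
        err = self.value(joint_obs) - target
        self.weights = [w - lr * err * o for w, o in zip(self.weights, joint_obs)]


obs_pol1 = [1.0, 0.0]
obs_pol2 = [0.0, 2.0]
joint_obs = obs_pol1 + obs_pol2  # the "central" part: each critic sees everything

# Two separate critics, one per policy: nothing is shared between them.
critics = {
    "pol1": CentralCritic([0.0] * 4),
    "pol2": CentralCritic([0.0] * 4),
}

# Train only pol1's critic on one (joint_obs, target) pair.
critics["pol1"].sgd_step(joint_obs, target=1.0)

print("pol1 weights:", critics["pol1"].weights)
print("pol2 weights:", critics["pol2"].weights)
```

Updating “pol1”’s critic leaves “pol2”’s weights untouched. That is the sense in which nothing is shared: the joint observation is shared input data, not a shared model.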

What you want is for one policy to have access to another policy’s (agent’s) model so that it can compute gradients through that model. RLlib cannot currently do this.
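The difference between static batches and differentiable communication can be illustrated with a toy example. This is not RLlib code: `message`, `loss_b`, and `finite_diff` are hypothetical names, and gradients are estimated with central finite differences instead of an autodiff framework so the sketch has no dependencies. Agent A turns its observation into a “message” with parameter `w_a`; agent B’s loss depends on that message. If the message is recomputed from `w_a` inside B’s loss, B’s loss has a gradient with respect to A’s parameter; if the message is a stored value from an earlier rollout (the static-batch situation described above), that gradient is zero.

```python
from typing import Callable


def message(w_a: float, obs_a: float) -> float:
    # Hypothetical agent-A "model": a single linear unit producing a message.
    return w_a * obs_a


def loss_b(w_b: float, msg: float, target: float) -> float:
    # Hypothetical agent-B loss: squared error of B's output given A's message.
    return (w_b * msg - target) ** 2


def finite_diff(f: Callable[[float], float], x: float, eps: float = 1e-6) -> float:
    # Central-difference approximation of df/dx.
    return (f(x + eps) - f(x - eps)) / (2 * eps)


obs_a, w_a, w_b, target = 2.0, 0.5, 3.0, 1.0

# Differentiable communication: the message is rebuilt from w_a inside B's loss,
# so perturbing w_a changes B's loss.
grad_diff = finite_diff(lambda wa: loss_b(w_b, message(wa, obs_a), target), w_a)

# Static batch: the message was computed during the rollout and stored as data,
# so B's loss no longer depends on w_a at all.
stored_msg = message(w_a, obs_a)
grad_static = finite_diff(lambda wa: loss_b(w_b, stored_msg, target), w_a)

print(f"dL_B/dw_a with differentiable comms: {grad_diff:.4f}")
print(f"dL_B/dw_a with a static batch:       {grad_static:.4f}")
```

Analytically, with these numbers the differentiable case gives dL_B/dw_a = 2 · (w_b · w_a · obs_a − target) · w_b · obs_a = 2 · 2 · 3 · 2 = 24, while the static case gives exactly 0. The toy only shows why frozen data cuts the gradient path; it says nothing about how such a path would be wired into RLlib’s policy abstractions.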
