I’m having trouble figuring out an architectural detail.
TL;DR: I would like to broadcast a message to all the agents sharing the environment.
Specifically, I would like each agent to have two policies: one trained on the environment’s “true” reward, and one trained on a communication-protocol loss that I’ll define.
My agents’ model inherits from the “RecurrentTFModelV2” class.
The problem is that I don’t understand how I can return a list of [action, message] from the model’s forward() function.
Is there a way?
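
To make the goal concrete, here is a rough sketch in plain Python of what I have in mind. It is not RLlib code: the names split_output, broadcast, NUM_ACTION_LOGITS and MESSAGE_DIM are made up for illustration, and the sizes are arbitrary. The assumption is that the model emits one flat output vector, which is then cut into an action part and a message part, and that each agent's message is delivered to every other agent. My understanding, which may be wrong, is that RLlib treats Tuple action spaces in a similar flat-output way, so something like this might be possible without forward() returning a list.

```python
from typing import Dict, List, Tuple

# Hypothetical sizes, chosen only for illustration.
NUM_ACTION_LOGITS = 4  # logits for the environment action
MESSAGE_DIM = 3        # size of the message vector


def split_output(flat_output: List[float]) -> Tuple[List[float], List[float]]:
    """Split one flat model output into (action_logits, message)."""
    expected = NUM_ACTION_LOGITS + MESSAGE_DIM
    if len(flat_output) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat_output)}")
    return flat_output[:NUM_ACTION_LOGITS], flat_output[NUM_ACTION_LOGITS:]


def broadcast(messages: Dict[str, List[float]]) -> Dict[str, Dict[str, List[float]]]:
    """Give each agent the messages sent by every other agent."""
    return {
        receiver: {
            sender: msg for sender, msg in messages.items() if sender != receiver
        }
        for receiver in messages
    }


if __name__ == "__main__":
    # Pretend these are the flat outputs of two agents' models.
    outputs = {
        "agent_0": [0.1, 0.2, 0.3, 0.4, 1.0, 2.0, 3.0],
        "agent_1": [0.5, 0.6, 0.7, 0.8, 4.0, 5.0, 6.0],
    }
    messages = {}
    for agent_id, flat in outputs.items():
        action_logits, message = split_output(flat)
        messages[agent_id] = message
    print(broadcast(messages))
```

In this sketch the action part would be trained with the environment reward and the message part with the communication loss, but how to wire those two losses into two policies is exactly the part I am unsure about.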