I have followed the centralized_critic.py example (at master in ray-project/ray on GitHub) to get the centralized critic implementation working for more than two agents.
During evaluation I restore the network and then call compute_single_action() on the trained policy.
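Roughly, the evaluation step looks like this (just a sketch; the trainer class, config, checkpoint path, env name, and policy id are placeholders for my actual setup):

```python
# Sketch only: Ray 1.x-style PPOTrainer API; config, env name, checkpoint path,
# and policy id are placeholders for my real setup.
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()
trainer = PPOTrainer(config=training_config, env="my_multi_agent_env")  # training_config: placeholder
trainer.restore("/path/to/checkpoint")  # placeholder path

# Only the decentralized actor should be needed here.
obs = single_agent_obs  # placeholder: one agent's current observation
action, _, _ = trainer.get_policy("policy_0").compute_single_action(obs)
```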
Now that the setup is clear, here is my question: since compute_single_action() is independent of the critic network, is it possible to avoid invoking the centralized critic model at all during evaluation? How would we implement such behavior?
Inspiration example for the question:
We use 4 agents of a similar type in training, and during evaluation we use 8 similar agents; the 4 new agents will reuse the policy ids of the first 4 agents. @sven1977 @ericl
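Concretely, the mapping I have in mind looks something like this (the agent and policy id naming is just an assumption on my side):

```python
# Sketch: 8 evaluation agents reuse the 4 trained policies.
# Assumes agent ids "agent_0".."agent_7" and policy ids "policy_0".."policy_3".
def policy_mapping_fn(agent_id, **kwargs):
    idx = int(agent_id.split("_")[-1]) % 4  # agent_4..agent_7 wrap onto policy_0..policy_3
    return f"policy_{idx}"
```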
Hi @kapilPython,
The actor of the TorchCentralizedCriticModel (self.model) is a decentralized model; it is what compute_actions / compute_single_action uses. You already have the behavior you seek.
It is the value function that is centralized, but that is only called during training, either by postprocess_trajectory (at the end of an episode or rollout) or in the loss function.
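Abridged, the split looks roughly like this (paraphrased from the example, not copied verbatim; the flat observation space, discrete action space, and layer sizes are assumptions):

```python
# Rough paraphrase of the centralized-critic model: a decentralized actor (self.model)
# plus a separate centralized value branch. Obs/action space handling and layer sizes
# here are assumptions, not the exact example code.
import torch
import torch.nn as nn
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.models.torch.fcnet import FullyConnectedNetwork as TorchFC


class TorchCentralizedCriticModel(TorchModelV2, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)

        # Decentralized actor: only this agent's own observation goes in.
        self.model = TorchFC(obs_space, action_space, num_outputs,
                             model_config, name)

        # Centralized critic: own obs + opponent obs + opponent action (one-hot).
        input_size = obs_space.shape[0] * 2 + action_space.n  # assumes flat obs, Discrete actions
        self.central_vf = nn.Sequential(
            nn.Linear(input_size, 16), nn.Tanh(), nn.Linear(16, 1))

    def forward(self, input_dict, state, seq_lens):
        # This is all that compute_actions / compute_single_action touch.
        model_out, _ = self.model(input_dict, state, seq_lens)
        return model_out, []

    def central_value_function(self, obs, opponent_obs, opponent_actions):
        # Only used during training (postprocess_trajectory / loss), never at inference.
        input_ = torch.cat([
            obs, opponent_obs,
            nn.functional.one_hot(opponent_actions.long(), self.action_space.n).float(),
        ], dim=1)
        return torch.reshape(self.central_vf(input_), [-1])

    def value_function(self):
        # Decentralized VF of the actor network; not the one the centralized loss uses.
        return self.model.value_function()
```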
Hi @mannyv, yeah, you are right. I was carrying out some experiments to check the validity of what I wanted to do, and I can confirm that the critic and actor are already bifurcated, so once deployed the policy will not need the other agents' observations.
If you were to use the critic to get the estimated return, it would need the other agents' observations, but just using the actor policy to get actions only requires a single agent's observation.
For example, you could not use that model alone to do decentralized training, because you would not have the other agents' observations or actions, which are required for the value function.
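Put concretely (a toy sketch; the variables below are placeholders for data you would only have during training):

```python
# Acting: only this agent's own observation is needed (what compute_single_action uses).
action, _, _ = policy.compute_single_action(own_obs)

# Centralized value estimate: needs every agent's data, which is why it only
# runs during training (postprocessing / loss), not at deployment.
vf_preds = policy.model.central_value_function(
    own_obs_batch, opponent_obs_batch, opponent_actions_batch)
```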