I wonder whether this is possible with RLlib. As I understand it, in the centralized critic example (ray/centralized_critic_2.py at master · ray-project/ray · GitHub), each agent has its own centralized critic.
Can I implement a custom algorithm with a single centralized critic, shared by all agents, that outputs a value given the current state and the actions of all agents? What options do I have for implementing this?
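
To make the idea concrete, here is a rough sketch of what I mean by a single shared critic. It is plain Python and deliberately not RLlib code: the class name `SharedCentralCritic`, the linear model, and the one-hot action encoding are all my own assumptions for illustration, not anything taken from the RLlib example.

```python
import random
from typing import List, Sequence


class SharedCentralCritic:
    """Hypothetical single critic shared by all agents: Q(state, a_1..a_n) -> scalar."""

    def __init__(self, state_dim: int, n_agents: int, n_actions: int, seed: int = 0):
        rng = random.Random(seed)
        self.state_dim = state_dim
        self.n_agents = n_agents
        self.n_actions = n_actions
        # Input = global state concatenated with one one-hot vector per agent.
        in_dim = state_dim + n_agents * n_actions
        self.weights: List[float] = [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
        self.bias = 0.0

    def _features(self, state: Sequence[float], actions: Sequence[int]) -> List[float]:
        if len(state) != self.state_dim or len(actions) != self.n_agents:
            raise ValueError("state or joint action has the wrong size")
        feats = list(state)
        for action in actions:
            if not 0 <= action < self.n_actions:
                raise ValueError("action index out of range")
            one_hot = [0.0] * self.n_actions
            one_hot[action] = 1.0
            feats.extend(one_hot)
        return feats

    def value(self, state: Sequence[float], actions: Sequence[int]) -> float:
        """One scalar value for the joint (state, all agents' actions) input."""
        feats = self._features(state, actions)
        return sum(w * x for w, x in zip(self.weights, feats)) + self.bias

    def update(self, state: Sequence[float], actions: Sequence[int],
               target: float, lr: float = 0.1) -> float:
        """One gradient step on squared error toward `target`; returns the pre-update error."""
        feats = self._features(state, actions)
        error = target - self.value(state, actions)
        self.weights = [w + lr * error * x for w, x in zip(self.weights, feats)]
        self.bias += lr * error
        return error


# Two agents, four discrete actions each, one critic instance for both.
critic = SharedCentralCritic(state_dim=3, n_agents=2, n_actions=4)
state = [0.5, -1.0, 0.25]
joint_action = [1, 3]

before = abs(1.0 - critic.value(state, joint_action))
for _ in range(50):
    critic.update(state, joint_action, target=1.0)
after = abs(1.0 - critic.value(state, joint_action))
```

The point is only that there is one set of critic parameters, and every agent's policy update would read its value from that same object. What I don't know is how to express this sharing inside RLlib.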