Currently I want to implement a central critic. Unfortunately, not all of my agents act at every timestep.
As far as I know, there are two options for a central critic:
use a mixin as in centralized_critic.py
→ This would mean that all actors would have to act at the same time. Is there a way around that?
→ I could have a separate agent that observes the whole env. I found that I might use the episode argument and save something in the user_data field of the Episode. How could I ensure that one agent gets processed first (the global agent would need to write the field)? Preference-wise, I think something in this direction would be the cleanest solution.
add global state to EVERY observation (centralized_critic_2)
→ This certainly would work, but I would like to avoid it, as it’s a huge overhead: in every agent’s action I compute the central value, …
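To make the user_data idea from option 1 concrete, this is roughly the shape I have in mind. The EpisodeStub class and the callback name here are just placeholders for illustration, not actual RLlib API; the real episode object does expose a user_data dict, but everything else is assumed:

```python
class EpisodeStub:
    """Minimal stand-in for RLlib's episode object (which also carries user_data)."""
    def __init__(self):
        self.user_data = {}

def on_episode_step(episode, step_infos):
    """Hypothetical per-step callback.

    The global agent writes the shared state into user_data first; every
    other agent then reads it from there instead of carrying it around
    in its own observation.
    """
    if "global_agent" in step_infos:
        episode.user_data["global_state"] = step_infos["global_agent"]
    return episode.user_data.get("global_state")

# The global agent writes on this step ...
episode = EpisodeStub()
on_episode_step(episode, {"global_agent": [0.0, 1.0, 2.0]})
# ... and a later step with no global agent still sees the cached value.
cached = on_episode_step(episode, {})
```

The open question is exactly the ordering: whether the write is guaranteed to happen before the other agents are processed in the same step.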
Do you have ideas on how to solve this problem, or even just quick suggestions? Thank you.
Btw, I’m using torch, but I do not think solving this problem within the scope of the Model API helps.
What about using option 1, but putting the global state in the info dictionary and then looking for it there in the callback? You could put it in one agent’s info or in all of them. You would have full control over that part.
If this global state is large, it does have the issue of adding a lot of extra data to the sample batch if you add it to every agent’s info dictionary. On the other hand, if you put it in only one agent’s, you would need some more complicated logic to find it, since every agent has the possibility of being done before the episode finishes.
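A rough sketch of the single-carrier variant, with plain dicts standing in for the env’s per-agent info return (the agent ids and the "global_state" key are assumptions, and none of this is literal RLlib API):

```python
def find_global_state(infos):
    """Scan the per-agent info dicts for whichever agent carries the state.

    This is the "more complicated logic" mentioned above: you cannot rely
    on a fixed agent id if the carrying agent might be done early.
    """
    for agent_id, info in infos.items():
        if "global_state" in info:
            return info["global_state"]
    # If the carrying agent can finish before the episode does, you would
    # need a fallback here, e.g. caching the last state seen in user_data.
    raise KeyError("no agent carried the global state this step")

# Only agent_0 carries the state, so the sample batch stays small.
infos = {"agent_0": {"global_state": [1.0, 2.0]}, "agent_1": {}}
state = find_global_state(infos)
```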
Another option that is commonly used is to not mark individual agents as done in the middle of the episode. Instead, they would get a “null” observation (usually all zeros), and there would be a noop action. Combine these two changes with an action mask that masks out all actions but noop for agents that are unofficially “done”. This approach really only works well in practice with discrete action spaces.
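A minimal sketch of that combination; the choice of action 0 as the noop and the observation shape are assumptions:

```python
import numpy as np

NOOP = 0  # assume the first discrete action is the no-op

def action_mask(agent_done, n_actions):
    """All actions allowed while the agent is alive; only noop once it is 'done'."""
    mask = np.ones(n_actions, dtype=np.float32)
    if agent_done:
        mask[:] = 0.0
        mask[NOOP] = 1.0
    return mask

def observation(agent_done, real_obs):
    """Return the real observation while alive, a zero ('null') observation after."""
    return np.zeros_like(real_obs) if agent_done else real_obs
```

The env keeps stepping every agent until the episode truly ends; “done” agents just emit zeros and are forced to pick noop, so the batch shapes stay fixed.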