I want to implement a centralized critic. Unfortunately, not all of my agents act at every timestep.
As far as I know, there are two options for a centralized critic:
Use a mixin, as in centralized_critic.py.
→ This would mean that all agents would have to act at the same time. Is there a way around this?
→ I could have a separate agent that observes the whole env. I found that I might use the episode argument and save something in the user_data field of the Episode. How could I ensure that one agent gets processed first (the global agent would need to write the field)? Preference-wise, I think something in this direction would be the cleanest solution.
Add the global state to EVERY observation (centralized_critic_2).
→ This would certainly work, but I would like to avoid it, as it is a huge overhead: for every agent's action I would compute the central value,…
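To make the user_data variant of the first option more concrete, here is roughly what I have in mind. It is a plain-Python sketch with no RLlib imports: the dict only stands in for the Episode's user_data field, and record_global_state and critic_inputs are hypothetical helpers, not RLlib functions. The assumption is that the global state is stored per timestep, so each agent only looks up the steps at which it actually acted.

```python
from typing import Dict, List

# Hypothetical stand-in for the per-episode `user_data` dict; not RLlib code.
user_data: Dict[str, Dict[int, List[float]]] = {"global_state": {}}


def record_global_state(t: int, state: List[float]) -> None:
    """Store the global state for timestep t (would run once per env step)."""
    user_data["global_state"][t] = list(state)


def critic_inputs(acted_at: List[int]) -> List[List[float]]:
    """Look up the global state only for the timesteps at which an agent acted."""
    return [user_data["global_state"][t] for t in acted_at]


# Three env steps; agent "a" acts at t=0 and t=2, agent "b" only at t=1.
for t, state in enumerate([[0.0, 1.0], [0.5, 1.5], [1.0, 2.0]]):
    record_global_state(t, state)

inputs_a = critic_inputs([0, 2])
inputs_b = critic_inputs([1])
```

In this sketch the lookups only happen after all steps have been recorded, so it sidesteps the ordering question rather than answering it. Whether something equivalent can be guaranteed inside RLlib is exactly what I am unsure about.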
Do you have ideas on how to solve this problem, or even just quick suggestions? Thank you.
Btw, I'm using torch, but I do not think that solving this problem within the scope of the Model API helps.