How to set one checkpoint per agent in a multiagent config?

How severely does this issue affect your experience of using Ray?

  • Medium: It causes significant difficulty in completing my task, but I can work around it.

I have some RLlib code involving a multiagent config that I successfully run with the “checkpoint_at_end=True” param and that successfully learns one policy per agent (e.g., I obtain 4 policies, one for each of my 4 PPO-based agents). However, when I look at the saved checkpoints, I only see a single checkpoint for all agents, although I was expecting one checkpoint per agent (i.e., one neural net per agent). Is this possible? If yes, how do I configure it?

As far as I know, there is still no option to do that, but you can work around it.
If you use the same environment, you can just load your checkpoint and only compute actions for the selected agent.
If not, you will need to load the checkpoint in the environment you trained it in, then save the selected agent’s weights, which you can load later.
You can see example code here.
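To make the weights-only workaround concrete, here is a minimal sketch of the save-and-restore pattern. The dict below is a stand-in for what a policy’s `get_weights()` returns; in a real RLlib run you would obtain it with something like `algo.get_policy("agent_0").get_weights()` and push it back with `set_weights()`. The policy id `"agent_0"` and the file path are hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for the dict returned by algo.get_policy("agent_0").get_weights()
# in RLlib (layer name -> parameter values). "agent_0" is a hypothetical id.
weights = {
    "fc_1/kernel": [[0.1, 0.2], [0.3, 0.4]],
    "fc_1/bias": [0.0, 0.0],
}

# Save only the selected agent's weights, not the full checkpoint state.
path = os.path.join(tempfile.mkdtemp(), "agent_0_weights.pkl")
with open(path, "wb") as f:
    pickle.dump(weights, f)

# Later, load them back and hand them to the freshly built policy, e.g.
# algo.get_policy("agent_0").set_weights(restored).
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Note that this round-trips only the parameter values: optimizer state, training iteration counters, and the other agents’ policies are not included, which is exactly the caveat below.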

Keep in mind that you’re saving only the weights, not the entire state as in a checkpoint.