How to set one checkpoint per agent in a multiagent config?

How severely does this issue affect your experience of using Ray?

  • Medium: It causes significant difficulty in completing my task, but I can work around it.

I have some RLlib code involving a multiagent config that I successfully run with the “checkpoint_at_end=True” param and that successfully learns one policy per agent (e.g., I obtain 4 policies, one for each of my 4 PPO-based agents). However, when I look at the saved checkpoints, I only see a single checkpoint for all agents, although I was expecting one checkpoint per agent (i.e., one neural net per agent). Is this possible? If yes, how do I configure it?

As far as I know, there is still no option to do that, but you can work around it.
If you use the same environment, you can just load your checkpoint and only compute actions for the selected agent.
If not, you will need to load the checkpoint in the environment you trained it in, then save the selected agent’s weights, which you can load later.
You can see example code here.
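To make the weights-only workaround concrete, here is a minimal sketch of the save-and-restore pattern. The dict below is a stand-in for what a policy’s `get_weights()` returns; in a real RLlib run you would obtain it with something like `algo.get_policy("agent_0").get_weights()` and push it back with `set_weights()`. The policy id `"agent_0"` and the file path are hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for the dict returned by algo.get_policy("agent_0").get_weights()
# in RLlib (layer name -> parameter values). "agent_0" is a hypothetical id.
weights = {
    "fc_1/kernel": [[0.1, 0.2], [0.3, 0.4]],
    "fc_1/bias": [0.0, 0.0],
}

# Save only the selected agent's weights, not the full checkpoint state.
path = os.path.join(tempfile.mkdtemp(), "agent_0_weights.pkl")
with open(path, "wb") as f:
    pickle.dump(weights, f)

# Later, load them back and hand them to the freshly built policy, e.g.
# algo.get_policy("agent_0").set_weights(restored).
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Note that this round-trips only the parameter values: optimizer state, training iteration counters, and the other agents’ policies are not included, which is exactly the caveat below.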

Keep in mind that you’re saving only the weights, not the entire state as in a checkpoint.