ValueError when restoring checkpoint with PPO

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

I used Ray 2.0.0 to train a simple FCNet with 2 hidden layers [256, 256] and stored the result in a checkpoint. Later, to load the checkpoint, I create the algorithm with

from ray.rllib.algorithms import ppo
algo = ppo.PPO(config=config, env=env)

This works great for inference, although the network’s performance is so-so. However, if I train a model with 3 hidden layers, e.g. [300, 128, 64], training works well, but restoring the checkpoint for inference fails with the following error message: ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group.

I feel like there’s a config item that I should be setting for the optimizer, or something similar, but I can’t find any guidance in the Ray docs. I would be grateful to anyone who can point me in the right direction.

Lightbulb moment: the checkpoint correctly stores the 3-layer network structure and the associated optimizer state. What I was missing is that, when creating algo, I used only the default PPO config parameters, which specify a 2-layer network. So the restore() method was trying to load a 3-layer checkpoint into a 2-layer structure. Once I set config["model"]["fcnet_hiddens"] = [300, 128, 64] before creating algo, the restore works just fine.
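
For anyone hitting the same error, here is a minimal sketch of the fix. The helper names with_hidden_layers and restore_ppo are my own, not part of RLlib, and the environment name and checkpoint path you pass in are placeholders for your own values. It assumes Ray 2.0.0, where ppo.PPO accepts a plain config dict and merges it with the PPO defaults.

```python
import copy


def with_hidden_layers(base_config, fcnet_hiddens):
    """Return a copy of base_config whose model uses the given hidden layer sizes.

    Hypothetical helper (not an RLlib API): it only sets
    config["model"]["fcnet_hiddens"] and leaves the input dict untouched.
    """
    config = copy.deepcopy(base_config)
    config.setdefault("model", {})["fcnet_hiddens"] = list(fcnet_hiddens)
    return config


def restore_ppo(base_config, env, checkpoint_path, fcnet_hiddens):
    """Build PPO with the same layer sizes used in training, then restore.

    Hypothetical helper; env and checkpoint_path are supplied by the caller.
    """
    # Imported lazily so with_hidden_layers can be used without Ray installed.
    from ray.rllib.algorithms import ppo

    config = with_hidden_layers(base_config, fcnet_hiddens)
    algo = ppo.PPO(config=config, env=env)
    # The network built above now matches the one stored in the checkpoint.
    algo.restore(checkpoint_path)
    return algo
```

The key point is only that the layer sizes passed when building the algorithm must be the ones used for training, e.g. restore_ppo({"framework": "torch"}, env, checkpoint_path, [300, 128, 64]) for the 3-layer model above.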