Tuning fcnet_hiddens with RLlib PPO: ValueError "loaded state dict ..." when restoring a checkpoint

I am trying to tune fcnet_hiddens for a simple PPO default network with ray.tune, but restoring the checkpoint fails. Here is what I did:

from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

hiddens_layers = [3, 10, 20]
hiddens_width = [50, 100, 500]
config['model']['fcnet_hiddens'] = tune.choice([[w] * l for w in hiddens_width for l in hiddens_layers])
...
analysis = tune.run(PPOTrainer, config=config, num_samples=5)
checkpoint_path = analysis.get_best_checkpoint(
    metric="episode_reward_mean",
    mode="max",
    trial=analysis.trials[0]
)

best_config = analysis.get_best_config()
best_config['explore'] = False
agent = PPOTrainer(
    env="my_env",
    config=best_config
)

agent.restore(checkpoint_path)  # <<<<<<<< This fails with error

Error:
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group

Any idea how to tune fcnet_hiddens?
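As a rough illustration of why the architecture matters here: the tune.choice above expands to nine candidate networks (each of the 5 trials samples one of them), and their depths differ. The plain-Python sketch below, which needs no Ray, enumerates the candidates and counts the parameter tensors in the hidden stack only, assuming each hidden layer is a single linear layer with a weight and a bias. RLlib's real model also has output and value layers, which are ignored here, and hidden_param_tensors is a hypothetical helper. PyTorch's optimizer state dict keeps one entry per parameter tensor, so if the restore also loads optimizer state, a checkpoint from a 3-layer trial cannot fit the optimizer of a 10- or 20-layer network, which is consistent with the size mismatch in the error.

```python
hiddens_layers = [3, 10, 20]
hiddens_width = [50, 100, 500]

# Same comprehension as in the question: one candidate per (width, depth) pair.
candidates = [[w] * l for w in hiddens_width for l in hiddens_layers]


def hidden_param_tensors(hiddens):
    """Hypothetical helper: parameter tensors in the hidden stack only,
    assuming each hidden layer is one Linear layer with a weight and a bias."""
    return 2 * len(hiddens)


for hiddens in candidates:
    print(f"width={hiddens[0]:>3}  depth={len(hiddens):>2}  "
          f"hidden-stack tensors={hidden_param_tensors(hiddens)}")
```

Two candidates with the same depth but different widths have the same tensor count but different tensor shapes, so they are incompatible as well, just not through this particular check.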

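analysis.get_best_checkpoint returns the best checkpoint of whichever trial you pass to it, and analysis.trials[0] is simply the first trial, not the best one. That trial may have sampled a different fcnet_hiddens than the config returned by get_best_config(), so restoring its checkpoint into a network built from best_config can fail with the size mismatch above. Pass the best trial instead, so that the checkpoint and the config come from the same trial. Here is the working code: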

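from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

# config: the PPO config dict, defined earlier (omitted here)
hiddens_layers = [3, 10, 20]
hiddens_width = [50, 100, 500]
config['model']['fcnet_hiddens'] = tune.choice([[w] * l for w in hiddens_width for l in hiddens_layers])
...
analysis = tune.run(
    PPOTrainer,
    config=config,
    num_samples=5,
    metric="episode_reward_mean",  # lets analysis.best_trial / get_best_config() pick a winner
    mode="max",
    checkpoint_at_end=True,  # make sure every trial leaves a checkpoint to restore
)
checkpoint_path = analysis.get_best_checkpoint(
    metric="episode_reward_mean",
    mode="max",
    trial=analysis.best_trial  # instead of analysis.trials[0]
)

best_config = analysis.get_best_config()  # same metric/mode, so the best trial's config
best_config['explore'] = False
agent = PPOTrainer(
    env="my_env",
    config=best_config
)

agent.restore(checkpoint_path)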
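A slightly more defensive variant, sketched below, takes both the config and the checkpoint from one trial object instead of calling get_best_config() separately, so the two cannot drift apart. It assumes the Ray 1.x ExperimentAnalysis methods get_best_trial(metric, mode) and get_best_checkpoint(trial, metric, mode) and the Trial.config attribute; the helper name best_config_and_checkpoint is hypothetical.

```python
def best_config_and_checkpoint(analysis, metric="episode_reward_mean", mode="max"):
    """Hypothetical helper: return (config, checkpoint) from the same best trial.

    `analysis` is the ExperimentAnalysis returned by tune.run (Ray 1.x API assumed).
    """
    trial = analysis.get_best_trial(metric=metric, mode=mode)
    if trial is None:
        raise ValueError(f"no trial reported {metric!r}")
    checkpoint = analysis.get_best_checkpoint(trial, metric=metric, mode=mode)
    if checkpoint is None:
        raise ValueError("the best trial has no checkpoint; enable checkpointing in tune.run")
    # Shallow copy, so top-level tweaks such as explore=False do not
    # modify the trial's stored config.
    config = dict(trial.config)
    return config, checkpoint
```

Usage, with analysis from the tune.run call above:

best_config, checkpoint_path = best_config_and_checkpoint(analysis)
best_config['explore'] = False
agent = PPOTrainer(env="my_env", config=best_config)
agent.restore(checkpoint_path)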

@Dejan_Grubisic I am seeing the same error when restoring a checkpoint for a 3-layer network. However, if my network has only 2 layers, the restore works fine. See my post, "ValueError when restoring checkpoint with PPO". I suspect you are seeing the same issue I was. I’m a little sleepy, but is it possible that your restore using checkpoint_path is loading a network with a different structure than the one defined in best_config?
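One way to test that hypothesis before calling restore is to compare fcnet_hiddens in the trial's saved config with the one you are about to use. Tune writes each trial's config to a params.json file in the trial directory. The sketch below assumes checkpoint_path is a plain string path (as get_best_checkpoint returned in older Ray versions) and that params.json sits within a few directory levels above it, since the exact layout varies between Ray versions. Both helper names are hypothetical. It compares the raw values only, so a missing fcnet_hiddens on one side (meaning RLlib's default applies) is reported as a mismatch rather than resolved.

```python
import json
from pathlib import Path


def fcnet_hiddens_of_checkpoint(checkpoint_path, max_levels=3):
    """Hypothetical helper: read model.fcnet_hiddens from the params.json that
    Tune writes into the trial directory, searching upwards from the checkpoint.

    Assumption: the trial directory is at most `max_levels` parents above
    `checkpoint_path`.
    """
    path = Path(checkpoint_path).resolve()
    for parent in list(path.parents)[:max_levels]:
        params = parent / "params.json"
        if params.is_file():
            with params.open() as f:
                saved_config = json.load(f)
            return saved_config.get("model", {}).get("fcnet_hiddens")
    raise FileNotFoundError(f"no params.json within {max_levels} levels above {path}")


def check_same_network(checkpoint_path, config):
    """Hypothetical helper: raise if the checkpoint's trial used another fcnet_hiddens."""
    saved = fcnet_hiddens_of_checkpoint(checkpoint_path)
    wanted = config.get("model", {}).get("fcnet_hiddens")
    if saved != wanted:
        raise ValueError(
            f"checkpoint was trained with fcnet_hiddens={saved}, "
            f"but the new agent would use {wanted}"
        )
```

Calling check_same_network(checkpoint_path, best_config) just before agent.restore(checkpoint_path) turns the optimizer error into a message that names both architectures.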