Compute_single_action(obs, state) of policy and algo: different performance

I trained a model in an environment using PPO.

I restored the model as an algo and as a policy from the same checkpoint.

The (average) episode rewards of the two were quite different, which was unexpected.
I checked the actions they computed for the same observations using compute_single_action(obs).
However, the actions were also significantly different, even with explore=False.
The weights of the model were the same in the algo and the policy.
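
For reference, here is roughly how I compared them (a minimal sketch; the checkpoint path and the observation are placeholders for my actual ones):

```python
from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.policy.policy import Policy

ckpt = "/path/to/my/checkpoint"  # placeholder

# Restore the full Algorithm from the checkpoint.
algo = Algorithm.from_checkpoint(ckpt)

# Restore the Policy from the same checkpoint.
# (Given an Algorithm checkpoint, Policy.from_checkpoint returns a
# dict mapping policy IDs to Policy objects.)
restored = Policy.from_checkpoint(ckpt)
policy = restored["default_policy"] if isinstance(restored, dict) else restored

obs = ...  # an observation from my environment (placeholder)

# Algorithm.compute_single_action() returns just the action.
algo_action = algo.compute_single_action(obs, explore=False)

# Policy.compute_single_action() returns (action, state_outs, extra_fetches).
policy_action, _, _ = policy.compute_single_action(obs, explore=False)

print(algo_action, policy_action)  # these differ significantly for me
```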

I expected them to behave the same when using compute_single_action(obs, explore=False).
Did I miss something?
One theory would be that compute_single_action() works a bit differently depending on which object it is called on, algo vs. policy. I suspect this because another policy obtained from algo.get_policy() showed almost identical average performance to the policy restored from the checkpoint, rather than to the algo.
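
Roughly, that extra check looked like this (reusing `algo`, `policy`, and `obs` from the sketch above):

```python
# Policy held internally by the restored Algorithm.
inner_policy = algo.get_policy()

a_algo = algo.compute_single_action(obs, explore=False)
a_policy, _, _ = policy.compute_single_action(obs, explore=False)
a_inner, _, _ = inner_policy.compute_single_action(obs, explore=False)

# In my runs, a_policy and a_inner agree with each other,
# while a_algo is the odd one out.
print(a_algo, a_policy, a_inner)
```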

I did the same thing with default PPO in CartPole-v1, but there the average behavior was almost identical… which makes me even more confused.

You have to be careful with this, since Algorithm may apply additional transformations to the inputs and outputs of Policy.
Since you are not running into any errors here, I'll just take a shot in the dark:
Please check the outputs of Policy and Algo and see whether Policy produces normalized (squashed) actions while Algo produces unsquashed ones.
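
A minimal sketch of that check, assuming a continuous (Box) action space and the default `normalize_actions=True`; `algo`, `policy`, and `obs` refer to the objects from your comparison above:

```python
from ray.rllib.utils.spaces.space_utils import unsquash_action

# Action the Policy produces; with normalize_actions=True this lives
# in the normalized ([-1, 1]) space.
squashed_action, _, _ = policy.compute_single_action(obs, explore=False)

# Action the Algorithm produces, already mapped back into the env's action space.
algo_action = algo.compute_single_action(obs, explore=False)

# Unsquash the Policy's action into the env's action space and compare.
# policy.action_space_struct is the (possibly nested) action space of the policy.
print(unsquash_action(squashed_action, policy.action_space_struct))
print(algo_action)

# The config flags that control these output transformations:
print(algo.config["normalize_actions"], algo.config["clip_actions"])
```

If the unsquashed Policy action matches the Algo action, the performance gap most likely comes from feeding the Policy's normalized actions directly into the environment.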