compute_single_action(obs, state) of policy and algo: different performance

I trained a model in an environment using PPO.

I restored the model both as an algo and as a policy from the same checkpoint.
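The restore step looked roughly like this (a sketch; the checkpoint path is a placeholder, and I pick the default policy when Policy.from_checkpoint returns a dict):

```python
from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.policy.policy import Policy

checkpoint_path = "path/to/my/checkpoint"  # placeholder

# Restore the full Algorithm from the checkpoint.
algo = Algorithm.from_checkpoint(checkpoint_path)

# Restore only the Policy from the same checkpoint.
# For an algorithm checkpoint this may return a dict {policy_id: Policy},
# so take the default policy in that case.
restored = Policy.from_checkpoint(checkpoint_path)
policy = restored["default_policy"] if isinstance(restored, dict) else restored
```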

The (average) episode rewards of the two were quite different, which was unexpected.
So I checked the actions they computed for the same observations using compute_single_action(obs).
The actions were also significantly different, even when I set explore=False.
The weights of the model were the same in the algo and in the policy.
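The comparison was roughly the following (a sketch; the env name is a placeholder for my actual environment):

```python
import gymnasium as gym
import numpy as np

env = gym.make("MyEnv-v0")  # placeholder for my actual environment
obs, _ = env.reset()

# Deterministic actions from both objects for the same observation.
algo_action = algo.compute_single_action(obs, explore=False)
policy_action, _, _ = policy.compute_single_action(obs, explore=False)
print(algo_action, policy_action)  # these differ significantly

# The weights, however, match between the two.
algo_weights = algo.get_policy().get_weights()
policy_weights = policy.get_weights()
for k in algo_weights:
    assert np.allclose(algo_weights[k], policy_weights[k])
```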

I expected them to behave identically when using compute_single_action(obs, explore=False).
Did I miss something?
One theory: compute_single_action() works a bit differently depending on which object it is called on, algo vs. policy. I suspect this because another policy, obtained via algo.get_policy(), showed almost identical average performance to the policy restored from the checkpoint, rather than to the algo.
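In other words, something like this, where the two policy objects agree with each other but not with the algo (reusing obs, algo, and policy from above):

```python
# Policy pulled from the restored algo vs. policy restored directly
# from the checkpoint.
policy_from_algo = algo.get_policy()

a1, _, _ = policy_from_algo.compute_single_action(obs, explore=False)
a2, _, _ = policy.compute_single_action(obs, explore=False)
# a1 and a2 were almost identical, while
# algo.compute_single_action(obs, explore=False) was the odd one out.
```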

I did the same thing with default PPO on CartPole-v1, but there the average behavior was almost identical… which makes me even more confused…
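For that sanity check the setup was roughly (a sketch with a default PPO config; the return type of save() varies by Ray version):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Default PPO on CartPole-v1, trained briefly, then checkpointed and
# restored/compared the same way as above.
algo_cp = PPOConfig().environment("CartPole-v1").build()
for _ in range(5):
    algo_cp.train()
checkpoint_cp = algo_cp.save()  # checkpoint to restore from
```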