Hi @SVH,
If you train with a stochastic policy, you should expect the best performance when you also run inference and evaluation with a stochastic policy, so you should keep explore=True.
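To make the difference concrete, here is a minimal toy sketch in plain Python. It is not RLlib's implementation: `select_action` and the probabilities are made up for illustration, and I am assuming that explore=True roughly corresponds to sampling from the policy's action distribution while explore=False roughly corresponds to taking the most likely action.

```python
import random


def select_action(action_probs, explore, rng):
    """Hypothetical stand-in for a policy's action selection (not RLlib code)."""
    actions = range(len(action_probs))
    if explore:
        # Stochastic: sample an action according to the policy's probabilities.
        return rng.choices(actions, weights=action_probs, k=1)[0]
    # Deterministic: always pick the most probable action.
    return max(actions, key=lambda a: action_probs[a])


rng = random.Random(0)       # fixed seed so the run is reproducible
probs = [0.6, 0.3, 0.1]      # made-up action distribution for one observation

stochastic = [select_action(probs, True, rng) for _ in range(1000)]
greedy = [select_action(probs, False, rng) for _ in range(1000)]

# The greedy rollout collapses onto a single action, while the stochastic
# rollout keeps visiting the other actions the policy was trained to take.
print("greedy actions used:    ", sorted(set(greedy)))
print("stochastic actions used:", sorted(set(stochastic)))
```

If the policy learned to rely on that randomness during training, the greedy version behaves like a different policy, which is why evaluation results can drop with explore=False.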
I am not sure whether you use any preprocessors, but I think I remember @arturn saying that preprocessors are applied with compute_single_action but not with compute_actions.