Hey @RickLan . The policy I got from this approach is stochastic. I have raised another issue here for making learned policy deterministic. https://discuss.ray.io/t/getting-deterministic-policy-after-dqn-training/2237. would you like to share any insights on that?