Hey @RickLan. The policy I got from this approach is stochastic. I have opened a separate topic about making the learned policy deterministic: https://discuss.ray.io/t/getting-deterministic-policy-after-dqn-training/2237. Would you like to share any insights on that?
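
A minimal sketch of the stochastic vs. deterministic distinction, assuming a DQN-style policy that picks actions from per-action Q-values with epsilon-greedy exploration. This is plain Python, not RLlib's API; `select_action`, `epsilon`, and the Q-values below are hypothetical names and numbers used only for illustration.

```python
import random


def select_action(q_values, epsilon=0.0, rng=random):
    """Epsilon-greedy action selection over a list of Q-values.

    With epsilon == 0.0 no random action is ever taken, so the same
    Q-values always yield the same action (a deterministic policy).
    """
    if epsilon > 0.0 and rng.random() < epsilon:
        # Exploration branch: uniform random action (stochastic behaviour).
        return rng.randrange(len(q_values))
    # Greedy branch: index of the largest Q-value, ties go to the lowest index.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical Q-values for a single observation.
q = [0.1, 0.7, 0.2]

# Greedy evaluation: every call returns the same action.
greedy_actions = [select_action(q) for _ in range(5)]

# Exploratory evaluation: actions may differ from call to call.
rng = random.Random(0)
noisy_actions = [select_action(q, epsilon=0.5, rng=rng) for _ in range(5)]
```

If the trained policy still looks stochastic at evaluation time, one thing to check under this assumption is whether exploration is still active when actions are computed.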