Tune results: How to check if best solution of environment is reached

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hello Ray community,

for the particular case described in the following, I am using Tune intentionally due to the Weights & Biases logger callback option.

I am wondering about a good way how to check after tuner.fit() if the resulting config leads to the best solution if executed in environment and to check how this best solution looks alike in terms of action sequence.

For small problem instances of my gymasium environment, e.g. with an episode length of 3, I can compute that separately. However, for bigger instances, we should automize that with the outcome described above.

Maybe, something similar to RLlib’s algo.compute_single_action(obs) would help?

Awaiting your suggestions!