Practical advice for RLlib hyperparameter tuning

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

I am curious whether anyone has practical experience to share about tuning hyperparameters in RLlib. Not so much about the implementation (that's relatively straightforward with Ray Tune) as about which parameters you've found most useful to tune. How do you handle complex parameters like neural network architecture? Do you tune just the learning rate, or a whole learning rate schedule? Are there any search algorithms and schedulers that you found particularly useful (or useless)?

Thanks so much!

  • Learning rates

  • Exploration-based params (e.g. temperature/entropy coefficients)

  • Batch sizes (for off-policy and offline RL)

  • Grid search when tuning reward functions (applicable when designing RL environments)
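A minimal sketch of what tuning the first three of those knobs with Ray Tune and RLlib's PPO could look like. The value ranges are illustrative assumptions, not recommendations, and the exact config methods may differ across RLlib versions; treat this as a search-space fragment, not a ready-made sweep:

```python
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

# Build a PPO config whose training hyperparameters are Tune search spaces.
# All ranges below are illustrative assumptions, not recommendations.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .training(
        lr=tune.loguniform(1e-5, 1e-3),           # learning rate on a log scale
        entropy_coeff=tune.uniform(0.0, 0.02),    # exploration via entropy bonus
        train_batch_size=tune.choice([2000, 4000, 8000]),
    )
)

# Sample 8 configurations from the space above and optimize mean episode reward.
tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    tune_config=tune.TuneConfig(
        num_samples=8,
        metric="episode_reward_mean",
        mode="max",
    ),
)
# results = tuner.fit()  # uncomment to actually launch the sweep
```

Log-uniform sampling for the learning rate tends to matter more than the exact bounds, since plausible values span several orders of magnitude.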