[RLlib] Distinguishing hyperparameter tuning from single execution of RL algorithm

Hi everyone,

I am trying to distinguish the commands for 'hyperparameter tuning of an RLlib algorithm' from the commands for 'one-time execution of the same algorithm with constant, pre-defined values for the needed hyperparameters'.
The page https://docs.ray.io/en/master/ray-overview/index.html#gentle-intro
shows a PPO example that uses tune.run() without any hyperparameter space, and the page https://docs.ray.io/en/master/tune/examples/pbt_ppo_example.html shows a PPO example that uses tune.run() with a hyperparameter space (and a scheduler).
From this observation, I understand that the same tune.run() method can be used for both hyperparameter tuning and one-time RL training: if we provide a hyperparameter search space and a scheduler, it is tuning; otherwise it is a single execution of the RL algorithm. Please let me know whether my understanding is correct.
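To make the distinction concrete, here is a minimal sketch of both calls. The environment, stopping criterion, and the specific hyperparameter values and search spaces below are my own placeholders, not taken from the linked pages:

```python
import random

from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

# One-time training: every config entry is a fixed scalar, so tune.run()
# launches a single PPO trial with exactly these hyperparameters.
tune.run(
    "PPO",
    stop={"training_iteration": 50},
    config={
        "env": "CartPole-v0",
        "num_workers": 1,
        "lr": 5e-5,
        "train_batch_size": 4000,
    },
)

# Hyperparameter tuning: some config entries are search-space objects and a
# scheduler is attached, so tune.run() launches num_samples trials and
# searches / perturbs the marked hyperparameters.
pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    perturbation_interval=5,
    hyperparam_mutations={"lr": lambda: random.uniform(1e-5, 1e-3)},
)
tune.run(
    "PPO",
    metric="episode_reward_mean",
    mode="max",
    scheduler=pbt,
    num_samples=4,
    stop={"training_iteration": 50},
    config={
        "env": "CartPole-v0",
        "num_workers": 1,
        "lr": tune.uniform(1e-5, 1e-3),
        "train_batch_size": tune.choice([2000, 4000, 8000]),
    },
)
```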

@Saurabh_Arora Your understanding is correct. I'm not familiar with PopulationBasedTraining, but let's assume there is a way to turn off the hyperparameter search for it. In your second link, if you use scalar values instead of tune.choice, that would turn off the hyperparameter space search and become just a one-time RL training run.
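For example, a minimal sketch of that substitution (the config keys and scalar values below are placeholders; use the ones from the linked example):

```python
from ray import tune

# Same kind of PPO call as in the PBT example, but with the tune.choice(...)
# entries replaced by plain scalars and no scheduler passed: tune.run() now
# launches a single, fixed-hyperparameter training trial.
tune.run(
    "PPO",
    stop={"training_iteration": 100},
    config={
        "env": "Humanoid-v1",        # placeholder; use the env from the example
        "num_sgd_iter": 10,          # was a tune.choice([...]) search space
        "sgd_minibatch_size": 128,   # was a tune.choice([...]) search space
        "train_batch_size": 10000,   # was a tune.choice([...]) search space
    },
)
```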


Thanks, @RickLan. I have one more related question, which I posted here: https://discuss.ray.io/t/rlllib-how-to-use-policy-learned-in-tune-run/2222