My thesis project involves using an RL policy to manage a hyperparameter tuning setup. This is roughly the reverse of what people usually do, which is using Ray Tune to tune RLlib hyperparameters. At this point I'm quite familiar with Ray Tune, but I'm having trouble figuring out how to integrate RLlib. I've spent a long time reading the documentation and some of the source code, but I'm still very confused.
What I tried first was implementing a Ray Tune Scheduler that takes the policy, action space, and observation space (in this case the action space is the set of hyperparameter values) and works similarly to the PBT or PB2 implementations. However, it seems like that isn't how I should be doing this.
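For reference, here is roughly the shape of what I had. The policy here is just my own object with a hypothetical choose_hparams(obs) method, since wiring a real RLlib Policy into a scheduler is exactly the part I never figured out:

```python
from ray.tune.schedulers import FIFOScheduler, TrialScheduler


class PolicyScheduler(FIFOScheduler):
    """Tune scheduler driven by a policy instead of PBT-style mutation.

    `policy` is my own object exposing a (hypothetical) choose_hparams(obs)
    method -- connecting an actual RLlib Policy here is what I'm stuck on.
    """

    def __init__(self, policy, action_space, observation_space):
        super().__init__()
        self.policy = policy
        self.action_space = action_space  # the set of hyperparameter values
        self.observation_space = observation_space

    def on_trial_result(self, trial_runner, trial, result):
        # Like PBT/PB2, react to each intermediate result: build an
        # observation from the trial's metrics and ask the policy which
        # hyperparameter values to use next.
        obs = [result.get("episode_reward_mean", 0.0)]
        new_hparams = self.policy.choose_hparams(obs)
        # ... mutate trial.config with new_hparams, the way PBT does in
        # its exploit step ...
        return TrialScheduler.CONTINUE
```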
Now I'm trying to implement an ExternalEnv that contains the state-action space, where the run() method would contain all of my training/tuning code; the environment is then passed into the Scheduler, which would internally call get_action(), etc. However, I'm confused about where the Policy and Trainer fit into this. I'm assuming the environment is passed into the Trainer, but what about the Policy?
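To make that concrete, here is a minimal sketch of the ExternalEnv I have in mind, with placeholder tuning logic standing in for my real training/tuning code, and my guess at the Trainer hookup at the bottom (the env name, spaces, and episode length are all made up for illustration):

```python
import gym
import numpy as np
import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.env.external_env import ExternalEnv
from ray.tune.registry import register_env


class HparamTuningEnv(ExternalEnv):
    """ExternalEnv whose run() loop *is* the tuning procedure."""

    def run(self):
        # All of my training/tuning code would live in this loop.
        while True:
            episode_id = self.start_episode()
            obs = np.zeros(1, dtype=np.float32)
            for _ in range(10):  # placeholder episode length
                # Ask the policy for the next hyperparameter setting.
                action = self.get_action(episode_id, obs)
                # Placeholder for the real tuning step: apply `action`
                # as a hyperparameter choice, train for a bit, score it.
                reward = float(action == 0)
                obs = np.array([reward], dtype=np.float32)
                self.log_returns(episode_id, reward)
            self.end_episode(episode_id, obs)


# My assumption about where the Trainer fits: the env is registered by
# name and the Trainer builds it internally.
act_space = gym.spaces.Discrete(4)  # e.g. 4 candidate hyperparameter settings
obs_space = gym.spaces.Box(0.0, 1.0, shape=(1,), dtype=np.float32)
register_env("hparam_env", lambda cfg: HparamTuningEnv(act_space, obs_space))

ray.init()
trainer = PPOTrainer(env="hparam_env")
```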
I have a dummy Policy implemented that just changes the hyperparameters based on simple logic, but I don't know where it fits in.
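It's shaped like the custom-policy example in the RLlib docs, with the "simple logic" just a threshold on the observed metric (the threshold and action meanings are arbitrary placeholders):

```python
import numpy as np
from ray.rllib.policy.policy import Policy


class DummyTuningPolicy(Policy):
    """Heuristic policy: changes hyperparameters with fixed rules."""

    def compute_actions(self, obs_batch, state_batches=None,
                        prev_action_batch=None, prev_reward_batch=None,
                        info_batch=None, episodes=None, **kwargs):
        # "Simple logic": if the observed metric is low, pick action 0
        # (say, one hyperparameter bucket), otherwise action 1.
        actions = [0 if obs[0] < 0.5 else 1 for obs in obs_batch]
        return np.array(actions), [], {}

    def learn_on_batch(self, samples):
        return {}  # heuristic policy: nothing to learn

    def get_weights(self):
        return {}

    def set_weights(self, weights):
        pass
```

My best guess from the docs is that a custom policy like this gets attached to a Trainer via build_trainer(default_policy=...), but I can't tell whether that's the right hook for my setup.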
I'm clearly missing how these pieces are meant to fit together, so if someone could help me out, that would be great.