Hello,
I have been frustrated training an agent for a challenging environment. I want to use curriculum training, but have had to kludge the whole curriculum together into a single Tuner run, which quickly becomes unwieldy. Maybe I’m thinking about the whole use of HP tuning incorrectly, but here’s what seems logical to me:
1. Specify a new model structure & initialize (good HPs are unknown)
2. Use Tune to search the HP space over “lesson 1” of the curriculum
3. Save the best model from step 2 (presumably a checkpoint) as the basis for lesson 2
4. Use Tune to search the HP space over “lesson 2” of the curriculum
5. Save the best model from step 4 as the basis for more challenging tasks/lessons
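To make the sequence concrete, here is a rough, library-free sketch of what I have in mind. None of it is Ray API: `train_lesson`, `search_lesson`, and `run_curriculum` are hypothetical names, the "model" is just a list of floats, and each "lesson" is a toy quadratic loss with a different optimum. The point is only the shape of the loop: a fresh HP search per lesson, each one starting from the best weights of the previous lesson.

```python
import random


def train_lesson(weights, lesson, lr, steps=50):
    """Hypothetical stand-in for training: gradient descent on mean (w - target)^2."""
    target = float(lesson)  # toy assumption: each lesson has a different optimum
    w = list(weights)
    for _ in range(steps):
        w = [wi - lr * 2.0 * (wi - target) for wi in w]
    loss = sum((wi - target) ** 2 for wi in w) / len(w)
    return w, loss


def search_lesson(start_weights, lesson, num_samples=8, seed=0):
    """Random search over the learning rate for a single lesson."""
    rng = random.Random(seed)
    best = None
    for _ in range(num_samples):
        lr = 10 ** rng.uniform(-3, -0.5)  # log-uniform sample of the learning rate
        weights, loss = train_lesson(start_weights, lesson, lr)
        if best is None or loss < best["loss"]:
            best = {"lr": lr, "loss": loss, "weights": weights}
    return best


def run_curriculum(initial_weights, lessons):
    """Run one independent HP search per lesson, carrying only the weights forward."""
    weights = list(initial_weights)
    history = []
    for lesson in lessons:
        best = search_lesson(weights, lesson, seed=lesson)
        weights = best["weights"]  # the only thing that survives into the next lesson
        history.append((lesson, best["lr"], best["loss"]))
    return weights, history


if __name__ == "__main__":
    final_weights, history = run_curriculum([0.0, 0.0, 0.0], lessons=[1, 2])
    for lesson, lr, loss in history:
        print(f"lesson {lesson}: best lr={lr:.4f}, loss={loss:.6f}")
```

Note that the HPs chosen for one lesson are never passed to the next; only the weights are. That is the decoupling I am after.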
I have no reason to believe that the exact same HPs will be ideal on all lessons. In fact, I would expect, at a minimum, that learning rate would need to be adjusted for some lessons.
The problem is that Tune only seems to store bulky checkpoints that record everything about the current tuning session, so that none of the HPs, the environment, or anything else can be altered for a future round of tuning. I feel like the above sequence would be a normal way of doing business, and that flexible and transparent checkpoint handling would therefore be a top priority. Since it does not appear to be, I must conclude this is not the way most people think about training an agent. What is a better approach?
On the other hand, if this does sound like a solid approach, I would appreciate some guidance in using the checkpoints accordingly. It feels like Ray Train provides the kind of checkpoint I want, which is nothing more than the raw NN weights. But I can’t figure out how to extract these from a Tuner checkpoint or how to inject partially trained weights into a new Tuner job.
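For clarity, this is the kind of checkpoint handling I am picturing, again as a library-free sketch rather than anything Ray provides. The names `save_weights`, `load_weights`, and `build_lesson_config`, the JSON file format, and the config keys are all my own assumptions for illustration. The file holds the weights and nothing else, so the next lesson is free to choose its own environment and learning rate.

```python
import json
import tempfile
from pathlib import Path


def save_weights(weights, path):
    """Write only the parameter values: no HPs, no env, no search state."""
    Path(path).write_text(json.dumps(weights))


def load_weights(path):
    """Read the parameter values back as plain nested lists."""
    return json.loads(Path(path).read_text())


def build_lesson_config(weights_path, env_name, lr):
    """Combine saved weights with a new environment and new HPs (hypothetical keys)."""
    return {
        "env": env_name,
        "lr": lr,
        "initial_weights": load_weights(weights_path),
    }


if __name__ == "__main__":
    # Toy weights standing in for the best model from lesson 1.
    lesson1_weights = {"layer0": [[0.1, -0.2], [0.3, 0.4]], "bias0": [0.0, 0.5]}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "lesson1_best.json"
        save_weights(lesson1_weights, path)
        config = build_lesson_config(path, env_name="lesson2", lr=3e-4)
    print(sorted(config))
```

What I cannot work out is how to get the equivalent of `save_weights` out of a Tuner checkpoint, and the equivalent of `initial_weights` into a new Tuner job.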
At the very least, if making this work is possible by any means, it would certainly be nice to see it documented somewhere in the Tune user guide. The Ray docs generally tend to show only elementary examples, and could stand to include some more real-world complexities.
Thanks in advance for any advice!