Ray Tune PBT/PB2 with Transformers?

I would like to get started with PBT/PB2 after seeing this presentation, and I want to use it with Hugging Face Transformers’ Trainer. Some collaboration has happened between Ray and HF, and it is neat that hyperparameter_search is implemented on the Trainer. There’s even a handy blog post!

The blog post gives an example, but looking through the source code of the implementation in Transformers, it is not clear to me which types of schedulers are supported. Can you give an example of how to use PopulationBasedTraining (and/or the bandit-based PB2) with the Transformers Trainer?

Hey @BramVanroy! All schedulers should be supported when you call Trainer.hyperparameter_search!

For an example using PBT with HF transformers, you can take a look at this example.
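In case it helps to see the mechanics, the exploit/explore loop at the heart of PBT can be sketched in plain Python. This is a toy illustration, not Ray’s actual implementation; every name in it is made up:

```python
import random

def pbt_step(population, bounds, quantile=0.25):
    """One toy PBT iteration: the bottom performers copy a top performer's
    hyperparameters (exploit), then perturb them within bounds (explore)."""
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    cutoff = max(1, int(len(ranked) * quantile))
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    for member in bottom:
        donor = random.choice(top)
        member["hparams"] = dict(donor["hparams"])      # exploit: clone a winner
        for name, (lo, hi) in bounds.items():           # explore: perturb and clip
            perturbed = member["hparams"][name] * random.choice([0.8, 1.2])
            member["hparams"][name] = min(max(perturbed, lo), hi)
    return population

bounds = {"learning_rate": (1e-5, 1e-3)}
population = [
    {"score": s, "hparams": {"learning_rate": 3e-4}}
    for s in (0.9, 0.5, 0.2, 0.1)
]
pbt_step(population, bounds)
```

In the real scheduler the "score" comes from the metric you report during training, and the perturbation happens at checkpoint boundaries rather than in-place.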

Thank you for your reply, @amogkam!

I have two more questions, specifically with PB2 in mind. Looking at the example that you posted, I see one search space (tune_config) and then another search space inside the scheduler itself. How do those two work together? And how is hp_space used alongside the optimization that goes on in the scheduler?
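To make the question concrete, my current mental model is that hp_space only seeds each trial’s initial configuration, while the scheduler’s bounds constrain the later perturbations. A plain-Python sketch of that assumption (this is not the Ray API, just how I picture it; please correct me if it’s wrong):

```python
import random

# Assumed mental model, not the Ray API: the scheduler's bounds limit its
# mutations, while hp_space only draws the starting point for each trial.
hyperparam_bounds = {"learning_rate": [1e-5, 1e-3], "weight_decay": [0.0, 0.3]}

def hp_space(_trial):
    # Initial configuration for a trial, drawn from the same ranges.
    return {name: random.uniform(lo, hi)
            for name, (lo, hi) in hyperparam_bounds.items()}

config = hp_space(None)
```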

And second, after hyperparameter_search, how do we load the best/final model into the trainer’s model so that we can use it for inference or testing? Previously, with plain grid search, I did it like so:

```python
from json import dump

best_params = trainer.hyperparameter_search(...)

# Set the trainer to the best hyperparameters found
for hparam, v in best_params.hyperparameters.items():
    setattr(trainer.args, hparam, v)

# Save the optimal hyperparameters (the hyperparameters dict is
# JSON-serializable; the BestRun object itself is not)
with output_dir.joinpath("opt_hparams.json").open("w", encoding="utf-8") as hp_out:
    dump(best_params.hyperparameters, hp_out, indent=4, sort_keys=True)

# Now train the model from scratch with the best hyperparameters
train_result = trainer.train()
# ... and then get predictions from the model
predictions = trainer.predict(test_dataset)  # predict() needs the test dataset
```

But because PBT/PB2 work differently, I am not sure how to continue from here. Any help is appreciated!
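For reference, this is how I would persist what hyperparameter_search hands back. I am mimicking the returned object with a plain namedtuple here, since run_id, objective, and hyperparameters are the fields I see on it; treat this as a sketch, not the real transformers class:

```python
import json
from collections import namedtuple

# Stand-in for the object that hyperparameter_search returns; the real
# class lives in transformers, this namedtuple only mirrors its fields.
BestRun = namedtuple("BestRun", ["run_id", "objective", "hyperparameters"])
best = BestRun(run_id="0", objective=0.91,
               hyperparameters={"learning_rate": 3e-5, "num_train_epochs": 4})

# Serialize only the plain fields; the wrapper object is not JSON-serializable.
serialized = json.dumps(best._asdict(), indent=4, sort_keys=True)
```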