Ray tune hyperband scheduler

Prasanth_vaidya · August 4, 2022, 2:58pm

I am trying to use a hyperband scheduler for hyperparameter tuning on BERT and I have set the following parameters:

    def my_hp_space(trial):
        from ray import tune
        # Search space that Ray Tune samples from for each trial.
        return {
            "per_device_train_batch_size": tune.choice([16, 32]),
            "weight_decay": tune.uniform(0.0, 0.3),
            "learning_rate": tune.choice([1e-5, 2e-5, 3e-5, 4e-5, 5e-5]),
        }

    from ray.tune.schedulers import HyperBandScheduler

    hyperband = HyperBandScheduler(time_attr="training_iteration",
                                   metric="eval_acc", mode="max",
                                   reduction_factor=3, max_t=27)

    best_trial = trainer.hyperparameter_search(
        hp_space=my_hp_space,
        direction="maximize", progress_reporter=reporter,
        backend="ray", resources_per_trial={"cpu": 25, "gpu": 1},
        scheduler=hyperband, keep_checkpoints_num=1,
        local_dir=config_dict["logging"]["hp_tuning"],
        name="hpband_Anonymized_data_hptuning", log_to_file=True,
    )

According to the documentation in “Tune Trial Schedulers (tune.schedulers) — Ray 2.8.0”, setting max_t (i.e. R) >= 200 should result in a large number of trials, but the Hugging Face Trainer's hyperparameter search always results in 20 trials.

Can anyone suggest how I can alter the parameters to run more trials with the hyperband scheduler? I have tried setting max_t to 27, 100, 200, and 1000, but the number of trials always remains 20.

Peter_Pirog · August 4, 2022, 3:05pm

@Prasanth_vaidya, max_t is the number of iterations in a single trial, not the number of trials.

max_t – max time units per trial. Trials will be stopped after max_t time units (determined by time_attr) have passed.

Prasanth_vaidya · August 4, 2022, 3:17pm

Yeah, I understood from the documentation that max_t is in time units, so I changed it to 27, assuming that 27 training iterations are allocated to each resource.

This was the output when I initially set max_t to 10 iterations,

Can you please help me understand how the epochs are allocated to the trials in this scenario?
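As a rough illustration of how Hyperband spreads a budget over trials, the sketch below computes the bracket layout using the formulas from the published Hyperband algorithm (number of starting configurations and initial iterations per bracket). The function name `hyperband_brackets` is hypothetical, and Ray Tune's HyperBandScheduler may differ in its details, so treat the numbers as illustrative only.

```python
def hyperband_brackets(max_t, eta=3):
    """Return a list of (s, n_trials, initial_iterations), one per bracket.

    Follows the textbook Hyperband formulas; this is an assumption about
    the general algorithm, not a description of Ray Tune's internals.
    """
    # s_max = floor(log_eta(max_t)), computed with integers to avoid float error
    s_max = 0
    while eta ** (s_max + 1) <= max_t:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # n = ceil((s_max + 1) * eta^s / (s + 1)), as integer ceiling division
        numerator = (s_max + 1) * eta ** s
        n_trials = (numerator + s) // (s + 1)
        # each trial in this bracket starts with max_t / eta^s iterations
        initial_iterations = max_t / eta ** s
        brackets.append((s, n_trials, initial_iterations))
    return brackets


for s, n, r in hyperband_brackets(max_t=27, eta=3):
    print(f"bracket s={s}: {n} trials, starting at {r} iterations each")
```

With max_t=27 and reduction_factor=3 this gives four brackets that start with 27, 12, 6, and 4 trials respectively; the most aggressive bracket gives each trial a single iteration before the first round of early stopping.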

Peter_Pirog · August 4, 2022, 3:20pm

Check the number of iterations used for BERT training. BERT is typically pretrained on a very large language corpus, so the number of fine-tuning iterations is small. If the default number of iterations is 3 and max_t=27, training is limited by the smaller of the two values.
To check this idea, you can try max_t=2. If your experiments are stopped after 2 iterations, I suppose that this is the reason.
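The reasoning above can be written as a one-line calculation. This is a minimal sketch that assumes one reported training_iteration per epoch, which depends on how often the trainer evaluates and reports; the function name `effective_iterations` is hypothetical.

```python
def effective_iterations(num_train_epochs, max_t):
    """Upper bound on iterations a single trial can run.

    Assumes one reported iteration per epoch: the trial ends when either
    the trainer finishes its epochs or the scheduler's max_t is reached.
    """
    return min(num_train_epochs, max_t)


# With 3 training epochs, raising max_t beyond 3 changes nothing...
print(effective_iterations(num_train_epochs=3, max_t=27))
# ...while max_t=2 becomes the binding limit.
print(effective_iterations(num_train_epochs=3, max_t=2))
```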

Prasanth_vaidya · August 4, 2022, 3:29pm

Sure, I will try this. I also understand the reason for the 20 trials now: it is the default value in the Hugging Face hyperparameter search. I had assumed that hyperband determines the number of trials.
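For anyone landing here with the same question: the number of trials comes from the `n_trials` argument of `Trainer.hyperparameter_search`, which defaults to 20. A minimal sketch of raising it follows; the wrapper `search_with_more_trials` and the value 50 are hypothetical, and `trainer`, `hp_space`, and `scheduler` are assumed to be set up as in the first post.

```python
def search_with_more_trials(trainer, hp_space, scheduler, n_trials=50):
    """Run the Hugging Face hyperparameter search with an explicit trial count.

    `trainer` is expected to be a transformers.Trainer; n_trials is the
    number of configurations sampled, independent of the scheduler's max_t.
    """
    return trainer.hyperparameter_search(
        hp_space=hp_space,
        direction="maximize",
        backend="ray",
        n_trials=n_trials,    # number of trials to sample (library default: 20)
        scheduler=scheduler,  # e.g. the HyperBandScheduler instance
    )
```

The scheduler then decides how long each of those trials is allowed to run, while `n_trials` decides how many are started.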
