How to obtain the best schedule of hyperparameters from Population Based Training?

I’m trying to figure out which hyperparameters were used in the best run. After running PBT, I determined the best trial using analysis.get_best_trial("valid_acc", "max", "last"). This happened to be train_func_a5634_00031. Then, following the instructions on replaying PBT, I looked at pbt_policy_a5634_00031.txt. Here are the first few lines:

["28_beta=3.4159,eta=14.5899,perturbation=exponential,perturbation_shape=2.5071", "19_beta=23.0663,eta=3.2376,perturbation=exponential,perturbation_shape=0.9539", 4, 5, {"eta": 3.2376329198809803, "perturbation": "exponential", "perturbation_shape": 0.9539188581008086, "beta": 23.06629505569105, "checkpoint_interval": 1}, {"eta": 3.8851595038571762, "perturbation": "exponential", "perturbation_shape": 0.7631350864806469, "beta": 18.453036044552842, "checkpoint_interval": 1}]
["33_beta=1.2890,eta=42.9929,perturbation=frechet,perturbation_shape=0.1255", "28_beta=3.4159,eta=14.5899,perturbation=exponential,perturbation_shape=2.5071", 14, 15, {"eta": 3.8851595038571762, "perturbation": "exponential", "perturbation_shape": 0.7631350864806469, "beta": 18.453036044552842, "checkpoint_interval": 1}, {"eta": 4.6621914046286115, "perturbation": "frechet", "perturbation_shape": 0.9157621037767762, "beta": 22.14364325346341, "checkpoint_interval": 1}]
["2_beta=41.3943,eta=7.3419,perturbation=frechet,perturbation_shape=3.2063", "33_beta=1.2890,eta=42.9929,perturbation=frechet,perturbation_shape=0.1255", 24, 20, {"eta": 4.6621914046286115, "perturbation": "frechet", "perturbation_shape": 0.9157621037767762, "beta": 22.14364325346341, "checkpoint_interval": 1}, {"eta": 3.7297531237028894, "perturbation": "frechet", "perturbation_shape": 2.3887281733809287, "beta": 26.57237190415609, "checkpoint_interval": 1}]
["2_beta=41.3943,eta=7.3419,perturbation=frechet,perturbation_shape=3.2063", "5_beta=1.8296,eta=3.9611,perturbation=frechet,perturbation_shape=0.3617", 24, 25, {"eta": 3.961111000205571, "perturbation": "frechet", "perturbation_shape": 0.36171949436433276, "beta": 1.8295985509637074, "checkpoint_interval": 1}, {"eta": 15.059128010534868, "perturbation": "frechet", "perturbation_shape": 0.28937559549146624, "beta": 1.463678840770966, "checkpoint_interval": 1}]
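For anyone who wants to inspect these lines programmatically, here is a minimal sketch. It assumes each line of the policy file is a standalone JSON list of six items, as in the excerpt above. The field labels (first_tag, second_tag, and so on) are my own hypothetical names inferred from the excerpt, not names taken from Ray; I only observe that in the first and fourth lines the first dictionary matches the hyperparameters in the second tag.

```python
import json

# Hypothetical labels inferred from the excerpt; they are NOT Ray's own names.
FIELDS = (
    "first_tag", "second_tag", "first_iter", "second_iter",
    "config_before", "config_after",
)


def parse_policy_line(line):
    """Parse one policy line (a JSON list of six items) into a dict."""
    row = json.loads(line)
    if len(row) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} items, got {len(row)}")
    return dict(zip(FIELDS, row))


def changed_keys(entry):
    """Return {key: (before, after)} for hyperparameters that differ."""
    before, after = entry["config_before"], entry["config_after"]
    return {
        k: (before.get(k), after.get(k))
        for k in after
        if before.get(k) != after.get(k)
    }


# First line of pbt_policy_a5634_00031.txt, copied from the post.
SAMPLE = '["28_beta=3.4159,eta=14.5899,perturbation=exponential,perturbation_shape=2.5071", "19_beta=23.0663,eta=3.2376,perturbation=exponential,perturbation_shape=0.9539", 4, 5, {"eta": 3.2376329198809803, "perturbation": "exponential", "perturbation_shape": 0.9539188581008086, "beta": 23.06629505569105, "checkpoint_interval": 1}, {"eta": 3.8851595038571762, "perturbation": "exponential", "perturbation_shape": 0.7631350864806469, "beta": 18.453036044552842, "checkpoint_interval": 1}]'

entry = parse_policy_line(SAMPLE)
print(entry["second_tag"], "at iteration", entry["second_iter"])
for key, (old, new) in changed_keys(entry).items():
    print(f"{key}: {old} -> {new}")
```

Running the same two functions over every line of several policy files would make it easy to see which entries the files actually share.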

But then I wanted to compare it with other trials and noticed that the exact same hyperparameters were saved in most of the other “pbt_policy” files. I’m really confused by this.

Another debugging step I took was to check the actual run of that trial: train_func_a5634_00031_31_beta=2.0440,eta=36.8680,perturbation=frechet,perturbation_shape=1.4262_2023-08-21_06-39-46/result.json. I cannot upload the file here due to its size, but when I look at the hyperparameters, “eta”, for example, goes like this:

[36.867993479769176, 1.0005335007828944, 1.0005335007828944, 1.0005335007828944, 1.0005335007828944, 1.0005335007828944, 1.0005335007828944, 1.0005335007828944, 2.965632119567912, 2.965632119567912, 21.08605099791359]

in 5-iteration increments, which is obviously very different from what’s in the pbt_policy file. My question is: which schedule should I use, and why?
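For reference, this is roughly how such a list can be pulled out of result.json. It is a minimal sketch that assumes the file holds one JSON object per line, each with a "training_iteration" field and a "config" dictionary. The helper name change_points and the demo lines are hypothetical stand-ins, not data from the actual run.

```python
import json


def change_points(lines, key):
    """Return [(training_iteration, value)] each time config[key] changes."""
    schedule = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        result = json.loads(line)
        value = result["config"][key]
        # Record only the first iteration at which a new value appears.
        if not schedule or schedule[-1][1] != value:
            schedule.append((result["training_iteration"], value))
    return schedule


# Illustrative stand-in for result.json lines; NOT data from the actual run.
demo_lines = [
    '{"training_iteration": 1, "config": {"eta": 0.5}}',
    '{"training_iteration": 2, "config": {"eta": 0.5}}',
    '{"training_iteration": 3, "config": {"eta": 0.25}}',
]
schedule = change_points(demo_lines, "eta")
print(schedule)
```

With the real file, the open file object can be passed in place of demo_lines, since iterating over it yields one line at a time.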

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.