I’m trying to figure out which hyperparameters were used in the best run. After running PBT, I determined the best trial using analysis.get_best_trial("valid_acc", "max", "last"). This happened to be train_func_a5634_00031. Then, following the instructions on replaying PBT, I checked pbt_policy_a5634_00031.txt. Here are the first few lines:
["28_beta=3.4159,eta=14.5899,perturbation=exponential,perturbation_shape=2.5071", "19_beta=23.0663,eta=3.2376,perturbation=exponential,perturbation_shape=0.9539", 4, 5, {"eta": 3.2376329198809803, "perturbation": "exponential", "perturbation_shape": 0.9539188581008086, "beta": 23.06629505569105, "checkpoint_interval": 1}, {"eta": 3.8851595038571762, "perturbation": "exponential", "perturbation_shape": 0.7631350864806469, "beta": 18.453036044552842, "checkpoint_interval": 1}]
["33_beta=1.2890,eta=42.9929,perturbation=frechet,perturbation_shape=0.1255", "28_beta=3.4159,eta=14.5899,perturbation=exponential,perturbation_shape=2.5071", 14, 15, {"eta": 3.8851595038571762, "perturbation": "exponential", "perturbation_shape": 0.7631350864806469, "beta": 18.453036044552842, "checkpoint_interval": 1}, {"eta": 4.6621914046286115, "perturbation": "frechet", "perturbation_shape": 0.9157621037767762, "beta": 22.14364325346341, "checkpoint_interval": 1}]
["2_beta=41.3943,eta=7.3419,perturbation=frechet,perturbation_shape=3.2063", "33_beta=1.2890,eta=42.9929,perturbation=frechet,perturbation_shape=0.1255", 24, 20, {"eta": 4.6621914046286115, "perturbation": "frechet", "perturbation_shape": 0.9157621037767762, "beta": 22.14364325346341, "checkpoint_interval": 1}, {"eta": 3.7297531237028894, "perturbation": "frechet", "perturbation_shape": 2.3887281733809287, "beta": 26.57237190415609, "checkpoint_interval": 1}]
["2_beta=41.3943,eta=7.3419,perturbation=frechet,perturbation_shape=3.2063", "5_beta=1.8296,eta=3.9611,perturbation=frechet,perturbation_shape=0.3617", 24, 25, {"eta": 3.961111000205571, "perturbation": "frechet", "perturbation_shape": 0.36171949436433276, "beta": 1.8295985509637074, "checkpoint_interval": 1}, {"eta": 15.059128010534868, "perturbation": "frechet", "perturbation_shape": 0.28937559549146624, "beta": 1.463678840770966, "checkpoint_interval": 1}]
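Each of these lines parses as a JSON array with six elements. To make them easier to read, here is a minimal sketch that loads them into labeled records. The field names (`target_tag`, `source_tag`, and so on) are my own hypothetical labels for the six positions, not names taken from Ray; I'm guessing at their meaning from the fact that the fifth element matches the hyperparameters in the second tag.

```python
import json
from typing import Iterable, List, NamedTuple


class PolicyEntry(NamedTuple):
    # Hypothetical labels for the six positions in each line (not Ray's names).
    target_tag: str
    source_tag: str
    target_iter: int
    source_iter: int
    source_config: dict
    new_config: dict


def parse_policy_lines(lines: Iterable[str]) -> List[PolicyEntry]:
    """Parse JSON-array lines like those in a pbt_policy_*.txt file."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entries.append(PolicyEntry(*json.loads(line)))
    return entries
```

Passing an open file handle for pbt_policy_a5634_00031.txt to `parse_policy_lines` should then give one record per line, so that something like `entry.new_config["eta"]` can be printed next to `entry.source_iter`.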
But then I wanted to compare it with other trials and noticed that the exact same hyperparameters were saved in most of the other “pbt_policy” files. I’m really confused by this.
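To quantify how much the files overlap, a rough sketch like the following counts, for every distinct line, how many policy files contain it. The directory argument and the `pbt_policy_*.txt` glob pattern are assumptions based on the file name above; adjust them to the actual experiment directory.

```python
from collections import Counter
from pathlib import Path


def shared_line_counts(directory: str, pattern: str = "pbt_policy_*.txt") -> Counter:
    """Map each distinct non-empty line to the number of files that contain it."""
    counts: Counter = Counter()
    for path in Path(directory).glob(pattern):
        # Use a set so a line repeated inside one file is counted once for that file.
        unique_lines = {
            line.strip() for line in path.read_text().splitlines() if line.strip()
        }
        counts.update(unique_lines)
    return counts
```

A line with a count close to the number of trials is one that was saved in almost every policy file.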
Another debugging step I took was to check the actual run of that trial: train_func_a5634_00031_31_beta=2.0440,eta=36.8680,perturbation=frechet,perturbation_shape=1.4262_2023-08-21_06-39-46/result.json. I can’t upload the file here because of its size, but when I look at the hyperparameters, “eta”, for example, goes like this:
[36.867993479769176, 1.0005335007828944, 1.0005335007828944, 1.0005335007828944, 1.0005335007828944, 1.0005335007828944, 1.0005335007828944, 1.0005335007828944, 2.965632119567912, 2.965632119567912, 21.08605099791359]
in 5-iteration increments, which is obviously very different from what’s in the pbt_policy file. My question is: which schedule should I use, and why?
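For reference, this is roughly how the sequence of values can be pulled out of result.json. It is a minimal sketch that assumes each line of the file is a JSON object with a "config" dictionary and a "training_iteration" field; the function name `config_changes` is my own.

```python
import json
from typing import Any, Iterable, List, Tuple


def config_changes(lines: Iterable[str], key: str) -> List[Tuple[int, Any]]:
    """Return (training_iteration, value) each time config[key] changes."""
    changes = []
    previous = object()  # sentinel that never equals a real value
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        value = record["config"][key]
        if value != previous:
            changes.append((record["training_iteration"], value))
            previous = value
    return changes
```

Calling `config_changes(open_file, "eta")` on the trial's result.json lists the iterations at which "eta" actually changed during the run, which can then be set against the iterations recorded in the policy file.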
How severe does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.