(The questions I’m posting here will probably be simple to answer for any experienced user. I’m still in the early stages with Ray and am just exploring many of its utilities, so any help is appreciated!)
Hi,
I’ve run a PBT experiment on PPO with my custom simulator.
Of the 4 trials I ran, only 1 survived after six hours, as you can see in the figure below. I need help understanding why the errors occurred; I have included a link to the error logs below. If someone can explain the causes, that would be really helpful.
Furthermore, there are some things I do not understand about Ray’s implementation of PBT.
Below is the part of my code concerned with PBT. As you can see, I’m trying to optimize these six hyper-parameters: `lambda`, `clip_param`, `lr`, `num_sgd_iter`, `sgd_minibatch_size`, and `train_batch_size`.
The questions I have are:
- I used `tune.qrandint(128, 1024, 128)`, hoping the candidates in the search space would be rounded to integer increments of 128, as the Tune API states. But in the `pbt_global.txt` file, I found values such as 307 and 153. How is this possible?
- Can someone help me understand how to interpret the `pbt_global.txt` file? I’m very lost with it. Here is the link to `pbt_global.txt`. I mainly want to know what to look at to figure out where each trial changed its hyper-parameters through exploration and exploitation. This will be fairly simple for any experienced user, I assume.
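To make the first question concrete, here is a rough pure-Python sketch of the rounding I expected `tune.qrandint(128, 1024, 128)` to perform (my own reading of the API docs, not Ray’s actual implementation):

```python
import random

def qrandint_sketch(lower, upper, q):
    """My expectation of qrandint: sample an integer, then round it to
    the nearest multiple of q, clipped back into [lower, upper]."""
    value = random.randint(lower, upper)
    quantized = int(round(value / q) * q)
    return max(lower, min(upper, quantized))

# Under this reading, every sample is a multiple of 128 within the bounds,
# so a value like 307 should never appear from sampling alone.
samples = [qrandint_sketch(128, 1024, 128) for _ in range(1000)]
assert all(s % 128 == 0 and 128 <= s <= 1024 for s in samples)
```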
import random

from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    perturbation_interval=50,
    metric="episode_reward_max",
    mode="max",
    hyperparam_mutations={
        "lambda": lambda: random.uniform(0.95, 1.0),
        "clip_param": lambda: random.uniform(0.01, 0.5),
        "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
        "num_sgd_iter": lambda: random.randint(1, 30),
        "sgd_minibatch_size": tune.qrandint(128, 1024, 128),
        "train_batch_size": tune.qrandint(2_500, 7_500, 2_500),
    },
)

results = tune.run(
    "PPO",
    name="PBT_PPO",
    config=config,
    checkpoint_freq=1,
    stop={"time_total_s": 43_200},
    checkpoint_score_attr="episode_reward_max",
    scheduler=pbt,
    num_samples=4,
    local_dir=args.save_dir,
)
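For what it’s worth, my current guess about the first question is that the odd values come from PBT’s perturbation step rather than from sampling. A rough sketch of that step as I understand it (an assumption on my part, not Ray’s actual code):

```python
import random

def perturb_sketch(value, resample_fn, resample_prob=0.25):
    # My guess at PBT's explore step: occasionally resample from the
    # original distribution; otherwise scale the current value by 0.8
    # or 1.2. Scaling would break the multiple-of-128 grid, e.g.
    # int(256 * 1.2) == 307 and int(128 * 1.2) == 153 -- exactly the
    # values I saw in pbt_global.txt.
    if random.random() < resample_prob:
        return resample_fn()
    return int(value * random.choice([0.8, 1.2]))
```

If that is what happens, the quantization only applies at initial sampling (and on resample), not after a perturbation. Can anyone confirm?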
Thank you!